Enrollment Management: Development of Prediction Model Based on Logistic Regression

This paper presents the development of a model for forecasting and decision making based on logistic regression. The prediction of graduates' professional choices was verified on a sample of 159 graduates. The predictors are grouped into nine input variables, with data collected from the Unique Education Information System database. The professional choices of graduates after finishing vocational school are grouped into three output variables, with data collected in a survey. The results show that logistic regression can be applied to predict the number and structure of students enrolling in higher education.


INTRODUCTION
Higher education plays an important role in economic and social development, and therefore all changes and reform processes in that domain should lead to an improvement of the socioeconomic climate of a country. Higher education reform underlines the need to change and specify the role of universities and faculties in the enrollment management process. Previous reform activities of the world's leading universities and faculties (Aberdeen, 2003), as well as of the Serbian higher education system, indicate the importance of reforming the management process as a key factor in improving both the quality of education and scientific research at universities (Gerasimović, 2012). In that context, the existence of the Unique Education Information System (EIS), under the jurisdiction of the Ministry of Education and Science of the Republic of Serbia, is of great importance for improving the process of enrollment management in higher education institutions in Serbia.
The constant development, implementation and improvement of the EIS provide more effective resource planning and management and better monitoring of activities in the educational system at all management levels. The EIS enables data exchange from schools to municipalities, from schools to the Regional School Ministry Administration, and from school administrations to the central level of the Ministry. Furthermore, data transfer in the opposite direction, from the central level of the Ministry to the schools, has been automated as far as possible.
The implementation of the EIS increases the efficiency and effectiveness of the education system by managing the following activities: planning the school network, planning the necessary teaching and non-teaching staff, financial planning, and monitoring and controlling students' achievements and schools' performance.
The EIS is designed through the following program modules:
• Students' database
• School employees' database
• Inventory and school premises database
• Financial module
• Analytical software for simulation of the implementation of different education funding formulas (version available in the Ministry).
The research presented in this paper focuses on the students' database, which is an integral program module of the EIS. An analysis of the EIS database showed that it provides a substantial amount of statistical data on students, as well as information about student performance in different courses during their education (quarterly reports, end-of-year reports) and information about parents (education level, form of employment, etc.). In order to select the required information from the EIS database, it is essential to determine the key factors in graduates' professional choice. Based on those factors, a study was conducted on a sample of the graduate population to gather the baseline data needed to develop the predictive model. In practice, implementing this forecasting and decision-making model means using input data from the EIS database to generate output data that can help higher education institutions predict the number of enrolled students and make enrollment decisions.
This paper proposes a model for forecasting and decision making based on logistic regression, which is an integral part of the decision support system (Figure 1).
The support system enables the accomplishment of the objectives defined by the strategic orientation of higher education institutions. The software support provides the efficiency of decision making, ease of use and adaptability.

LITERATURE REVIEW
Forecasting is the prediction of the outcome of a future event. It is used to predict the consequences of a future event or the time of its occurrence, and it can also be applied to time series, to predict the values of a periodic data series at a certain point in time or over a period of time.
There are various estimation and forecasting methods. Statistical forecasting methods are widely used, especially regression analysis.
The characteristic of regression analysis is that the dependent variable, namely the value being forecast, is expressed as a mathematical function of one or more predictor variables known at the time of forecasting (Hillier & Lieberman, 2001). There are a great number of articles, monographs and books in the field of operational research that deal with prediction and decision making in various domains. Among them are those that refer to higher education: predicting the professional choice of graduate students (Miljkovic et al., 2011), selection and retention of students (Hossler, 1984), the role of financial support for students (Spaulding & Olswang, 2005), enrollment management strategies (Antons & Maltz, 2006; Bilous, 2009), predicting student retention (Duke, 2006), the impact of tuition fees on student enrollment decisions (Edward, 1990), etc. Different methodologies are applied in these studies. However, a review and systematization of the existing papers, operational research models and methods in the field of prediction and decision making in enrollment management indicates that forecasting, optimization and decision-making models are developed, among other approaches, by applying regression analysis, i.e. logistic regression (Chang, 2006; Edward, 1990).

METHODOLOGY
The aim of this paper is to develop a model for predicting the number and structure of students enrolled in higher education institutions of technical sciences by means of regression analysis, more precisely logistic regression, as part of a methodology to be used in the enrollment management process in higher education institutions in Serbia.

Logistic Regression
Logistic regression is a statistical method used to test models for predicting categorical outcomes (the dependent variable y) with two or more categories, while the predictors, i.e. the independent variables (x), may be categorical (discrete), continuous, or a combination of both in the same model (Moore & McCabe, 2004).
One of the mathematical functions that best describes the connection between the independent variables and the probability that the dependent variable falls into a particular category is the logistic function. The odds of an event are usually presented as the ratio of the probability that the event occurs to the probability that it does not occur (Moore & McCabe, 2004). Logistic regression is used for statistical modeling of a categorical dependent variable. Binary data are the most common form of categorical data.
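The formulas referred to in this paragraph can be written in their standard textbook form as follows. This is a reconstruction under assumed notation (p for the probability of the event, b_0 for the intercept, b_1, ..., b_k for the coefficients of the predictors x_1, ..., x_k), and it may differ from the notation originally used.

```latex
% Logistic function: probability of the event given k predictors
p = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + \dots + b_k x_k)}}

% Odds of the event, and the log-odds as a linear function of the predictors
\frac{p}{1 - p} = e^{\,b_0 + b_1 x_1 + \dots + b_k x_k},
\qquad
\ln\frac{p}{1 - p} = b_0 + b_1 x_1 + \dots + b_k x_k
```

The second line follows from the first by solving for p/(1 - p), which is why each coefficient b_j can be read as a change in the log-odds per unit change of x_j.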
The dependent variable takes the values 0 and 1, depending on whether the observed event occurred or not. If the logistic model consists of a combination of continuous and categorical predictors, which is the case analyzed in this paper, the functional dependence between the probability of the observed event and the independent variables is given by a logistic function in which the coefficients b_j are derived from the categorical independent variables, which total n-k+1, and the factors c_{j,i} take the values 0 or 1; their number (m) depends on the number of categories of each variable.
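A minimal sketch of how such a mixed model turns predictor values into a probability is given here. The function name, its parameters and all coefficient values are hypothetical and chosen only for illustration; they are not the fitted values of the model described in this paper.

```python
import math

def enrollment_probability(intercept, cont_coefs, cont_values,
                           dummy_coefs, dummy_values):
    """Logistic probability from continuous predictors and 0/1 indicators."""
    linear = intercept
    # Continuous predictors contribute coefficient * value
    linear += sum(b * x for b, x in zip(cont_coefs, cont_values))
    # Categorical predictors contribute their coefficient only when the 0/1 factor is 1
    linear += sum(b * c for b, c in zip(dummy_coefs, dummy_values))
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients, NOT taken from the paper
p = enrollment_probability(
    intercept=-6.0,
    cont_coefs=[0.2, 0.3, 1.0],     # three school-success predictors
    cont_values=[4.0, 4.0, 5.0],
    dummy_coefs=[0.5, -0.4],        # two illustrative 0/1 indicators
    dummy_values=[1, 0],
)
print(round(p, 3))
```

With these illustrative numbers the linear term equals 1.5, so the result is the logistic function evaluated at 1.5.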

The sample description
In order to develop the prediction model, a study was conducted in two vocational schools (VET schools) in Belgrade on a sample of 159 students. The study targeted higher education institutions of engineering, more specifically the Faculty of Mechanical Engineering in Belgrade (FME). The data collected from the EIS database for the 159 students are used as starting values (nine input variables). The data collected in the survey of the same sample of students (the experiment) are used as control values (three output variables) in the development phase of the predictive model. The encoding of the input and output variables and their descriptive statistics are presented in Table 1.
Using the input data from the EIS database and the results obtained by the experiment, a model has been developed based on the logistic regression, which can provide support in the decision making in the process of managing and forecasting the enrollment in higher education institutions of technical sciences.
Before the regression analysis was performed, the validity of the assumptions of logistic regression was checked. Correlation analysis confirmed the absence of multicollinearity. The results given in Table 2 show that the Tolerance (TOL) coefficient of each independent variable is larger than the critical value of 0.20; therefore, the assumption of no multicollinearity is not violated. This conclusion is supported by the VIF coefficient, which ranges from 1.094 to 3.321 and is thus below the critical value of 5.
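The multicollinearity check can be sketched as follows, using the fact that the VIF of each predictor equals the corresponding diagonal element of the inverse correlation matrix and that TOL = 1/VIF. The data below are synthetic and the function name is hypothetical; the sketch illustrates the diagnostic and does not reproduce the values in Table 2.

```python
import numpy as np

def tolerance_and_vif(X):
    """Return (TOL, VIF) for each column of X via the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    vif = np.diag(np.linalg.inv(corr))
    return 1.0 / vif, vif

# Synthetic predictors, used only to exercise the function
rng = np.random.default_rng(0)
x1 = rng.normal(size=159)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=159)   # deliberately correlated with x1
x3 = rng.normal(size=159)                    # roughly independent of the others
X = np.column_stack([x1, x2, x3])

tol, vif = tolerance_and_vif(X)
for name, t, v in zip(["x1", "x2", "x3"], tol, vif):
    # Thresholds as stated in the text: TOL > 0.20 and VIF < 5
    flag = "ok" if (t > 0.20 and v < 5) else "check"
    print(f"{name}: TOL={t:.3f} VIF={v:.3f} {flag}")
```

Because TOL is the reciprocal of VIF, the reported VIF range of 1.094 to 3.321 corresponds to tolerances between roughly 0.30 and 0.91, which is consistent with the 0.20 threshold.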

RESULTS AND DISCUSSION
The model for predicting professional choices was tested on the sample of 159 graduates. The factors influencing enrollment in FME are grouped into 9 predictors. Three of them are continuous and six are categorical independent variables. The categorical independent variables, all of which can be found in the EIS, are gender, the employment status of the mother, the employment status of the father, the schooling financial support, the level of education of the mother and the level of education of the father. The encoding of these variables is shown in Table 3.
Gender is a dichotomous variable. The female gender is coded as 1 and the male as 0. The other categorical variables have three or five categories (Table 3); one category always serves as the reference, and the rest are evaluated relative to it. In that way the probability of the observed event will equal one for each categorical variable. For example, the categorical variable for the employment status of the father is coded as follows: unemployed (1), employed (2), while the reference group is the category "other".
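Reference-category coding of this kind can be sketched as follows. The function name is hypothetical, and the three category labels follow the employment-status example in the text, with "other" as the reference group.

```python
def dummy_code(value, categories, reference):
    """Map one categorical value to 0/1 indicators, omitting the reference category."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    # One indicator per non-reference category; the reference maps to all zeros
    return [1 if value == c else 0 for c in categories if c != reference]

FATHER_EMPLOYMENT = ["unemployed", "employed", "other"]
for status in FATHER_EMPLOYMENT:
    print(status, dummy_code(status, FATHER_EMPLOYMENT, reference="other"))
```

A variable with three categories therefore enters the model through two 0/1 factors, and a variable with five categories through four.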
The continuous independent variables (success in the second grade (x_1), success in the third grade (x_2) and success in the fourth grade (x_3)) were measured so that higher numbers indicate better results. The dependent variable, further education at FME, is dichotomous and can have only two values, which are coded as follows: 1 = Yes, 0 = No.
The statistical analysis is performed using the SPSS 18.0 software.

Model testing
The goodness of fit of the model was assessed using the Hosmer-Lemeshow test for logistic regression.
For this model the results of the Hosmer-Lemeshow test are χ² = 7.990 with a statistical significance of 0.434. Since this is well above 0.05, the test gives no indication of poor fit, which means that the model fits the data adequately.
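The reported significance can be checked with a short calculation. The sketch below assumes 8 degrees of freedom, which corresponds to the common choice of ten risk groups (groups minus two); the number of groups is not stated in the text, so this is an assumption. The function name is hypothetical, and it uses the closed-form upper-tail probability of the chi-square distribution that holds for even degrees of freedom only.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum of (x/2)^k / k! for k = 0 .. df/2 - 1
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# df = 8 is an assumption (ten groups), not a value given in the text
p_value = chi2_sf_even_df(7.990, df=8)
print(f"p = {p_value:.3f}")
```

Under this assumption the calculation gives a value that rounds to 0.434, in agreement with the reported significance.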
Further analysis determined how accurately the model predicts the category "intends to enroll in FME" (Table 4).
The figure relevant for further analysis is the accuracy of predicting enrollment in FME, i.e. the number of students who will opt to enroll. The results lead to the conclusion that the examined regression model correctly classifies 63.6% of the cases of enrollment in the observed Faculty. Overall, the developed regression model correctly predicts 136 of the 159 cases in the sample (85.5%), including 28 of the 44 students who decided to enroll in FME (63.6%).
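The classification rates can be recomputed from the counts given in this paragraph. In the sketch below, the four input counts come from the text, while the figures for students who did not enroll are derived by subtraction and are therefore an inference, not values quoted from Table 4.

```python
# Counts stated in the text
total_cases = 159
correct_total = 136
enrolled = 44
enrolled_correct = 28

# Derived by subtraction (an inference, not reported values)
not_enrolled = total_cases - enrolled
not_enrolled_correct = correct_total - enrolled_correct

overall_accuracy = correct_total / total_cases
sensitivity = enrolled_correct / enrolled              # share of enrollees classified correctly
specificity = not_enrolled_correct / not_enrolled      # share of non-enrollees classified correctly

print(f"overall {overall_accuracy:.1%}, "
      f"enrolled {sensitivity:.1%}, not enrolled {specificity:.1%}")
```

The calculation makes explicit that the 63.6% rate refers to the enrolling group only, while the overall rate across all 159 cases is higher because non-enrollees are classified more accurately.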
The contribution or importance of each predictor variable in the model is shown in Table 5. Only one independent variable made a statistically significant contribution to the predictive capability of the model (Sig < 0.05): success in the fourth grade. Its odds ratio Exp(B) indicates that a higher grade (up to excellent (5)) increases the odds of enrollment in FME about 5 times. According to the data shown in Table 5, it is possible to determine an analytical expression for calculating the probability that an observed case belongs to a certain category.
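An odds ratio multiplies the odds of an event, not its probability, and the difference can be illustrated with a small calculation. The sketch below uses the approximate odds ratio of 5 mentioned in the text and, purely as an illustrative baseline, the sample share of enrollees (44 of 159); the function name is hypothetical and the baseline is an assumption, not a model output.

```python
def apply_odds_ratio(baseline_probability, odds_ratio):
    """Multiply the odds implied by a probability and convert back to a probability."""
    odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)

baseline = 44 / 159                                     # illustrative baseline only
shifted = apply_odds_ratio(baseline, odds_ratio=5.0)    # 5.0 is the approximate value from the text
print(f"{baseline:.3f} -> {shifted:.3f}")
```

With this baseline the probability rises from about 0.28 to about 0.66, which is a substantial increase but less than a fivefold increase in probability.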

CONCLUSIONS
The aim of this study was to develop a predictive and decision-making model using logistic regression, which was confirmed on the example of enrollment management in higher education institutions. To gather the relevant data on the professional orientation of graduates, a survey was conducted in two VET schools in Belgrade on a sample of 159 graduates. The survey targeted higher education institutions of engineering, more specifically the Faculty of Mechanical Engineering. In the development and verification of the predictive model, the data from the EIS database served as baseline values (nine input variables) and the data collected in the survey (experiment) as control values (three output variables). The processing and systematization of the data were performed using regression analysis, specifically logistic regression.
The model testing, i.e. determining how well the logistic model predicts the results, was conducted using Pearson's χ² test. The analysis showed that the entire model with all predictors is statistically significant, χ²(20, N = 159) = 81.772, p < 0.001, which means that the model distinguishes respondents who intend to enroll in the Faculty of Mechanical Engineering in Belgrade from those who do not.
The contribution of the model is confirmed by the classification rate. Compared with the experimentally obtained results (the survey), the tested model correctly classifies 63.6% of the cases of enrollment in the Faculty of Mechanical Engineering.
The model proposed in this paper is only one of many possible models, with different performance and prediction techniques, that can be used to solve the problem of enrollment management in faculties and universities of different scientific fields (technical, social, etc.), legal status (state, private), size, etc.
It can be concluded that the proposed model can provide decision support in the process of enrollment management in higher education institutions. Furthermore, the experimental verification of the developed enrollment prediction model suggests that the Unique Education Information System (EIS) should be used as support in the process of enrollment management in higher education institutions.