ANALYSIS OF DEVELOPMENT OF LOCAL SELF-GOVERNMENT UNITS IN VOJVODINA

© 2020 EA. All rights reserved.


Introduction
The effects of globalization are manifested not only at the national level but also at the level of meso-regions and micro-regions, which increases the importance of territorial units. This stems from the fact that responsibilities and competencies for local and regional development are delegated to regional institutions (Liptáková, Rigová, 2020). Assessing regional as well as local development is a methodologically challenging and politically relevant issue. The development of a region depends on the development level of the local governments within it. Through local economic development, the economic capacity of a local area is developed to create a basis for economic progress and quality of life for the whole society. Local economic development integrates regional and development policy, as well as all other policies, with the aim of faster development of local communities (Glavaš-Trbić, et al. 2008). Local economic development is a composite and complex area that, in addition to economic development policy including agriculture, also incorporates other sectoral, structural and social policies, local infrastructure development policy as an indispensable setting for local economic development, and all sorts of civic initiatives that contribute to the improvement of local communities (Kačar, et al. 2016).
The aim of this research is to determine the influence of various factors on the development of local self-government units (municipalities) in the Vojvodina region by applying discriminant analysis and logistic regression, statistical methods suited to the analysis of categorical outcome data. Specifically, the factors expected to have an impact on the development of a municipality are: population density, i.e. population per km² (PD); number of employed inhabitants per 1,000 inhabitants (EM); number of highly educated inhabitants per 1,000 inhabitants (ED); natural increase (NI); and investment in new capacities (IN).

Materials and methods
The classification of local self-government units into developed and underdeveloped ones was based on the "Decree on the establishment of a single list of development of regions and local self-government units for 2014" ("Sl. glasnik RS", No. 104/2014). Under this Decree, regions and local self-government units are classified into the first, second, third and fourth groups and devastated areas, based on data from the authorities responsible for statistics and finance. The classification into groups is based on the value of gross domestic product per capita in the region or local self-government unit relative to the national average. For the purposes of this research, local self-government units are classified as developed (first and second groups, with a development level above 80% of the national average) or underdeveloped (third and fourth groups, with a development level below 80% of the national average).
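The grouping rule used in this study can be written as a small function. This is a minimal sketch that assumes the development index is already expressed as GDP per capita relative to the national average (national average = 100); the function name and the treatment of a value of exactly 80 are illustrative assumptions, since the Decree's own boundary rules are not reproduced here.

```python
def classify_unit(gdp_per_capita_index: float, threshold: float = 80.0) -> str:
    """Classify a local self-government unit by its development index.

    gdp_per_capita_index: GDP per capita as a percentage of the national
    average (national average = 100). Hypothetical helper for illustration.
    Units above the threshold (first and second groups) are 'developed',
    the rest (third and fourth groups) 'underdeveloped'. Treating a value
    of exactly 80 as underdeveloped is an assumption of this sketch.
    """
    return "developed" if gdp_per_capita_index > threshold else "underdeveloped"


# Usage with hypothetical index values
for index in (135.0, 92.5, 80.0, 61.3):
    print(index, classify_unit(index))
```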
For the statistical analysis of selected development factors of local self-government units (municipalities), two statistical methods were applied: discriminant analysis and binary logistic regression. Discriminant analysis (DA) and logistic regression (LR) are widely used multivariate statistical methods for analyzing data with categorical outcome variables (Pohar, et al. 2004). The difference between the two methods is that discriminant analysis rests on assumptions that must be satisfied for its application, above all the normality of the data, whereas logistic regression does not require normally distributed predictors or equal group covariance matrices. The Kolmogorov-Smirnov and Shapiro-Wilk normality tests, Levene's test of homogeneity of variances and the Brown-Forsythe test of equality of group means were used to test the assumptions of discriminant analysis. The homogeneity of the group covariance matrices was checked using Box's M test.
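The normality and variance checks named above can also be run outside SPSS. The following is a minimal Python sketch using SciPy on simulated values of a single predictor; the group sizes and distributions are assumptions for illustration. SciPy's `kstest` with parameters estimated from the sample is not the Lilliefors-corrected version that SPSS reports, and SciPy has no direct function for the Brown-Forsythe test of equality of means (its `levene(..., center="median")` is the Brown-Forsythe test for variances, a different test), so that test is omitted here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated values of one predictor in the two groups (illustrative only)
groups = {
    "developed": rng.normal(loc=280.0, scale=50.0, size=26),
    "underdeveloped": rng.normal(loc=190.0, scale=40.0, size=19),
}

normality = {}
for name, x in groups.items():
    sw = stats.shapiro(x)  # Shapiro-Wilk test
    # One-sample Kolmogorov-Smirnov test against a normal distribution with
    # the sample mean and SD (no Lilliefors correction, so p is conservative)
    ks = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
    normality[name] = (sw.pvalue, ks.pvalue)

# Levene's test of equal variances; center="mean" is the original
# mean-based statistic (SciPy's default is the median-based variant)
levene = stats.levene(*groups.values(), center="mean")

for name, (p_sw, p_ks) in normality.items():
    print(f"{name}: Shapiro-Wilk p = {p_sw:.3f}, K-S p = {p_ks:.3f}")
print(f"Levene p = {levene.pvalue:.3f}")
```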
Following Hair et al. (2006), the variate for a discriminant analysis, also known as the discriminant function, is derived from an equation much like that of multiple regression. It takes the following form:

$$Z_{jk} = a + W_1 X_{1k} + W_2 X_{2k} + \dots + W_n X_{nk} \qquad [1]$$

where $Z_{jk}$ is the discriminant Z score of discriminant function $j$ for object $k$, $a$ is the intercept, $W_i$ is the discriminant weight for independent variable $i$, and $X_{ik}$ is independent variable $i$ for object $k$. Wilks' lambda (Λ), a measure of the differences among the group means of the independent variables, was used to interpret the obtained discriminant function and to ascertain the level of significance of each predictor. To estimate the relative influence of each variable, the standardized canonical discriminant function coefficients were used (Heil, Schmidhalter, 2014).
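For two groups, a linear discriminant function of the kind described above (Hair et al., 2006) and Wilks' lambda can be computed directly from the within-groups and total sums of squares and cross-products (SSCP). The sketch below uses NumPy on simulated data for two predictors; the data are assumptions, and the scaling (pooled within-group variance of the scores equal to 1, grand mean of the scores equal to 0) follows the usual convention for unstandardized canonical coefficients rather than anything stated in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated (log-scale) predictor values: rows = municipalities
X_dev = rng.normal([5.5, 5.4], 0.3, size=(26, 2))    # developed group
X_und = rng.normal([5.0, 5.1], 0.3, size=(19, 2))    # underdeveloped group
X_all = np.vstack([X_dev, X_und])
n, g = len(X_all), 2

def sscp(X):
    """Sums of squares and cross-products around the column means."""
    D = X - X.mean(axis=0)
    return D.T @ D

W = sscp(X_dev) + sscp(X_und)   # pooled within-groups SSCP
T = sscp(X_all)                  # total SSCP
wilks_lambda = np.linalg.det(W) / np.linalg.det(T)

# Fisher's two-group discriminant weights, rescaled so that the scores
# have a pooled within-group variance of 1
w = np.linalg.solve(W, X_dev.mean(axis=0) - X_und.mean(axis=0))
w /= np.sqrt(w @ (W / (n - g)) @ w)
a = -X_all.mean(axis=0) @ w      # intercept: grand-mean score equals 0

def z_score(X):
    """Z_k = a + W_1 X_1k + ... + W_p X_pk for each row (object k)."""
    return a + X @ w

z_dev, z_und = z_score(X_dev), z_score(X_und)
print(f"Wilks' lambda = {wilks_lambda:.3f}")
print(f"centroids: developed {z_dev.mean():.3f}, underdeveloped {z_und.mean():.3f}")
```

With two groups there is a single function, so Λ equals 1 / (1 + eigenvalue), where the eigenvalue is the between-group sum of squares of the scores divided by their within-group sum of squares.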

Logistic regression
Logistic regression is a statistical method for predicting the outcome of a categorical dependent variable from one or more independent variables called predictors. When the dependent variable has two possible outcomes, the model is called a binary logistic regression model (Kovljenić, Savić, 2017). The following form of regression is used for this purpose:

$$\pi(x) = \frac{e^{\alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k}}{1 + e^{\alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k}} \qquad [2]$$

where $\pi(x)$ represents the expected value of Y for a given value of X, and the parameters $\alpha$ and $\beta_1, \beta_2, \dots, \beta_k$ correspond to the parameters of the linear regression model: $\alpha$ represents the average initial level of the dependent variable, and $\beta_i$ are the regression coefficients showing the average change in the logit per unit change in the corresponding independent variable. The logistic regression function obtained in this way is nonlinear and can be linearized by the logit transformation.
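Equation [2] translates directly into code. The following minimal Python sketch evaluates π(x) for one object; the intercept and coefficient values in the usage line are hypothetical and are not the estimates from this study.

```python
import math

def logistic_probability(x, alpha, betas):
    """pi(x) from equation [2] for one object with predictor values x.

    eta = alpha + beta_1*x_1 + ... + beta_k*x_k is the linear predictor.
    e^eta / (1 + e^eta) is evaluated in the algebraically equivalent
    form 1 / (1 + e^-eta).
    """
    eta = alpha + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical intercept and coefficients for two predictors
p = logistic_probability([0.5, -1.0], alpha=0.2, betas=[1.4, 0.6])
print(round(p, 4))  # eta = 0.2 + 0.7 - 0.6 = 0.3, so this prints 0.5744
```

Applying the logit, ln(p / (1 - p)), to the result recovers the linear predictor 0.3.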
Linearizing the logistic regression function gives the following form:

$$\operatorname{logit}\,\pi(x) = \ln\frac{\pi(x)}{1 - \pi(x)} = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k \qquad [3]$$

The resulting expression is called the logit, and it is linear in the parameters $\beta_i$, $i = 1, \dots, k$. Since π lies in the interval (0, 1) while the logit ranges over (−∞, +∞), the logit function is well suited to representing this relationship (Chatterjee, Ali, 2006). The significance of the coefficients is usually tested with the Wald statistic, with β estimated by maximum likelihood (Basu et al., 2017).
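Maximum likelihood estimation and the Wald test can be sketched in a few lines of NumPy. This is a minimal Newton-Raphson implementation on simulated data (the data, sample size and true coefficients are assumptions); SPSS uses its own iterative routine, so this illustrates the technique rather than reproducing the study's output. The Wald statistic is computed as (β/SE)², the chi-square form with one degree of freedom that SPSS reports.

```python
import numpy as np
from scipy import stats

def fit_logit(X, y, max_iter=50, tol=1e-10):
    """Fit a binary logit model by maximum likelihood (Newton-Raphson).

    X: (n, k) predictors (an intercept column is prepended here);
    y: (n,) array of 0/1 outcomes.
    Returns coefficients (alpha first), standard errors and Wald statistics.
    """
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))           # current pi(x)
        score = Xd.T @ (y - p)                         # gradient of log-likelihood
        info = Xd.T @ (Xd * (p * (1.0 - p))[:, None])  # information matrix
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-Xd @ beta))
    info = Xd.T @ (Xd * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    wald = (beta / se) ** 2
    return beta, se, wald

# Simulated data with known coefficients (illustrative only)
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
eta = -0.5 + 1.2 * X[:, 0] - 0.8 * X[:, 1]
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

beta, se, wald = fit_logit(X, y)
p_values = stats.chi2.sf(wald, df=1)   # Wald test, 1 degree of freedom
for name, b, s, w, pv in zip(["const", "x1", "x2"], beta, se, wald, p_values):
    print(f"{name}: B = {b:.3f}, SE = {s:.3f}, Wald = {w:.2f}, p = {pv:.4f}")
```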
The overall fit of the model to the data can be examined using the Hosmer-Lemeshow test, as well as the classification matrix provided by the SPSS software package used for data processing. Among the most commonly used indicators of model quality are the Cox and Snell and the Nagelkerke pseudo R². Although values of pseudo R² indices typically range from zero to one, values for some indices can exceed 1.0 (Walker, Smith, 2016).
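Both pseudo R² measures are simple functions of the log-likelihoods of the null (intercept-only) model and the fitted model, and the Hosmer-Lemeshow statistic compares observed and expected counts across risk groups. The sketch below assumes those quantities are already available. Forming ten roughly equal groups with `np.array_split` is a simplification; SPSS forms its groups from deciles of risk with its own handling of ties, so the statistic can differ slightly.

```python
import numpy as np
from scipy import stats

def pseudo_r2(ll_null, ll_model, n):
    """Cox & Snell and Nagelkerke pseudo R^2 from log-likelihoods (not -2LL)."""
    cox_snell = 1.0 - np.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - np.exp(2.0 * ll_null / n)  # upper bound of Cox & Snell
    nagelkerke = cox_snell / max_cox_snell            # rescaled to a 0-1 range
    return cox_snell, nagelkerke

def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow chi-square and p-value (df = groups - 2)."""
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    chi2 = 0.0
    for idx in np.array_split(np.argsort(p), groups):   # risk-ordered groups
        n_g, observed, expected = len(idx), y[idx].sum(), p[idx].sum()
        chi2 += (observed - expected) ** 2 / (expected * (1.0 - expected / n_g))
    return chi2, stats.chi2.sf(chi2, groups - 2)

# Hypothetical log-likelihoods for a sample of 50 cases
cs, nk = pseudo_r2(ll_null=-30.0, ll_model=-10.0, n=50)
print(f"Cox & Snell = {cs:.3f}, Nagelkerke = {nk:.3f}")

# Hypothetical predicted probabilities and outcomes drawn from them
rng = np.random.default_rng(3)
p_hat = rng.uniform(0.05, 0.95, size=200)
y_obs = (rng.random(200) < p_hat).astype(float)
hl_chi2, hl_p = hosmer_lemeshow(y_obs, p_hat)
print(f"Hosmer-Lemeshow chi2 = {hl_chi2:.2f}, p = {hl_p:.3f}")
```

A non-significant Hosmer-Lemeshow p-value is read as no evidence of poor fit.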
The choice of variables was conditioned by many factors, the most important of which are the availability of data and the requirements of the applied statistical methods. The research is based on data on the development of the local self-government units of AP Vojvodina from the publication "Municipalities and Regions" (Opštine i regioni) for the period 2013-2018. The SPSS software package was used for statistical data processing.

Results and Discussion
In terms of employment, the regional average is 232 employees per 1,000 inhabitants, which indicates a low employment rate. The city of Novi Sad has the highest employment rate, with 400 employees per 1,000 inhabitants, while the municipality of Opovo has the lowest, with 126 employees per 1,000 inhabitants.
The average number of university graduates per 1,000 inhabitants in Vojvodina is 91; the municipality of Žabalj has the lowest number of highly educated inhabitants and the city of Novi Sad the highest.
A negative natural increase rate is present in almost all municipalities in Vojvodina; only the city of Novi Sad stands out with a positive natural increase rate of 0.8‰. Investments in new capacities are presented as absolute amounts. The average investment in the observed period amounts to RSD 2,737,636.91. High values of the coefficients of variation indicate significant differences between the observed municipalities. The highest variability is observed for the investment variable, which is expected given its range of variation.

First, the assumptions for applying discriminant analysis were tested. The first assumption concerns the multicollinearity of the variables; to test it, the pooled within-groups correlation matrix was used to show the correlations between the variables (Table 2). Table 2 shows that the highest correlation coefficients are those between PD and IN (r = 0.665), followed by PD and ED (r = 0.593) and PD and NI (r = 0.546). Table 3 shows the tests of equality of group means, which indicate statistically significant differences between the group means.
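The pooled within-groups correlation matrix used for the multicollinearity check can be computed by removing each group's mean before correlating the variables. A minimal NumPy sketch on simulated data for three variables follows; the data and variable labels are assumptions, and the code only reports the largest off-diagonal correlation rather than applying any particular cut-off.

```python
import numpy as np

def pooled_within_correlation(groups):
    """Pooled within-groups correlation matrix.

    groups: list of (n_j, p) arrays, one per group. The within-groups
    covariance is the summed SSCP matrix divided by (N - g).
    """
    dev = np.vstack([G - G.mean(axis=0) for G in groups])
    cov = dev.T @ dev / (len(dev) - len(groups))
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

rng = np.random.default_rng(5)
names = ["PD", "ED", "IN"]                       # illustrative variable labels
base = rng.normal(size=(45, 1))
data = base + 0.8 * rng.normal(size=(45, 3))     # correlated simulated columns
R = pooled_within_correlation([data[:26], data[26:]])

# Largest absolute off-diagonal correlation (upper triangle)
i, j = np.triu_indices(len(names), k=1)
k = np.argmax(np.abs(R[i, j]))
print(f"highest |r|: {names[i[k]]}-{names[j[k]]} = {R[i[k], j[k]]:.3f}")
```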
The application of discriminant analysis assumes homogeneity of the group covariance matrices, which is usually checked with Box's M test in multivariate analysis (Table 4). A statistically significant result of this test may, however, be due to deviation of the data from the normal distribution rather than to inequality of the covariance matrices.
The results presented in Table 4 show that complete agreement with the multivariate normal distribution was not reached.

Source: Authors' calculation
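Box's M compares the log-determinants of the group covariance matrices with that of the pooled covariance matrix. The sketch below uses the chi-square approximation; SPSS reports an F approximation, so p-values may differ slightly. The simulated groups are assumptions for illustration.

```python
import numpy as np
from scipy import stats

def log_det(A):
    """Natural log of the determinant of a positive-definite matrix."""
    return np.linalg.slogdet(A)[1]

def box_m(groups):
    """Box's M test of equal covariance matrices (chi-square approximation).

    groups: list of (n_j, p) arrays. Returns M, chi-square, df and p-value.
    """
    g, p = len(groups), groups[0].shape[1]
    n = np.array([len(G) for G in groups], dtype=float)
    N = n.sum()
    covs = [np.cov(G, rowvar=False) for G in groups]   # S_j with divisor n_j - 1
    pooled = sum((nj - 1) * S for nj, S in zip(n, covs)) / (N - g)
    M = (N - g) * log_det(pooled) - sum((nj - 1) * log_det(S) for nj, S in zip(n, covs))
    c = (2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (g - 1)) * (np.sum(1 / (n - 1)) - 1 / (N - g))
    chi2 = M * (1 - c)
    df = p * (p + 1) * (g - 1) / 2
    return M, chi2, df, stats.chi2.sf(chi2, df)

rng = np.random.default_rng(11)
same = [rng.normal(size=(60, 3)), rng.normal(size=(60, 3))]
different = [rng.normal(size=(60, 3)), 5.0 * rng.normal(size=(60, 3))]
print(box_m(same))
print(box_m(different))
```

As noted above, a small p-value here can reflect non-normality as much as genuinely unequal covariance matrices.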
The last assumption of discriminant analysis concerns the normality and linearity of the original data. Not only do all variables except NI and EM deviate from the normal distribution, but the original data also contain many outlying (non-standard) observations. Since the original set of variables did not fully agree with the normal distribution, a logarithmic transformation was applied to the data.
The transformation achieved both better agreement of the transformed data with the normal distribution and a reduction in the number of outlying observations, which allows the discriminant function to be extracted more accurately.
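The effect of a logarithmic transformation on a right-skewed variable can be illustrated as follows. This is a sketch on simulated, strictly positive data resembling the investment variable. A log transform requires positive values, and the text does not state how variables with non-positive values (such as NI, which is negative in almost all municipalities) were transformed, so that step is not shown here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Simulated right-skewed, strictly positive values (illustrative only)
investment = rng.lognormal(mean=14.0, sigma=1.2, size=45)
log_investment = np.log(investment)

for label, x in (("original", investment), ("log", log_investment)):
    print(f"{label}: skewness = {stats.skew(x):.2f}, "
          f"Shapiro-Wilk p = {stats.shapiro(x).pvalue:.4f}")
```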

Results of discriminant analysis
As shown in Table 5, a single canonical discriminant function was extracted; with two groups, at most one function can be extracted. The eigenvalue indicates the relative discriminating power of the discriminant function: the higher the eigenvalue, the more of the variance in the dependent variable is explained by the function. The canonical correlation is 0.777; it is the square root of the ratio of the between-group sum of squares to the total sum of squares.
The significance of the extracted discriminant function was tested with Wilks' lambda (Λ = 0.396; χ² = 36.559, df = 5, p < 0.001), which, together with the value of the canonical correlation coefficient, shows that the discriminant function is significant (Table 5).
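For a single discriminant function, the eigenvalue, Wilks' lambda and the canonical correlation are linked by simple identities, and Bartlett's chi-square approximation converts Λ into a test statistic. The sketch below applies these identities to the canonical correlation reported above (0.777) and checks the p-value for the reported χ² of 36.559 with df = 5. The Bartlett function is included for completeness; its inputs (number of cases, predictors and groups) must be supplied by the user.

```python
import math
from scipy import stats

def from_canonical_correlation(r):
    """Eigenvalue and Wilks' lambda implied by one canonical correlation r."""
    r2 = r ** 2
    eigenvalue = r2 / (1.0 - r2)   # ratio of between- to within-group SS
    wilks = 1.0 - r2               # equals 1 / (1 + eigenvalue)
    return eigenvalue, wilks

def bartlett_chi2(wilks, n, p, g):
    """Bartlett's chi-square for Wilks' lambda: n cases, p predictors, g groups."""
    chi2 = -(n - 1 - (p + g) / 2) * math.log(wilks)
    df = p * (g - 1)
    return chi2, df, stats.chi2.sf(chi2, df)

eigenvalue, wilks = from_canonical_correlation(0.777)
print(f"eigenvalue = {eigenvalue:.3f}, Wilks' lambda = {wilks:.3f}")
print(f"p for chi2 = 36.559, df = 5: {stats.chi2.sf(36.559, 5):.2e}")
```

The Λ implied by the canonical correlation (about 0.396) agrees with the value reported in Table 5.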
In the structure matrix (Table 6), variables are ordered by the absolute values of their correlations with the discriminant function. The largest contributions to the structure of the discriminant function were made by ED (0.792) and EM (0.776), and the smallest by NI (0.363). Although they differ considerably in size, all coefficients are statistically significant. The standardized canonical discriminant function coefficients (Table 7) measure the relative influence of the selected independent variables: a higher absolute coefficient corresponds to greater discriminating ability and indicates that the groups differ on that variable. The independent variable with the most discriminating power is EM, followed by ED, while the other three independent variables were less successful as predictors. The canonical discriminant function coefficients (Table 7) are the coefficients of the final canonical discriminant function.
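Standardized coefficients and structure coefficients can both be derived from the unstandardized weights and the pooled within-groups statistics. In this sketch (simulated data, hypothetical weights), a standardized coefficient is the raw weight multiplied by the pooled within-group standard deviation of the variable, and a structure coefficient is the pooled within-groups correlation between the variable and the discriminant scores, which is the quantity SPSS reports in its structure matrix.

```python
import numpy as np

def standardized_and_structure(groups, w):
    """Standardized and structure coefficients for one discriminant function.

    groups: list of (n_j, p) arrays; w: unstandardized weights, shape (p,).
    """
    dev = np.vstack([G - G.mean(axis=0) for G in groups])  # within-group deviations
    dof = len(dev) - len(groups)
    sd_within = np.sqrt((dev ** 2).sum(axis=0) / dof)      # pooled within-group SDs
    standardized = w * sd_within
    z_dev = dev @ w                                         # within-group deviations of Z
    cov_xz = dev.T @ z_dev / dof
    structure = cov_xz / (sd_within * np.sqrt((z_dev ** 2).sum() / dof))
    return standardized, structure

rng = np.random.default_rng(2)
groups = [rng.normal(size=(26, 3)), rng.normal(0.5, 1.0, size=(19, 3))]
w = np.array([1.2, 0.8, -0.3])           # hypothetical unstandardized weights
standardized, structure = standardized_and_structure(groups, w)
order = np.argsort(-np.abs(structure))   # order by |correlation|, as in a structure matrix
print(standardized, structure[order])
```

The two sets of coefficients answer different questions: standardized weights show each variable's unique contribution given the others, while structure coefficients show how closely each variable alone tracks the function, which is why their rankings (ED first versus EM first) can differ.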
Based on the calculated coefficients, the discriminant function takes the following form:
After the discriminant function was calculated, the cutting score was determined from the group centroids. The cutting score of the discriminant function is the weighted average of the group centroids. The optimal cutting score obtained is 1.003. This value classifies municipalities by their discriminant scores: municipalities with a function value below 1.003 belong to the group of underdeveloped municipalities, while municipalities with scores above this value belong to the group of developed municipalities (Table 8).
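The cutting-score rule can be sketched as follows. The weighting shown, in which each centroid is weighted by the size of the other group, is one commonly used form for groups of unequal size and is an assumption here, since the exact weights behind the reported value of 1.003 are not given. The centroid values in the usage example are hypothetical.

```python
def cutting_score(z_a, n_a, z_b, n_b):
    """Weighted cutting score between two group centroids.

    Each centroid is weighted by the size of the other group, which moves
    the cut toward the smaller group's centroid; with n_a == n_b this is
    the midpoint (z_a + z_b) / 2.
    """
    return (n_a * z_b + n_b * z_a) / (n_a + n_b)

def classify(score, cut):
    """Assign a municipality by its discriminant score, as in the text:
    above the cut -> developed, otherwise -> underdeveloped (a score equal
    to the cut is treated as underdeveloped here, an assumption)."""
    return "developed" if score > cut else "underdeveloped"

# Hypothetical centroids: developed (26 units) and underdeveloped (19 units)
cut = cutting_score(z_a=1.2, n_a=26, z_b=-1.6, n_b=19)
print(round(cut, 3))
# Classification with the cutting score reported in the text
print(classify(1.5, 1.003), classify(0.2, 1.003))
```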

Results of logistic regression
The stepwise method was used to select variables in the regression analysis. Variable selection was conducted in four steps, of which only the results of the fourth (final) step are described.
The performance of the model was tested using the omnibus test of model coefficients, also referred to as a "goodness of fit" test because it shows how well the model predicts the results. The omnibus test (Table 9) showed a statistically significant difference between the model containing the selected independent variables and the model containing no independent variables (Sig. < 0.05). The same conclusion can be drawn from the data presented in the following table. The third and fourth columns of Table 11 show the pseudo R² values (Cox and Snell, and Nagelkerke), 0.594 and 0.801 respectively, which indicate that the model with the given set of variables fits the data well. In this analysis, the main factors that influence whether a municipality will be developed are EM (Sig = 0.0004) and NI (Sig = 0.031), while the other factors did not contribute significantly to the predictive capability of the model.
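The omnibus test is a likelihood-ratio test: the chi-square is the reduction in −2 log-likelihood when the predictors are added, with degrees of freedom equal to the number of added parameters. A minimal sketch with hypothetical −2LL values and k = 2 predictors follows; none of these numbers come from the study.

```python
from scipy import stats

def omnibus_test(neg2ll_null, neg2ll_model, k):
    """Likelihood-ratio chi-square for k added predictors and its p-value."""
    chi2 = neg2ll_null - neg2ll_model   # reduction in -2 log-likelihood
    return chi2, stats.chi2.sf(chi2, df=k)

# Hypothetical -2LL values for the intercept-only and fitted models
chi2, p = omnibus_test(neg2ll_null=60.2, neg2ll_model=25.4, k=2)
print(f"chi2 = {chi2:.1f}, p = {p:.2e}")
```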
Based on the calculated coefficients of the predictor variables, the logistic regression model equation takes the following form:
The area under the ROC curve (AUC) was calculated as an additional measure of the agreement between the predictions and the data. The ROC curve to which these analyses refer is shown in Figure 1.

Source: Authors' calculation

Based on the values in the classification table, the sensitivity and specificity of the model can be determined (Table 15). Table 15 shows that the logistic regression and discriminant analysis models correctly classified approximately the same percentage of cases. However, based on the AUC values, the discriminant analysis model slightly outperforms the logistic regression model.
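AUC, sensitivity and specificity can be computed from predicted scores and a classification cut-off. This sketch computes AUC through its rank (Mann-Whitney) interpretation, the probability that a randomly chosen developed unit scores higher than a randomly chosen underdeveloped one, and treats "developed" (1) as the positive class; the labels, scores and the 0.5 cut-off are hypothetical.

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as one half (the Mann-Whitney formulation)."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity, specificity and overall accuracy with 1 = developed."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    return tp / y_true.sum(), tn / (~y_true).sum(), (tp + tn) / len(y_true)

# Hypothetical outcomes (1 = developed) and predicted probabilities
y = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
auc = roc_auc(y, scores)
sens, spec, acc = sensitivity_specificity(y, np.array(scores) >= 0.5)
print(f"AUC = {auc:.3f}, sensitivity = {sens:.3f}, "
      f"specificity = {spec:.3f}, accuracy = {acc:.3f}")
```

Because AUC uses the full ranking of scores rather than a single cut-off, two models can have nearly the same classification accuracy yet different AUC values.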

Conclusion
This paper compares two methods, discriminant analysis and logistic regression, in assessing the impact of five variables on the likelihood that a local self-government unit is classified as developed or underdeveloped. The variables assumed to have an impact on the development of municipalities are: population density (number of inhabitants per km²), number of employees per 1,000 inhabitants, number of highly educated inhabitants per 1,000 inhabitants, natural increase and investments in new capacities. Of the 45 municipalities in AP Vojvodina, 26 belong to the group of developed municipalities, while 19 have the status of underdeveloped municipalities. After the assumptions for the application of discriminant analysis were tested, the discriminant function was calculated. The discriminant analysis results showed that the most important factors influencing the classification of a municipality are the number of employees per 1,000 inhabitants and the number of highly educated inhabitants per 1,000 inhabitants. The significance of the discriminant function was confirmed by Wilks' lambda and the value of the canonical correlation coefficient. The logistic regression results showed that the number of employees per 1,000 inhabitants and natural increase are the most important predictors. The models were evaluated by measuring overall classification accuracy, sensitivity and specificity, and by examining the area under the ROC curve (AUC). The results show that both models have good classification power. The discriminant analysis model correctly classified 90.9% of all cases, and the logistic regression model 88.6%. When considering the percentages of sensitivity,