The Implementation of the Neural Networks to the problem of Economic Classification of Countries

This paper presents a practical implementation of a multilayer feedforward neural network, trained with the supervised backpropagation algorithm, for the automatic classification of countries into the predefined categories of economic development contained in the United Nations report World Economic Situation and Prospects 2012. The goal of the paper is to automate the classification of countries, to define a set of key measurable economic development indicators, and to emphasize the significance of neural networks for solving classification problems. The research covers the classification of 168 countries into 4 groups of economic development, based on 7 selected measurable indicators. Data from the official reports of international economic institutions were used to train the intelligent decision-making system based on a neural network, and a confusion matrix was used as the measure of training quality, showing the accuracy of the system as the percentage of agreement with the empirically obtained classification. The accuracy of the automatic classification shows that neural networks are a powerful apparatus for solving classification problems, and it also justifies the choice of classification parameters and confirms their importance. The importance of the selected indicators is reflected in the fact that knowledge of their values is a sufficient condition for automatic classification with a reliability level of 80%.

1 University of Kragujevac, Faculty of Economics, sobradovic@kg.ac.rs
2 University of Kragujevac, Faculty of Hotel Management and Tourism in Vrnjacka Banja
3 University of Kragujevac, Faculty of Hotel Management and Tourism in Vrnjacka Banja
4 This paper is a part of research project No. 179015, financed by the Ministry of Science and Technological Development of the Republic of Serbia.

Obradović S. et al.: The Implementation of the Neural Networks to the problem of..., Industrija, Vol. 42, No. 4, 2014


Introduction
Various international economic and financial institutions operate in the global economy with the goal of accelerating national and global economic development: the United Nations Department of Economic and Social Affairs (UN/DESA), the International Monetary Fund (IMF), the World Bank, the United Nations Conference on Trade and Development (UNCTAD), the Organisation for Economic Co-operation and Development (OECD), etc. These institutions monitor trends in economic indicators and are a source of large volumes of data, whose semantic interconnection motivates research into discovering hidden patterns and new useful information arising from correlations in the available data. The technique of searching large databases and finding connections and relationships between seemingly unrelated pieces of information is called Data Mining.
Data Mining is an interdisciplinary field that draws on databases, expert systems, computer science, statistics, mathematics, logic and a number of other fields. The development of information technologies and their penetration into almost all aspects of the functioning of economic and social systems have enabled the application of Data Mining in macroeconomics, marketing, business, industry, etc.
One of the most important Data Mining tasks is classification. Classification involves analyzing data, discovering hidden relations and determining the elements on the basis of which data are assigned to one of several classes. This paper uses Data Mining to solve a classification problem: the automatic classification of countries into predefined categories of economic development, starting from carefully selected classification parameters, that is, development indicators. As the literature review shows, previous studies have not addressed this specific problem. The contribution to the existing classification of countries lies in the automation of this process, but also in the definition of a set of key measurable economic development indicators.
To solve this Data Mining classification task, artificial neural networks are used: analytic techniques modeled on the presumed learning process of the human brain. Neural networks can find connections between complex phenomena that elude human intellect, can establish functional mappings, and are fault tolerant. A neural network is a system composed of many simple processors, so it can keep functioning even if part of the network is damaged; it also has the ability to generalize, that is, to produce output data even when the set of input data is incomplete. These characteristics are the main reason for applying neural networks to this and similar classification problems.
The remainder of this paper is organized as follows. The second section presents a literature review on the application of neural networks to various problems, especially in economics. The third section describes the research problem, while the fourth section briefly explains the data and the methodology used. The fifth section reports our findings, and finally, the last section presents conclusions, including a discussion of the results.
Similarly, the results of Naeini et al. (2010) indicate the superiority of the feedforward MLP in predicting the magnitude of stock value changes compared to the Elman recurrent network and linear regression methods. However, the authors acknowledge that the MLP is less successful in predicting the direction of change than these alternative methods. The reasons for this experimentally confirmed conclusion lie in the different architectures of these types of neural networks and in the training algorithms selected. Nittis et al. (1998) showed that an ANN model based on the multilayer perceptron is a good instrument for classifying consumer loans and identifying potentially risky loans, that is, those that may not be repaid on schedule. The authors conclude that an MLP can serve as a first-level filter for accepting or rejecting loan applications, while the intervention of bank employees as a second-level filter would be needed only in borderline cases.
In line with the studies above, the research presented in this paper supports the conclusion that ANNs are a powerful apparatus for solving various economic problems. As already pointed out, the specific task of classifying countries according to their level of economic development by means of neural networks has not been the subject of previous known research. Establishing a functional dependence between seven selected economic development indicators and the predefined categories of economic development would contribute to the categorization of countries both by automating the process and by defining a set of key development indicators.

Problem description
In the United Nations report World Economic Situation and Prospects 2012 (WESP 2012), countries are classified into four groups according to their achieved level of economic development: developed economies, economies in transition, developing economies and least developed economies. According to WESP 2012, the group of developed economies consists of 35 countries. The economies in transition, which are moving from a centrally planned economic system to a market-oriented capitalist one and from authoritarian regimes to democratic societies, number 18 countries according to WESP 2012. The largest group is the developing economies, characterized by low or intermediate indicators of socio-economic development. Finally, the group of least developed economies, the countries with the lowest indicators of socio-economic development, consisted of 48 member countries according to 2011 data.
It is important to emphasize that there is no single, universally agreed-upon criterion for this division of countries, but several of them, and there is no unified view of which countries belong to which group, because even members of the same group often differ from one another in certain characteristics. Important classification parameters, that is, indicators of the achieved level of development of an economy, are:

• Human Development Index (HDI)
• gross domestic product (GDP)
• GDP per capita
• unemployment rate
• foreign direct investment (FDI)
• population below income poverty line PPP $1.25 a day
• urban population
The goal is to create a decision-making system based on a neural network that automates the classification of countries into one of four predefined groups of achieved economic development, based on the values of seven selected economic development indicators, which are the input parameters of the system. Pursuing this goal amounts to testing the following hypotheses:
H0: Through the construction and application of an intelligent system structured on the basis of an MLP, it is possible to classify countries automatically and successfully into categories of achieved economic development level.
H1: A sufficient condition for successfully classifying a country into one of the four categories of economic development is knowledge of the values of seven economic development indicators: HDI, GDP, GDP per capita, unemployment rate, FDI, population below income poverty line PPP $1.25 a day and urban population.
H2: Missing data in the knowledge base on which the neural network is built are not an obstacle that would call the reliability of the classification apparatus into question.

Materials and Methods
Our research process first involves selecting the available data with the support of domain knowledge, and then the technical realization, from the way the data are represented to the implementation of the neural network as an efficient Data Mining apparatus. Since neural networks can extract valuable information from large amounts of historical data, find patterns in the data and infer rules from them, they can outperform conventional statistical approaches and are an excellent Data Mining tool (Yashpal & Alok, 2009).
One of the neural networks most commonly used for classification tasks is the multilayer perceptron (MLP), which uses supervised training (Hart, 1992). An MLP is a feedforward artificial neural network (FFANN) arranged in layers. It contains three types of layers: the input layer, one or more hidden layers, and the output layer (Ecer, 2013). Each unit accepts input only from the immediately preceding layer (Fig. 1).
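The layered structure just described can be illustrated with a forward pass through an MLP with one hidden layer. This is a minimal, dependency-free Python sketch, not the authors' Matlab implementation: the hidden-layer size of 10, the random initial weights, tanh in the output layer and the omission of bias terms are assumptions made for illustration, and the function names are hypothetical. Only the seven inputs and four outputs follow the paper.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]  # W[k][j]: weight from unit k of one layer to unit j of the next


def layer_forward(inputs: Vector, W: Matrix, g: Callable[[float], float]) -> Vector:
    """Outputs of one layer; every unit sees only the directly previous layer."""
    n_units = len(W[0])
    return [g(sum(inputs[k] * W[k][j] for k in range(len(inputs))))
            for j in range(n_units)]


def mlp_forward(x: Vector, W_hidden: Matrix, W_out: Matrix) -> Vector:
    """Input layer -> one hidden layer -> output layer; no connection skips a layer."""
    hidden = layer_forward(x, W_hidden, math.tanh)
    return layer_forward(hidden, W_out, math.tanh)


def random_weights(n_from: int, n_to: int, rng: random.Random) -> Matrix:
    """Small random initial weights (the range is an assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_to)] for _ in range(n_from)]


N_INPUTS, N_HIDDEN, N_OUTPUTS = 7, 10, 4  # 7 indicators, 4 groups; 10 hidden units is assumed
rng = random.Random(0)
W_hidden = random_weights(N_INPUTS, N_HIDDEN, rng)
W_out = random_weights(N_HIDDEN, N_OUTPUTS, rng)
y = mlp_forward([0.5] * N_INPUTS, W_hidden, W_out)  # one score per development group
```

Each output coordinate corresponds to one development group; the network is only useful after its weights have been trained.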

Figure 1. MLP with one hidden layer
Source: Authors' calculation

Each processing node of an ANN represents the body of an artificial neuron. The values brought to the inputs of the artificial neuron are multiplied by weight coefficients to form the input data of the processing unit. The sum of the input values multiplied by the corresponding weight coefficients is passed through the activation function, and the resulting value is the output of the artificial neuron (Fig. 2).

Figure 2. Mathematical model of artificial neuron
Source: Authors' calculation

Thus, the output of the neuron is:

$$a_i = g(in_i) = g\Big(\sum_j W_{j,i}\, a_j\Big)$$

where $a_j$ is the output of neuron $j$ of the previous layer, $a_i$ is the output produced by the activation function $g$ of the observed neuron, and $W_{j,i}$ is the weight coefficient of the connection linking the observed neuron with neuron $j$ of the previous layer (Russell & Norvig, 2003).
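The neuron equation translates directly into code. In this minimal Python sketch, `neuron_output` is a hypothetical helper: it takes the previous layer's outputs a_j and the incoming weights W_{j,i}, forms the weighted sum in_i, and applies the activation function g, defaulting to tanh as an assumption.

```python
import math
from typing import Callable, Sequence


def neuron_output(a_prev: Sequence[float], w_in: Sequence[float],
                  g: Callable[[float], float] = math.tanh) -> float:
    """a_i = g(in_i) with in_i = sum_j W_{j,i} * a_j.

    a_prev[j] is a_j (output of neuron j of the previous layer) and
    w_in[j] is W_{j,i} (weight of the connection from neuron j to neuron i).
    """
    if len(a_prev) != len(w_in):
        raise ValueError("need exactly one weight per incoming connection")
    in_i = sum(w_ji * a_j for w_ji, a_j in zip(w_in, a_prev))
    return g(in_i)


a_i = neuron_output([1.0, 0.5, -2.0], [0.2, 0.4, 0.1])  # in_i = 0.2, so a_i = tanh(0.2)
```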
Activation functions of the artificial neurons in the hidden layers are necessary for the network to be able to learn nonlinear functions, which is why the activation function must be nonlinear. Sigmoid functions are usually used with the backpropagation algorithm, and in this paper the hyperbolic tangent sigmoid function (tansig) is used (Fig. 3):

$$\mathrm{tansig}(n) = \frac{2}{1 + e^{-2n}} - 1 = \tanh(n)$$

… (1985). The backpropagation algorithm learns patterns by comparing the network output with the desired output and computing the error for every node in the network. The network then adjusts the connection weights according to the error values assigned to each node. The computation starts at the output layer and proceeds through the hidden layers toward the input layer. After the parameters are modified, new inputs are presented to the network. Training stops when the network produces outputs with satisfactory accuracy (Bishop, 2000).
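The tansig formula and its derivative, on which the backpropagation rules rely, can be sketched in Python as follows. The function names echo the Matlab name but are our own, and the early return for very negative inputs is an implementation detail added only to avoid floating-point overflow in `math.exp`.

```python
import math


def tansig(n: float) -> float:
    """Hyperbolic tangent sigmoid in the form 2 / (1 + e^(-2n)) - 1, equal to tanh(n)."""
    if n < -20.0:
        # e^(-2n) grows huge here (and overflows for n below about -355);
        # the exact result already rounds to -1.0 in double precision.
        return -1.0
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


def tansig_derivative(n: float) -> float:
    """g'(n) = 1 - g(n)^2: the factor backpropagation applies to each node's error."""
    return 1.0 - tansig(n) ** 2
```

In practice `math.tanh` gives the same values directly; the explicit form is shown only to match the formula.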
For the classification problem that is the subject of this research, the output layer has four nodes, corresponding to the number of classification groups. The classification result is given as a vector in which each coordinate corresponds to one output node: the coordinate of the activated node, which corresponds to the assigned group, has the value 1, while the others have the value 0. When the output layer has more than one node, the output is a vector, and therefore the error is also a vector, representing the difference between the expected vector $\mathbf{y}$ and the obtained vector $\mathbf{h}_{\mathbf{W}}$:

$$\mathbf{Err} = \mathbf{y} - \mathbf{h}_{\mathbf{W}}$$

If $Err_i$ is the $i$-th coordinate of the vector $\mathbf{Err}$, the modified error is defined as:

$$\Delta_i = Err_i \, g'(in_i)$$

where $in_i$ is the weighted input sum of output node $i$. The rule for correcting the weight coefficients $W_{j,i}$ between the hidden and output layer becomes:

$$W_{j,i} \leftarrow W_{j,i} + \alpha \, a_j \, \Delta_i$$

where $\alpha$ is the learning rate. The values $\Delta_i$ are distributed according to the strength of the connection between each hidden node and output node, and are propagated backward to obtain the values $\Delta_j$ for the hidden layer:

$$\Delta_j = g'(in_j) \sum_i W_{j,i} \, \Delta_i$$

The update rule for the weights between the input and hidden layer is then similar to the rule for the output layer, with $a_k$ denoting the output of input node $k$:

$$W_{k,j} \leftarrow W_{k,j} + \alpha \, a_k \, \Delta_j$$

A common measure of error is the sum of squared errors over the output layer nodes:

$$E = \frac{1}{2} \sum_i (y_i - a_i)^2$$
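The update rules above can be collected into a single on-line training step. This minimal Python sketch implements the equations for one hidden layer; tanh (equivalent to tansig) in both layers, the learning rate α = 0.1, five hidden units, the absence of bias terms and the toy input vector are assumptions, and `backprop_step` is a hypothetical name rather than a toolbox function.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # W_kj[k][j]: input k -> hidden j;  W_ji[j][i]: hidden j -> output i


def g(n: float) -> float:
    return math.tanh(n)


def g_prime(n: float) -> float:
    return 1.0 - math.tanh(n) ** 2


def forward(x: Vector, W_kj: Matrix, W_ji: Matrix) -> Tuple[Vector, Vector, Vector, Vector]:
    in_j = [sum(x[k] * W_kj[k][j] for k in range(len(x))) for j in range(len(W_kj[0]))]
    a_j = [g(v) for v in in_j]
    in_i = [sum(a_j[j] * W_ji[j][i] for j in range(len(a_j))) for i in range(len(W_ji[0]))]
    a_i = [g(v) for v in in_i]
    return in_j, a_j, in_i, a_i


def backprop_step(x: Vector, y: Vector, W_kj: Matrix, W_ji: Matrix, alpha: float = 0.1) -> float:
    """Apply one on-line weight update in place; return E measured before the update."""
    in_j, a_j, in_i, a_i = forward(x, W_kj, W_ji)
    err = [y_i - h_i for y_i, h_i in zip(y, a_i)]                   # Err = y - h_W
    delta_i = [e * g_prime(n) for e, n in zip(err, in_i)]          # Δ_i = Err_i g'(in_i)
    # Δ_j = g'(in_j) Σ_i W_{j,i} Δ_i, using the output weights before they change
    delta_j = [g_prime(in_j[j]) * sum(W_ji[j][i] * delta_i[i] for i in range(len(delta_i)))
               for j in range(len(a_j))]
    for j in range(len(a_j)):                                      # W_{j,i} += α a_j Δ_i
        for i in range(len(delta_i)):
            W_ji[j][i] += alpha * a_j[j] * delta_i[i]
    for k in range(len(x)):                                        # W_{k,j} += α a_k Δ_j
        for j in range(len(delta_j)):
            W_kj[k][j] += alpha * x[k] * delta_j[j]
    return 0.5 * sum(e * e for e in err)                           # E = ½ Σ_i (y_i - a_i)²


rng = random.Random(1)
W_kj = [[rng.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(7)]
W_ji = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(5)]
x = [0.2, -0.1, 0.4, 0.0, 0.3, -0.2, 0.1]  # toy, already-scaled indicator values
y = [0.0, 0.0, 1.0, 0.0]                   # one-hot target: the third group
losses = [backprop_step(x, y, W_kj, W_ji) for _ in range(200)]
```

Repeating the step on a single country drives its error down; real training cycles over all training countries for the chosen number of epochs.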
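The dependence of the error measure $E$ on the weight coefficients is expressed through partial derivatives. In the case of an MLP with one hidden layer, the derivative of the error measure with respect to the weights between the hidden and output layer is:

$$\frac{\partial E}{\partial W_{j,i}} = -(y_i - a_i)\,\frac{\partial a_i}{\partial W_{j,i}} = -(y_i - a_i)\, g'(in_i)\, a_j = -a_j\,\Delta_i$$

Analogously, the derivative with respect to the weights between the input and hidden layer is (Russell & Norvig, 2003):

$$\frac{\partial E}{\partial W_{k,j}} = -\sum_i (y_i - a_i)\, g'(in_i)\, W_{j,i}\, g'(in_j)\, a_k = -\sum_i \Delta_i\, W_{j,i}\, g'(in_j)\, a_k = -a_k\,\Delta_j \qquad (12)$$

The characteristics of FFANNs, and of the MLP trained with backpropagation as their main representative, led to the choice of a software implementation of such a neural network as the method for solving the given classification problem.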
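The two derivative formulas can be checked numerically: if they are correct, each analytic gradient must match a central finite difference of E. The sketch below is a standard gradient check under the same assumptions as a one-hidden-layer tanh network without biases and with random toy weights; all function names are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def g_prime(n: float) -> float:
    return 1.0 - math.tanh(n) ** 2


def forward(x, W_kj, W_ji):
    in_j = [sum(x[k] * W_kj[k][j] for k in range(len(x))) for j in range(len(W_kj[0]))]
    a_j = [math.tanh(v) for v in in_j]
    in_i = [sum(a_j[j] * W_ji[j][i] for j in range(len(a_j))) for i in range(len(W_ji[0]))]
    return in_j, a_j, in_i, [math.tanh(v) for v in in_i]


def error(x, y, W_kj, W_ji) -> float:
    """E = 1/2 * sum_i (y_i - a_i)^2."""
    a_i = forward(x, W_kj, W_ji)[3]
    return 0.5 * sum((yi - ai) ** 2 for yi, ai in zip(y, a_i))


def analytic_gradients(x, y, W_kj, W_ji):
    """dE/dW_{j,i} = -a_j Δ_i and dE/dW_{k,j} = -a_k Δ_j."""
    in_j, a_j, in_i, a_i = forward(x, W_kj, W_ji)
    d_i = [(yi - ai) * g_prime(n) for yi, ai, n in zip(y, a_i, in_i)]
    d_j = [g_prime(in_j[j]) * sum(W_ji[j][i] * d_i[i] for i in range(len(d_i)))
           for j in range(len(a_j))]
    grad_ji = [[-a_j[j] * d_i[i] for i in range(len(d_i))] for j in range(len(a_j))]
    grad_kj = [[-x[k] * d_j[j] for j in range(len(d_j))] for k in range(len(x))]
    return grad_kj, grad_ji


def numeric_derivative(x, y, W_kj, W_ji, W: Matrix, r: int, c: int, h: float = 1e-6) -> float:
    """Central difference of E with respect to the single weight W[r][c]."""
    original = W[r][c]
    W[r][c] = original + h
    e_plus = error(x, y, W_kj, W_ji)
    W[r][c] = original - h
    e_minus = error(x, y, W_kj, W_ji)
    W[r][c] = original  # restore the weight
    return (e_plus - e_minus) / (2 * h)


rng = random.Random(42)
x = [rng.uniform(-1, 1) for _ in range(7)]
y = [0.0, 1.0, 0.0, 0.0]
W_kj = [[rng.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(7)]
W_ji = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(5)]
grad_kj, grad_ji = analytic_gradients(x, y, W_kj, W_ji)
max_gap = max(
    [abs(grad_kj[k][j] - numeric_derivative(x, y, W_kj, W_ji, W_kj, k, j))
     for k in range(7) for j in range(5)]
    + [abs(grad_ji[j][i] - numeric_derivative(x, y, W_kj, W_ji, W_ji, j, i))
       for j in range(5) for i in range(4)]
)
```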

Results and discussion
The result of this paper is an algorithm for classifying countries on the basis of input parameters, founded on a feedforward neural network with one hidden layer. Efficient use of the application that implements the algorithm requires data on the values of the classification parameters, as well as data on the achieved level of economic development, for a large number of countries. These data are stored in a precisely defined way. Data on the classification parameters, that is, the indicators of economic development, are stored in a matrix in which each column corresponds to one country and each row to one parameter, giving a matrix of size (number of parameters) x (number of countries), in this case 7 x 168. The elements of the matrix are the parameter values for each country, and some positions may remain empty because of missing data. Similarly, data on the achieved level of economic development are stored in a matrix in which each column corresponds to a country and each row to a classification group, giving a matrix of size (number of classification groups) x (number of countries), in this case 4 x 168. The elements of this matrix are binary digits, 0 or 1, with 1 in the row of the economic development group to which the country of the observed column belongs. The two matrices are consistent in the sense that their columns follow the same unique, pre-arranged order of countries, and both are stored as .csv (comma-separated values) files.
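The storage layout described above can be sketched in Python as follows. This is an illustration under assumptions, not the authors' Matlab code: the function names are hypothetical, empty CSV cells are read as NaN to mark missing indicator values, and the toy matrices are made up. The real files would hold a 7 x 168 parameter matrix and a 4 x 168 target matrix.

```python
import csv
import io
import math
from typing import List

Matrix = List[List[float]]


def read_matrix(csv_text: str) -> Matrix:
    """Parse a CSV matrix (rows = parameters or groups, columns = countries).

    Empty cells, i.e. missing indicator values, become NaN.
    """
    return [[float(cell) if cell.strip() else math.nan for cell in row]
            for row in csv.reader(io.StringIO(csv_text)) if row]


def check_consistency(inputs: Matrix, targets: Matrix) -> None:
    """Both matrices need one column per country; each target column must be one-hot."""
    n_countries = len(inputs[0])
    if any(len(row) != n_countries for row in inputs + targets):
        raise ValueError("every row must have one entry per country, in the same order")
    for c in range(n_countries):
        column = [row[c] for row in targets]
        if column.count(1.0) != 1 or column.count(0.0) != len(column) - 1:
            raise ValueError(f"target column {c} must contain exactly one 1")


def missing_count(inputs: Matrix) -> int:
    """Number of empty (NaN) indicator values."""
    return sum(math.isnan(v) for row in inputs for v in row)


def group_of(targets: Matrix, c: int) -> int:
    """Row index of the 1 in column c, i.e. the development group of country c."""
    return [row[c] for row in targets].index(1.0)


# Toy 2 x 3 parameter matrix and 4 x 3 target matrix (made-up values, not real data).
X = read_matrix("0.9,0.7,\n1.2,,0.3\n")
T = read_matrix("1,0,0\n0,1,0\n0,0,1\n0,0,0\n")
check_consistency(X, T)
```

How the NaN entries are then treated during training is not specified here; the sketch only parses and validates the two files.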
The software is written in the Matlab programming language using the Matlab Neural Network Toolbox. The graphical user interface (GUI) has three parts. In the first part, the user provides the data on which the neural network is trained: selecting the matrix of classification parameter values as the input data set and the matrix of data on the economic development of countries as the target set creates the conditions for Data Mining. The user can design the hidden layer by specifying its number of neurons and can set the number of training epochs. The parameter values are entered in the second panel, and, when the neural network is run, the prediction results are shown in the third panel both numerically (the field corresponding to the category of economic development predicted by the neural network has the value 1) and graphically (Fig. 4).
Research into introducing additional indicators showed that they change the reliability of the prediction only slightly. Introducing stock market capitalization (% of GDP) as an eighth parameter (Table 1) improves the accuracy of the classification, but insignificantly, while introducing indicators such as central government debt (% of GDP), gross savings (% of GDP) or merchandise trade (% of GDP) has a slightly negative impact on the reliability of the prediction, as does introducing various combinations of these four additionally examined indicators. In contrast to the negligible impact of these indicators, omitting any one of the seven indicators listed in the problem description has a significant negative impact on classification reliability.
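The indicator experiments described above amount to a feature-ablation loop over the rows of the parameter matrix. The sketch below assumes a hypothetical `train_and_score` callback that trains a network on the given indicator rows and returns its classification reliability; the paper does not describe how these runs were scripted, and a trivial placeholder scorer is used only to show the mechanics.

```python
from typing import Callable, Dict, List, Sequence

Matrix = List[List[float]]                         # rows = indicators, columns = countries
Scorer = Callable[[Matrix, Sequence[int]], float]  # hypothetical: trains a network, returns reliability


def drop_indicator(X: Matrix, index: int) -> Matrix:
    """Copy of X without the indicator row at `index`."""
    return [row[:] for r, row in enumerate(X) if r != index]


def add_indicator(X: Matrix, values: Sequence[float]) -> Matrix:
    """Copy of X with an extra indicator row holding one value per country."""
    if len(values) != len(X[0]):
        raise ValueError("the new indicator needs one value per country")
    return [row[:] for row in X] + [list(values)]


def leave_one_out(X: Matrix, groups: Sequence[int], names: Sequence[str],
                  train_and_score: Scorer) -> Dict[str, float]:
    """Reliability with all indicators, then with each indicator omitted in turn."""
    results = {"all indicators": train_and_score(X, groups)}
    for r, name in enumerate(names):
        results[f"without {name}"] = train_and_score(drop_indicator(X, r), groups)
    return results


def rows_seen(X: Matrix, groups: Sequence[int]) -> float:
    """Placeholder scorer for the demo only: reports how many indicator rows it received."""
    return float(len(X))


X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]          # toy 2-indicator x 3-country matrix
report = leave_one_out(X, [0, 1, 2], ["HDI", "FDI"], rows_seen)
X_extended = add_indicator(X, [0.1, 0.2, 0.3])  # analogous to adding an eighth indicator
```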

Conclusions
Data from the reports of international economic institutions are very useful knowledge bases. In addition to the directly available data, a large amount of knowledge is hidden in the relations between the data. The results of this research emphasize the seven selected indicators as equally significant factors of a country's economic development. Examining this view, and possibly ranking the indicators by their contribution to the level of economic development, could be topics of future research.

Figure 5. Confusion matrix in the period from 2010 to 2012.
Although the research covered a total of 168 countries, Table 1, for clarity, gives parameter values only for a subset of each of the four main economic development groups.
Successful Data Mining analysis of the available data involves automating the generation of a mapping that, starting from the values of the observed indicators, unambiguously classifies a country into one of four groups of economic development (developed economies, economies in transition, developing economies, least developed economies). The contribution to the process of categorizing countries lies in its automation, but also in the definition of a set of key development indicators.
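A confusion matrix cross-tabulates actual and predicted development groups. The following minimal Python sketch, with hypothetical function names and made-up toy labels, shows how such a matrix and the overall share of correctly classified countries can be computed; that share is one natural reading of the reliability figure reported in the paper. Decoding each network output by taking the node with the largest activation is an assumption.

```python
from typing import List, Sequence

GROUPS = ["developed", "in transition", "developing", "least developed"]


def predicted_group(outputs: Sequence[float]) -> int:
    """Winner takes all: the output node with the largest activation is set to 1."""
    return max(range(len(outputs)), key=lambda i: outputs[i])


def confusion_matrix(actual: Sequence[int], predicted: Sequence[int],
                     n_groups: int = len(GROUPS)) -> List[List[int]]:
    """cm[a][p] = number of countries of actual group a classified into group p."""
    cm = [[0] * n_groups for _ in range(n_groups)]
    for a, p in zip(actual, predicted):
        cm[a][p] += 1
    return cm


def overall_accuracy(cm: List[List[int]]) -> float:
    """Share of countries on the diagonal, i.e. classified into their actual group."""
    total = sum(sum(row) for row in cm)
    return sum(cm[g][g] for g in range(len(cm))) / total


actual = [0, 0, 1, 2, 2, 3]  # toy labels, not the paper's data
network_outputs = [[0.9, 0.1, 0.0, -0.2], [0.2, 0.6, 0.1, 0.0], [0.1, 0.8, 0.3, 0.0],
                   [0.0, 0.1, 0.7, 0.2], [-0.1, 0.0, 0.5, 0.4], [0.0, 0.0, 0.2, 0.9]]
predicted = [predicted_group(o) for o in network_outputs]
cm = confusion_matrix(actual, predicted)
```

Here rows are actual groups and columns are predicted groups; some tools transpose this layout, so the orientation should be checked when reading a plotted matrix.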
In this research, the ANN was used to analyze the correlation between the values of the economic development indicators of 168 countries and their achieved levels of economic development. The achieved level of economic development is known for each of the 168 countries, while the values of some indicators are missing for certain countries. As information sources, in addition to WESP 2012, the following reports were used: World Development Indicators 2013 (WDI 2013), Human Development Report 2011 (HDR 2011) and Human Development Report 2013 (HDR 2013). WESP 2012 provides the list of countries by achieved level of economic development, while the other reports provide the values of the observed economic development indicators (HDI, GDP, GDP per capita, unemployment rate, FDI, population below income poverty line, urban population).

Table 1 - Sub-set of the investigated database
Source: Authors' calculation

Supervised training consists of calling the backpropagation algorithm implementation, which creates a feedforward neural network with one hidden layer capable of predicting the classification group from new parameter values. The quality of network training can be monitored directly in the Matlab Neural Network Toolbox window that opens during this process, and also in the GUI, where the final training performance is shown in terms of training, validation and test performance. The result is an ANN that accepts seven input parameters (HDI, GDP, GDP per capita, unemployment rate, FDI, population below income poverty line, urban population) and, based on their values, determines which of the four categories of economic development the country belongs to (developed economies, economies in transition, developing economies, least developed economies).
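Reporting separate training, validation and test performance presupposes that the countries are divided into three subsets. The paper does not state the ratios, so the 70/15/15 random split below is an assumption (a common default in the Matlab toolbox, not confirmed for this study), and `split_countries` is a hypothetical helper. The validation subset is typically used to stop training once its error stops improving, while the test subset is kept aside to estimate performance on unseen countries.

```python
import random
from typing import List, Tuple


def split_countries(n_countries: int, train: float = 0.70, val: float = 0.15,
                    seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Randomly partition country (column) indices into training, validation and test sets.

    The test set receives whatever remains after the training and validation shares.
    """
    if train <= 0 or val < 0 or train + val > 1:
        raise ValueError("ratios must satisfy 0 < train and train + val <= 1")
    order = list(range(n_countries))
    random.Random(seed).shuffle(order)
    n_train = round(train * n_countries)
    n_val = round(val * n_countries)
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]


train_idx, val_idx, test_idx = split_countries(168)  # 118 / 25 / 25 countries
```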
Artificial neural networks, as Data Mining techniques, make it possible to recognize hidden relations and draw new conclusions. These conclusions can be confirmed or refuted by comparison with existing, independently created formal reports. Data from the formal reports were used to train the ANN, and a confusion matrix was used as the measure of training quality. Reliability at the 80% level confirms hypothesis H0: an intelligent system constructed on the basis of an MLP can successfully and automatically classify countries into categories of achieved economic development. The high reliability of the resulting apparatus indicates that HDI, GDP, GDP per capita, unemployment rate, FDI, population below income poverty line PPP $1.25 a day and urban population can be counted among the most important economic development indicators. The multilayer ANN proved to be a successful instrument for classifying countries into the predefined categories of economic development given in the United Nations report WESP 2012. The lack of values of certain indicators for certain countries was compensated for by the generalization ability of the neural network, which confirms hypothesis H2.
A software apparatus based on a sufficiently well-trained ANN structured in this way enables the automatic classification of countries based on measurable indicators of economic development, as opposed to formal reports, in which classification is done without unique criteria, using experiential knowledge whose domain has no clearly defined boundaries. Of course, even this automated process relies on experiential knowledge, which was used to select the indicators of economic development. The very fact that it is possible to automate the classification based on the selected indicators shows that this choice is justified, and therefore confirms hypothesis H1, which states that knowledge of the values of these indicators is a sufficient condition for determining the category of a country's achieved level of economic development. Confirmation of the hypotheses achieves the goal of the research. The timeliness of the data used to implement the product of this research enables its further development. The natural next step would be experimental use of the resulting program. Comparing the categorization produced by the program from indicator values for future periods with future formal reports that use the same categorization would amount to a kind of dynamic verification of the hypotheses tested in this research. Analysis of the differences, which are inevitable, could improve the reliability of the program, but it would also open a debate about the formal categorization.