ESTIMATING THE MOVEMENT OF BELEX15 INDEX VALUES USING THE ARIMA MODEL

In this paper the author tries to find a suitable model for forecasting the movement of the Belex15 index on the Belgrade Stock Exchange, using one of the most widely used methodologies of recent decades, the ARIMA method. By applying the method in four steps, the author concludes that ARIMA (0,1,11) is the optimal forecasting model for the given stock index data series. After the statistical appropriateness of the model was evaluated and the model was applied to short-term forecasts, the results were compared with those of the author's previous work, in which the forecast was made by applying linear and nonlinear methods (linear regression and neural networks) to the same data. The selected ARIMA model showed good statistical suitability and the possibility of a satisfactory short-term forecast of the Belgrade Stock Exchange index, although it showed a higher RMSE (root mean square error) than the nonlinear model applied to the same data.


Introduction
The stock market is one of the most dynamic systems today, given the current economic conditions (Banerjee, 2014), because various financial and economic factors, such as quarterly earnings reports, market news, political events, international influences (Nazário et al., 2017), economic outlook, inflation and deflation, have a direct impact on stock prices, which move up and down with large fluctuations (Ratnayaka et al., 2015). Because successful prediction of the future market value of stocks can bring great profit to investors, predicting stock market returns has been popular for a long time (Banerjee, 2014).
Stock prices are one of the most important factors in a country's economic development and a measure of the strength or weakness of its economy (Yousif & Elfaki, 2017), and therefore stock market valuation models play an important role in investor behavior (Jiang & Zhang, 2019). Financial data forecasting helps an investor to invest safely in the market (Babu & Reddy, 2014), so many academics and professionals are concerned with this topic. However, while investing in the stock market can bring great wealth, it is very difficult to find regularities in stock market trends, and the stock market itself still carries considerable risk (Jiang & Zhang, 2019).
In this paper the author tries to find a suitable model for forecasting the movement of the Belex15 index on the Belgrade Stock Exchange, using one of the most commonly used methodologies of recent decades, the ARIMA method. After the introductory part describing the market situation, Chapter 1 shows the importance of technical analysis in stock price forecasting, after which Chapter 2 gives an overview of the literature on the use of this model. Subsequently, Chapter 3 explains the ARIMA model through its use for estimating time series movements. Chapter 4 describes the methodology used in this paper to estimate the movements of the Belex15 index value from January 2014 to January 2018, in search of an ARIMA model that would help accurately predict the movement of the index. The results were compared with the author's previous estimate of the movement of Belex15 index values using linear regression and neural networks (Petrovic, 2019).

Technical analysis
Two types of analysis are used to evaluate stock price movements: technical and fundamental analysis (Scott, Carr & Cremonie, 2016). Technical analysis uses only historical market data in its forecast (Nazário et al., 2017), such as past stock prices, trading volume and volatility, and it analyzes patterns and trends (Merh, Saxena & Pardasani, 2010), while fundamental analysis takes into account financial statements and other economic factors. Technical analysis determines the most probable future share price by analyzing the statistics generated by market activity, presented in charts of historical movements in stock prices (Scott, Carr & Cremonie, 2016). Although this analysis is a passionate obsession of professionals in the field of stock market pricing (Nazário et al., 2017), some academic opinion opposes its use, considering it an irrational technique according to the efficient market hypothesis (Merh, Saxena & Pardasani, 2010). It is also criticized for being subjective, as every technical analyst makes a subjective visual interpretation of the patterns on charts. On the other hand, the value of technical analysis as a financial tool has found strong support in the literature and in the work of many practitioners (Scott, Carr & Cremonie, 2016).
The essence of the technical analysis approach is described in the three premises, which say that market action is expressed through price, that stocks move in trends and that history repeats itself (Merh, Saxena & Pardasani, 2010). Technicians believe that all the factors that could affect the stock price, political, psychological or otherwise, are actually reflected through the stock price in that market, and its study is sufficient to forecast the future. The purpose of graphically representing stocks in the market is to identify trends in the early stages of their development and to evaluate the direction of price movements. To understand the future, the key is to study the past, and since these patterns have worked well in the past, it is assumed that they will have the same effect in the future as history repeats itself (https://cmtassociation.org/kb/technicalanalysis-three-premises/). Recent studies combine traditional trading rules for technical analysis with other statistical models and intelligent system techniques, such as: SVM algorithms, neural networks, fuzzy systems, etc., to create a new trading system that could more accurately predict the future direction of stock prices by using stock price data. Certain authors emphasize that stock market volatility and size, different tax systems, level of people's education, and level of political instability can affect operational dynamics and should be taken into account in the analysis (Nazário et al., 2017).

Literature review
The prediction models used in the literature can be classified into three categories: statistical models (ARIMA is one of the most commonly used linear statistical models), AI models, i.e. artificial intelligence models (such as the SVM - Support Vector Machine, the ANN - Artificial Neural Network and others, which handle complex nonlinear functions), and hybrid models as combinations of statistical and AI models (Xiong & Lu, 2017). A 2011 study applied the ARIMA methodology with intervention to estimate price movements on the Chinese Stock Exchange and, in addition to proposing an optimal ARIMA model, found that intervention analysis is very useful in explaining the dynamics of the impact of severe disruptions in the economy and of changes in the time series of stock price indices. Banerjee (2014) used the ARIMA methodology to predict prices in the Indian stock market based on SENSEX price movements over a period of 6 years and proposed an appropriate ARIMA model, noting that it assumed the linearity of the data set used (which perhaps was not the case) and that the model's prediction of price movements could be influenced by certain government and policy decisions.
Merh, Saxena & Pardasani (2010) combined a linear ARIMA model with a nonlinear BPNN to create a hybrid model that would make more accurate predictions for the BSE 30 (SENSEX), BSE IT, BSE Oil & Gas, BSE 100 and S&P CNX Nifty indices. They concluded that the ANN and ARIMA techniques combined offer a competitive advantage over each individual model, although the hybrid models gave different results in the estimation of index movements. Ralević et al. (2014), on the example of the Belex15 index of the Belgrade Stock Exchange, also developed a hybrid model composed of ARIMA and fuzzy ANN. Their results showed that ARIMA and fuzzy ANN provide superior returns on investment. Mondal, Shit & Goswami (2014) created a hybrid ARIMA-ANN model to estimate the movement of the ASPI and SL20 indices on the CSE, which proved to be more accurate for highly volatile market fluctuations than single traditional models. Ratnayaka et al. (2015) obtained similar results when analyzing the movements of the same indices. Khandelwal, Adhikari & Verma (2015) extended the hybrid ARIMA-ANN model with DWT decomposition, which improved the accuracy of forecasting on the example of four real-world time series. Abounoori & Tazehabadi (2009) compared the errors of individual models (ARDL, ARIMA and ANN) with a hybrid model that combines them, and concluded that the hybrid model is a more accurate predictor, noting that using macroeconomic variables would further improve the results. Xiong & Lu (2017) combined the ARIMA model with BPNN to estimate price movements of 4 individual stocks in the Chinese stock market. They showed that although the hybrid model is generally more accurate than the individual models analyzed, in some cases it does not produce better results.
Pai & Lin (2005) combined ARIMA methodology with SVM technique to estimate price movements of 10 different stocks from the Tokyo Stock Exchange and obtained promising results in terms of error reduction with the hybrid model, but showed that a simple combination of the two best individual models will not necessarily produce the best results, because it is necessary to optimize the parameters.

ARIMA model for stock time series
Time series refer to time-indexed data series graphically represented as a series of consecutive points with equal spacing between them (Jiang & Zhang, 2019). Time series forecasting is an important and very popular research area in the domain of science and engineering (Khandelwal, Adhikari & Verma, 2015), which is widely used (Xiong & Lu, 2017) and it is important for identifying significant features for future evaluations (Ratnayaka et al., 2015).
The primary goal of time series analysis is to develop a mathematical model that can predict future data movements based on available values (Khandelwal, Adhikari & Verma, 2015).

Financial Time Series
Stock price data are a typical example of time series, where future trends are predicted from past values (Merh, Saxena & Pardasani, 2010). Due to the dynamic and chaotic nature of the stock market, accurate stock price forecasting using financial time series is challenging (Xiong & Lu, 2017). However, accuracy is essential for improving reasoning and decision making, which is why these studies have been topical (Ralevic et al., 2014) and very important for managers and decision makers in various fields of science for decades (Khashei & Hajirahimi, 2018).
Real-world time series generally contain both linear and nonlinear correlation structures among the data used for analysis. ARIMA models are known for their significant prediction accuracy and flexibility in representing several different types of time series. However, they are limited by assuming a linear form of the related data, making them unsuitable for modeling complex nonlinear time series (Khandelwal, Adhikari & Verma, 2015), i.e. they are limited in capturing nonlinear patterns (Zhang, 2003). That is why the techniques of different forecasting models have been combined in recent decades (Khashei & Hajirahimi, 2018), yielding different hybrid models that take advantage of both linear and nonlinear models for more accurate forecasting, but achieve different results, as shown in the "Literature Review" section.

ARIMA model
For linear time series data, the ARIMA model is widely used to better understand and predict stock price movements (Mondal, Shit & Goswami, 2014). Pai & Lin (2005) state that ARIMA is one of the most widely used linear models in predicting time series, and therefore the author also uses it in this paper. Statistical models such as ARIMA are based on the assumption that the market will not undergo sudden changes in the future, and consequently the influence of external factors other than time is neglected (Xiong & Lu, 2017). ARIMA models are quite flexible in that they can represent several different types of time series, i.e. pure autoregressive (AR) series, pure moving average (MA) series, and combined AR and MA (ARMA) series (Zhang, 2003). Some of the special cases of ARIMA models are therefore random-walk and random-trend models, autoregressive models, and exponential smoothing models (i.e., exponentially weighted moving averages) (Banerjee, 2014).
Financial time series can be stationary, with a constant mean value and autocorrelation structure (Zhang, 2003), or conversely non-stationary (Merh, Saxena & Pardasani, 2010). A time series that must be differenced to become stationary is said to be an "integrated" version of the stationary series (Banerjee, 2014). The letter "I" between AR and MA means "Integrated" and reflects the need for differencing to make the series stationary (Makridakis & Le Hibon, 1997), while "AR" in the model name indicates autoregression and "MA" the moving average (Xiong & Lu, 2017). ARIMA has the form (p, d, q), in which p, d, and q are nonnegative integers denoting:
• p - a number representing the autoregressive parts of a data set (Mondal, Shit & Goswami, 2014), i.e. the order of the autoregressive terms (Jiang & Zhang, 2019);
• d - a number representing the integrated parts of the data set, i.e. the number of non-seasonal differences (Banerjee, 2014);
• q - a number representing the moving average parts of a data set (Mondal, Shit & Goswami, 2014).
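In the ARIMA model, the future value of the variable is assumed to be a linear combination of the previous values and errors, expressed through the following equation:

$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \dots - \theta_q \varepsilon_{t-q}$$

where $y_t$ is the original value, $\varepsilon_t$ is the random error at time t, $\phi_i$ and $\theta_j$ are coefficients, and p and q are the numbers representing the AR and MA parts of the model (Pai & Lin, 2005).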
If q = 0, the ARIMA equation becomes an autoregressive (AR) model of order p. When p = 0, the model reduces to a moving average (MA) model of order q (Wang, Zou, Su, Li, & Chaudhry, 2013). For non-stationary time series data to become a stationary series, it is necessary to difference the time series d times (Merh, Saxena & Pardasani, 2010).
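The differencing and the AR and MA special cases described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the paper (which relied on SPSS 22): the function names `difference` and `arma_one_step`, the sample values, and the minus sign in front of the MA terms are assumptions made for the example.

```python
def difference(series, d=1):
    """Apply first differencing d times: y'_t = y_t - y_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def arma_one_step(history, errors, phi, theta):
    """One-step-ahead ARMA(p, q) value on an already-stationary series.

    Assumed convention: y_t = sum(phi_i * y_{t-i}) + e_t - sum(theta_j * e_{t-j}),
    with the unknown future error e_t replaced by its expected value, zero.
    An empty theta gives a pure AR(p) model, an empty phi a pure MA(q) model.
    """
    ar = sum(p * history[-i] for i, p in enumerate(phi, start=1))
    ma = sum(t * errors[-j] for j, t in enumerate(theta, start=1))
    return ar - ma


# Hypothetical index levels, for illustration only (not Belex15 data).
levels = [700.0, 702.0, 705.0, 709.0, 714.0]
print(difference(levels, d=1))  # [2.0, 3.0, 4.0, 5.0]
print(difference(levels, d=2))  # [1.0, 1.0, 1.0]
```

Each round of differencing shortens the series by one observation, which is why d is kept as small as stationarity allows.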
ARIMA modeling procedures are determined through the Box-Jenkins framework of model development in four iterative steps (Khashei & Hajirahimi, 2018):
• Identification. The degree of differencing is identified by visualizing the time series graph, to detect whether the time series data are stationary and, if they are not, to transform them into a stationary series (Pai & Lin, 2005). The actual values of p (autoregressive order), d (degree of differencing) and q (moving average order) are sought, using the proposed automated estimation tools (Khashei & Hajirahimi, 2018).
• Model evaluation and diagnostics. Model parameters are estimated with the help of the graphs of the autocorrelation function (ACF) and the partial autocorrelation function (PACF) (Pai & Lin, 2005). The behavior of each graph indicates the model to which it corresponds. If the ACF values decay gradually and the PACF values drop sharply after lag p, this indicates an AR (p) model. The reverse situation indicates an MA (q) model, while similar behavior of the ACF and PACF graphs, with a gradual decrease in values, indicates the ARMA (p, q) and ARIMA (p, d, q) models (Banerjee, 2014). These two functions serve as the basic tools for identifying the order of the ARIMA (p, d, q) model from the sample data (Khashei & Hajirahimi, 2018). The orders p and q are read from the PACF and ACF, respectively: the ACF shows how the series correlates with itself at different lags, and the PACF reflects a regression of the series on its past lags, while the order d is given by the degree of differencing (Xiong & Lu, 2017). If the Q-statistics and the correlogram show that there is no significant pattern in the ACF and PACF of the residuals, then the residuals of the selected model are white noise, i.e. random errors (Adebiyi, Adewumi & Ayo, 2014), and the model is assumed to be appropriate. Identifying the right model is done through trial and error (Makridakis & Le Hibon, 1997), but the Expert Modeler in the SPSS 22 package can also be used, which automatically compares the models and suggests the best ones (Clement, 2014). Diagnostic verification of model adequacy and prediction accuracy is the third step, in which the best model structure is selected by checking several diagnostic statistics and the residuals.
If the model is found to be inadequate, a new structure of the ARIMA model is identified and all three steps are repeated until the best structure is found (Khashei & Hajirahimi, 2018). Candidate models are assessed using error measures (Khashei & Hajirahimi, 2018) such as root mean square error (RMSE), mean absolute percentage errors (MAPEs) and mean percentage errors (MPSEs) (Merh, Saxena & Pardasani, 2010), and by comparing their values. The model with the smallest error is selected as the optimal one (Ralevic et al., 2014). Box and Jenkins state that every model that results in random errors is appropriate, and they recommend choosing simpler models (those with fewer parameters) when several models are appropriate. If the model under test does not prove to be appropriate, another model is considered, its parameters are estimated, the randomness of the errors is checked, and so on until the optimal model is obtained (Makridakis & Le Hibon, 1997).
• Forecast. Once the optimal model is selected, the procedure moves to the stage of predicting future movements of the index values using that model (Clement, 2014).
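The correlogram reading used in the steps above can be illustrated with a short Python sketch. It computes the sample ACF, derives the PACF with the Durbin-Levinson recursion, and flags lags outside an approximate 95% band of ±1.96/√n. The function names and the choice of band are assumptions made for this illustration; SPSS may compute its confidence limits differently.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def pacf(series, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(series, max_lag)
    result, prev = [], []
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = r[0]
            current = [phi_kk]
        else:
            num = r[k - 1] - sum(prev[j] * r[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * r[j] for j in range(k - 1))
            phi_kk = num / den
            current = [prev[j] - phi_kk * prev[k - 2 - j]
                       for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
        prev = current
    return result


def significant_lags(correlations, n):
    """Lags (1-based) whose correlation lies outside +/- 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(n)
    return [k for k, r in enumerate(correlations, start=1) if abs(r) > bound]
```

A sharp cut-off in `pacf` after lag p with a slowly decaying `acf` would point to AR (p); the mirror-image pattern would point to MA (q).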

Methodology
Yousif & Elfaki (2017) find it more convenient to analyze daily data and to study the general stock index, as the stock market is highly volatile and the index fluctuates less than individual stocks. This paper uses the leading index of the Belgrade Stock Exchange, Belex15, whose value is calculated in real time and reflects the price movements of the most liquid Serbian stocks. The index basket currently includes the most liquid stocks of 10 companies (https://www.belex.rs/), and the composition of the Belex15 index adequately reflects the situation and circumstances in the Serbian transition market (Ralevic et al., 2014). The period used to create the model and predict future index movements covers the values of the Belex15 index from 10/01/2014 to 12/28/2018. The price of the index is represented by the opening price, which is an important indicator of trading activity on a given day, especially for day traders interested in measuring short-term results, according to investopedia.com. The analysis was performed in SPSS 22.

Results of the analysis
The results are presented according to the steps used to build the ARIMA model, first through the identification of potential models, their evaluation, diagnostics and finally the forecast using the selected model.

Identification
To examine the nature of the data, the Durbin-Watson test was used (Banerjee, 2014). The resulting value of 0.015 indicates that the series is time dependent and that time series analysis is suitable for this data series, because it falls within the range 0 ≤ DW ≤ 1.5. The original data are shown in a graph (Figure 1) to establish whether they are stationary. The graph clearly shows that there is an upward trend in the Belex15 index, so it is necessary to difference the data to make them stationary and suitable for further analysis. The ACF and PACF of the original data are shown in Figure 2.
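For reference, the Durbin-Watson statistic can be computed as in the following minimal Python sketch. It assumes the statistic is applied to a sequence of residuals; the paper does not state how the residuals behind the reported value of 0.015 were obtained, and the sample inputs here are hypothetical.

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2).

    Values near 2 suggest no first-order autocorrelation, values near 0
    strong positive autocorrelation, values near 4 strong negative
    autocorrelation.
    """
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical residuals, for illustration only.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0 (alternating signs)
print(durbin_watson([1.0, 1.1, 1.2, 1.3]))    # small value near 0 (slow drift)
```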
(JPMNT) Journal of Process Management - New Technologies, International Vol. 8, No 2, 2020. www.japmnt.com
According to the table used for the analysis, which is explained in Chapter 3 (Clement, 2014), this pattern suggests that an AR (p) component is necessary.
After the first differencing of the values, a stationary series of data was obtained, with the characteristics shown in Figure 3. As the values on the ACF and PACF correlograms behave similarly, we can conclude that it is necessary to use both the AR (p) and MA (q) components (Mondal, Shit & Goswami, 2014), with first-order differencing (d), which leads us to the conclusion that the ARIMA (p, d, q) model is a good starting point for the analysis. Given that it was necessary to difference the data and that we used first-order differencing (d = 1), we exclude the ARMA model from the analysis of potential optimal models. After differencing, we can read the significant lags from the correlograms, which in this case is one lag, at position 11. Therefore, the potential matching models are ARIMA (0,1,11), ARIMA (11,1,0) and ARIMA (11,1,11).
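Written out, the three candidate models take the following form. This is a sketch under two assumptions that the text does not confirm: constant terms are omitted, and every lag from 1 to 11 is included (a fitted model may in practice keep only some of them). Here $\Delta y_t = y_t - y_{t-1}$ is the first-differenced index value and the MA terms carry a minus sign by convention.

```latex
\begin{align*}
\text{ARIMA}(0,1,11):\quad & \Delta y_t = \varepsilon_t - \sum_{j=1}^{11} \theta_j \varepsilon_{t-j} \\
\text{ARIMA}(11,1,0):\quad & \Delta y_t = \sum_{i=1}^{11} \phi_i \, \Delta y_{t-i} + \varepsilon_t \\
\text{ARIMA}(11,1,11):\quad & \Delta y_t = \sum_{i=1}^{11} \phi_i \, \Delta y_{t-i} + \varepsilon_t - \sum_{j=1}^{11} \theta_j \varepsilon_{t-j}
\end{align*}
```

The third model carries up to 22 coefficients, twice as many as either of the other two.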

Assessment
After evaluating all potential models, the data presented in Table 1 were obtained.

Diagnostics
To assess the statistical fit of the models, stationary R², R², RMSE, MAE and normalized BIC were used, following Clement (2014). R² provides an estimate of the proportion of total variation in the series explained by the model; the higher the value, the better the fit of the model (the highest value is 1.0). Stationary R² compares the stationary part of the model with the simple mean. Positive values indicate that the model is better than the base model, while negative values indicate the opposite. The RMSE is a measure of how much the real values differ from the values predicted by the model, and as this is a measure of error, its value should be as low as possible. The MAE is the mean of the absolute values of the forecast errors. The normalized BIC is a general measure of the overall fit of a model, based on the mean square error, that penalizes a large number of parameters in the model and thus removes the advantage of such parameters, making the statistic easy to compare among different models. In addition, the Ljung-Box Q statistic was used to test the randomness of the residual errors in the model: the more random the errors, the better the model. The degrees of freedom (D.F.) indicate the number of parameters that can vary in the estimation, while a significance value (Sig.) of less than 0.05 indicates that the residual errors are not random, i.e. that there is structure in the observed set of values that is not covered by the model, and that further experimentation with other candidate models would be required (IBM Knowledge Center, 2012).
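The error measures and the Ljung-Box statistic described above can be sketched in plain Python as follows. The function names are chosen for this illustration and the code is not the SPSS implementation. Turning Q into the reported Sig. value would additionally require a chi-square distribution function, which is left out here.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between observed and fitted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean of the absolute forecast errors."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q = n (n + 2) * sum_{k=1..h} r_k^2 / (n - k),
    where r_k is the lag-k sample autocorrelation of the residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(v * v for v in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total
```

A large Q relative to the chi-square reference distribution signals leftover autocorrelation in the residuals, which is the situation the Sig. < 0.05 rule is meant to catch.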
The ACF and PACF correlograms of the residuals of all models were also studied; their values lie within the 5% significance interval, which means that the model in question is complete (Wang et al., 2013). When the values in Table 1 were analyzed, the ARIMA (11,1,11) model was excluded due to the overuse of parameters, because of which the Ljung-Box statistic did not produce results, and the choice therefore fell on one of the remaining two models. The simpler models ARIMA (0,1,11) and ARIMA (11,1,0) have slightly different RMSE, MAE, R², stationary R² and normalized BIC values, but according to the Ljung-Box statistic the ARIMA (0,1,11) model stands out from the other, and this model was selected as adequate for this time series. Expert Modeler (Clement, 2014) was used to check the selection, and it also suggested this model as optimal. The ACF and PACF correlograms of the residuals of the selected model are shown in Figure 4.

Prediction
Using the selected model, the author obtained forecast data for the movement of Belex15 index values for the next 10 working days, shown in Table 2 together with the upper and lower limits of the 95% confidence interval. If we compare the performance of the selected ARIMA (0,1,11) model with the models previously applied by the author to the same dataset to analyze the movement of the Belex15 index (Petrovic, 2019), we see that this model shows RMSE and MAE errors similar to those of the multiple linear regression models with time windows of 5 days (RMSE = 4.88, MAE = 3.48) and 10 days (RMSE = 4.94, MAE = 3.55). However, compared to the neural network models, ARIMA (0,1,11) shows a greater RMSE than the neural network model with a five-day time window, NN1 (RMSE = 2.80), and the one with a ten-day time window, NN2 (RMSE = 3.08) (Petrovic, 2019).

Conclusion
Among the three initial models identified through the four steps of the ARIMA methodology, the author considers the ARIMA (0,1,11) model to be the optimal choice, because it performed better than the ARIMA (11,1,0) model after the ARIMA (11,1,11) model was discarded because of overused parameters. The selected model showed good statistical suitability and the possibility of a satisfactory short-term forecast of the Belgrade Stock Exchange index, although it showed a higher RMSE than the nonlinear model applied to the same data.
The model is limited by its inability to capture economic variables, such as the impact of government monetary or fiscal policy, which can lead to greater fluctuation in index values, and by its assumption that the data are linear, which in reality may not be the case (Banerjee, 2014). Makridakis & Le Hibon (1997) state that the ARIMA method is not reliable for predicting time series in the domain of economic and business applications, where there is a high level of randomness.