FORECASTING MAIZE YIELD IN THE REPUBLIC OF SERBIA BY USING BOX-JENKINS METHODOLOGY

As food supply is one of the most important issues for national security, forecasting agricultural production represents a necessity for every country. Forecasting yields and production volume is a very complex task that requires both application of formal statistical methods and assessment of experts. Considering not only the significant role of maize in the immediate diet and processing industry in the Republic of Serbia, but also its absolute dominance in the structure of sown areas, the analysis of the time series and projections of future maize yield were carried out. Maize yield projections obtained through BoxJenkins methodology, for 2017 and 2018, are 4.97 t/ha and 7.01 t/ha. Comparing the projected with realized yield in 2017 (for which there is available data), it was noticed that the model’s forecast indicated a smaller reduction than actually realized. According to the results of the conducted study, it is clear that although the time series model cannot anticipate precise yield quantities, it can certainly be useful in terms of predicting future tendencies in maize yield oscillations. © 2019 EA. All rights reserved.


Introduction
The important contribution of agriculture in the economic structure of the Republic of Serbia primarily comes from rich land resources and favorable natural conditions for organizing agricultural production. Almost two thirds of the value of total agricultural output in 2017 comes from crop production. Arable fields and gardens make 75.9% of In the immediate diet, maize can be found in the form of grains or as a grinding product in the form of maize flour or meal, while the major part used in human nutrition is consumed through products such as oil, corn flakes, popcorn, sugars etc. However, the most of maize production is used as livestock feed, either as processed grains in form of concentrated fodder mixtures or overhead biomass as a bulky feed. In addition, maize products found their purpose in the textile, paper, alcohol, pharmaceutical and other industries.
In the last decade, the average area under maize in the Republic of Serbia is about 1.15 million hectares with an average production of 5.7 million tons and the average yield around 5.3 t/ha. Since maize production takes place on nearly one fifth of the utilized agricultural area the Republic of Serbia, it represents the most widely spread field crop intended not only for domestic consumption, but also for export.
Bearing in mind the strategic role of maize from the aspect of food security mainly through ensuring the needs of the domestic processing industry, primarily the fodder industry, it is necessary to plan the production volume. In this context, time series analysis is used as one of the leading tools for monitoring yield oscillations and for forecasting future ones.

Materials and methods
In order to anticipate future trends, data concerning production volume and realized yields of maize over the past seven decades, was obtained from the documentary materials of the Statistical Office of the Republic of Serbia and was processed using the program EViews10. It is one of the leading software for complex analysis of statistical data, their graphical representation, simulation of econometric models, analysis of time series and forecasting.
Statistical techniques of time series analysis have been widely disseminated in the literature, and there is a great variety of circumstances and researches in which they can be used, especially in studies involving economic decisions, such as: national and international economies behavior, stock market, price forecast, supply and demand for agricultural products, among others (de Oliveira et al, 2012).
Many of parameters that need to be monitored and forecasted are influenced by factors which occurrence and implications are not easy to predict. Agricultural production, primarily crop production, bears significant risks in terms of achieving production  Vogel et al., 2019). Therefore, the Box-Jenkins methodology was used for analyzing previous movements in maize yields in order to anticipate future ones.
The Box-Jenkins methodology includes a set of procedures for identifying and estimating time series models within the class of autoregressive integrated moving average models -ARIMA(p,d,q), along with autoregressive models -AR(p), moving average models -MA(q) and autoregressive moving average models -ARMA (p,q), as special cases of the general class. It refers to the iterative application of three stages until stage three produces a model adequate for generating forecasts (Figure 1). Source: Kovačić, 1995. The first stage in the Box-Jenkins methodology implies detecting whether the time series is stationary, based on plots of the data, autocorrelation and partial autocorrelation functions, or it needs differencing. When the autocorrelation function dies down slowly, and the partial autocorrelation function is cut off after the first lag, this suggests that the series should be differenced (Kovačić, 1995). Alongside examining graphical representations of the time series, one of the formal statistical procedures should be used to establish if the time series possesses a unit root and needs to be differenced, which in this study was Dickey-Fuller test.
By determining the values of p (autoregressive order), d (differencing) and q (moving average order), the trial model is defined, which can be changed through the iterative process until the final one is formed.
When choosing the optimal number of parameters in the model, three information criteria are usually used: Akaike information criterion (AIC), Schwarz information criterion (SIC) and Hannan-Quinn information criterion (HQ). In this study, choice between models with different number of explanatory variables was made based on the value of the information criteria, whereby a model which minimizes the value of all three information criteria was chosen.
The second stage in the Box-Jenkins methodology implies the evaluation of the parameters of the model identified in previous phase. This means that an assessment of the mean, variance and coefficients in the model is carried out. The most common methods used in this stage are nonlinear least squares method, maximum likelihood estimation and method of moments.
In the third stage, the main objective is to check the adequacy of the previously estimated model. This involves testing the statistical significance of the estimated coefficients and the assumption that the residuals of the estimated model are exhibiting white noise-like behavior. Diagnostic testing is carried out using the Jarque-Bera test (JB) and the Box-Ljung Q-statistic.
If the residuals of the estimated model are not exhibiting white noise-like behavior, it is necessary to return to the first phase and identify a new trial model. However, if the selected model is adequate, it can be used to generate forecasts.

Results and discussion
Stationarity check with graphical procedure The first step in the model identification phase involves analyzing a graphical representation of the time series to determine whether it is stationary or non-stationary ( Figure 2). Based on the graphic representation, it is possible to make only rough estimation, so in order to be sure whether the time series is stationary or not, formal statistical testing should also be conducted. Based on the presented trend of maize yield in the period for which the data is available (1947 -2017), it can be noticed that the time series has a structural break, i.e. it shows two patterns of behavior before and after 1981.
Structural break (or shock) represents one or more observations which are not in accordance with the previous data set, as a result of an intervention or exogenous event. These events can affect the time series movement in two ways. On the one hand, influence of the exogenous event, such as the reform of economic or agrarian policy, can produce changes that affect the course of the time series in the following period, and on the other hand, adverse weather conditions such as drought or flood, can cause changes in only one observation.
Identifying the structural break in the time series is particularly important because the formal unit root test, the Dickey-Fuller test, is sensitive to its existence (Mladenović, Nojković, 2015). A structural break which results in permanent movement changes in the trend stationary time series, in DF test application, may suggest a false existence of a unit root. Also, when a structural break has a one-time impact, in the first difference of the time series with a unit root, the application of the DF test may falsely suggest that the series is stationary. Bearing this in mind, it is recommendable to analyze the series before and after the structural break occurrence in 1981. The movement of maize yields in the period before the established shock is presented in the following figure. Judging by the graphical representation of the maize yield in the period 1947-1981, the time series seems to be trend stationary with a significant trend increase. This means that in the unit root test application the exogenous regressors consist of a constant and a trend. In the period after 1981, it is noticed that the movement of the maize yield has changed ( Figure 4). http://ea.bg.ac.rs  By comparing the previous two graphical representations, it can be concluded that there is a difference in the movement of the maize yield in the two observed subperiods. Unlike the trend stationary series present in the period 1947-1981, the time series in the period 1982-2017, exhibits movement without a trend component, but with oscillations around the non-zero mean value (4.62 t / ha). The absence of the trend component means that in this case, the exogenous regressors within the unit root test contain only a constant.

Stationarity check with test procedure
As the graphical representations cannot be solely used for providing reliable conclusions about whether the time series is stationary or not, the unit root test must also be used to increase objectivity.
In order to establish the presence of the unit root, as previously mentioned, in this study, the DF test was applied to both of the observed subperiods. The null hypothesis (H 0 ) that there is no unit root was set against the alternative (H a ) that the series have one unit root. Results of the DF test were obtained by using the least squares method and compared with test critical values (Table 1). If DF test statistic is lower than test critical value, the null hypothesis should be rejected. Obtained results show that, in both subperiods, calculated DF test statistic (-6.491845 and -5.491698) is considerably lower than the critical values at all three significance levels, meaning there is sufficient evidence to reject the null hypothesis of the unit root existence. In accordance with the previous finding, it can be claimed that the maize yield series in the period 1947-1981 is stationary with a deterministic trend, and it is stationary with oscillations around non-zero mean value in the period 1982-2017.
Considering the previously established shock in the time series, as well as the different movement of the time series in the two observed subperiods, it was necessary at this stage of the research to decide which period was more adequate for further analysis and projection.
The first observed subperiod is characterized by the improvement of the technological process and finding optimal ways to maximize utilization of yield potential of maize hybrids. The introduction of hybrids in maize production in our country began in the 1950s, and soon after, in 1961, the first domestic hybrids appeared (Tabaković, 2012). In the second subperiod, maize yield movements are mainly under the influence of unstable economic and weather conditions. Great oscillations of maize yield, from year to year, are conditioned by small investments in this production, which is most often under conditions of natural water regime, so that frequent droughts significantly influence the yield and grain quality (Jaramaz, 2015).
Since, in a short period of time, revolutionary changes cannot be expected, both in terms of increasing the yield potential of maize hybrids and in controlling objective production conditions, the second subperiod was chosen as more appropriate for further analysis.

Identifying the model order (p, q), estimation and adequacy check
Once it has been established that the time series is stationary, the next stage in the Box-Jenkins approach is to identify autoregressive and moving average order.
As stationary series is completely specified by its mean, variance, and autocorrelation function, the most subjective task in Box-Jenkins approach is to identify an appropriate model from the sample autocorrelation function.
For an AR(p) model, the sample and partial autocorrelation plots show that the autocorrelation function has an exponential or sinusoidal decay, while the partial autocorrelation function cuts off after lag p. For a MA(q) model, the sample and partial autocorrelation plots show that the autocorrelation function cuts off after lag q, while the partial autocorrelation function has an exponential or sinusoidal decay. In an ARMA(p,q) model, both autocorrelation plots show the tendency to decay (mixture of exponential decay and damped sine-waves) after the first q-p lags for the autocorrelation function and after p-q lags for the partial autocorrelation function (Kovačić, 1995).
In order to identify the order p and q, the sample and partial autocorrelation functions are calculated and graphically presented (Figure5). By studying the two autocorrelation plots, it was found that there is no autocorrelation, although the production of maize to a certain extent depends on the production volume achieved in the previous period. This is the reason why autoregressive parameters at lags 1, 2, 4, 5 and 7 were selected as explanatory variables.
The assessment of the coefficients has been made taking into account the existence of structural breaks in the data. Based on graphical representation of the time series in the analyzed period, two sharp declines in maize yield were recorded in 2000 and 2012, as a result of the extreme drought. In order to control the effects of these two events, two artificial variables have been introduced into the model (V 1 and V 2 ) which take the value of 1 for 2000 and 2012, and value 0 for all other observed years.
Considering the sample and partial autocorrelation functions, determined structural breaks and the fact that the time series is a stationary, the ARMA model was estimated using the least squares method containing AR (1), AR (2), AR (4), AR (5) and AR (7), as well as artificial variables V 1 and V 2 ( Table 2). Agriculture, Year 66, No. 2, 2019, (pp. 525-540), Belgrade Source: Created by the author based on data analysis in EViews10

Economics of
As already stated, in the Box-Jenkins procedure, the defined trial model can be changed through the iterative process until the final one is formed. So, at this point of the study, it was necessary to establish if any alterations can be made leading to enhanced model. Based on the obtained results, it is determined that the estimated coefficients of the artificial variables introduced to control the effects of the structural breaks in 2000 and 2012, are negative as expected. It is also determined that the coefficients for the variables AR (1) and AR (4) are not statistically significant at the 10% level, and therefore the exclusion of these variables from the model should be considered. So, as exclusion of the variables AR (1) and AR (4) may improve model's explanatory power, the second model is defined (Table 3).  , Year 66, No. 2, 2019, (pp. 525-540)

, Belgrade
By excluding the variables AR (1) and AR (4), all remaining variables in the model are statistically significant, 31.6% of variations in maize yield are explained by the model and all three information criteria have decreased (AIC from 3.03 to 2.99, SIC from 3.39 to 3, 26 and HQ from 3.16 to 3.08). According to these results, the second model can be considered as enhanced, and therefore used for the purpose of further analysis.
In order to be accepted as adequate for forecasting, the test results need to show that the model's residuals are normally distributed and uncorrelated. The residuals in the time series model represent what is left over after fitting a model, and for many (but not all) time series models, the residuals are equal to the difference between the observations and the corresponding fitted values (Hyndman, Athanasopoulos, 2018).
The Jarque-Bera test (JB) is used to examine how much the empirical distribution of the series residual deviates from the normal distribution. As normal distribution has a skewness of zero and a kurtosis of three, the following hypothesis were set: H 0 : E(r s ) 3 =0 and E(r s ) 4 =3 vs.
Under the null hypothesis of a normal distribution, the JB statistic is distributed as 2 χ with 2 degrees of freedom. The critical value for 2 χ distribution with 2 degrees of freedom, at the 5% significance level, is 5.99. If the value of the JB test statistic is lower than the critical value, it is considered that the residuals are normally distributed. The results for testing whether the series is normally distributed are shown in the following figure.  The calculated JB statistic (0.98) is considerably lower than the critical value, which indicates that the residuals do not deviate significantly from the normal distribution, When it comes to determining whether the residuals of selected model exhibit white noise-like behavior, it is necessary to analyze the sample and partial autocorrelation plots and conduct autocorrelation test. By observing autocorrelation plots it can be concluded that the residuals are not distinguishable from a white noise series, when all the autocorrelations and partial autocorrelations are small (Figure 7). For formal autocorrelation testing, the Box-Ljung Q-statistic is used to increase the precision in the decision-making process. The Q-statistic tests the joint hypothesis that the first K autocorrelations of the adjusted error terms are jointly zero: H 0 : ρ 1 =ρ 2 =…=ρ k =0 against the alternative that at least one of the first K autocorrelation coefficients of the residual series is different from zero. If the p-value is greater than significance level the null hypothesis of uncorrelated residuals cannot be rejected. Since the p-value for the Box-Ljung statistic is greater than 0.05, it can be inferred that there is no autocorrelation of the residual in the estimated model.
As it has been proven that the residuals of the rated model are normally distributed and uncorrelated, the model can be used for maize yield projections in the following period.

The maize yield forecast
The model identified, evaluated and fortified as adequate for forecasting through the Box-Jenkins methodology, was used to achieve the main goal of the conducted study, that is, to project maize yield in 2017 and 2018 ( Table 4). The maize yield projection for 2017 was made to enable comparison with the yield actually realized. Comparing the projected and realized maize yield in 2017, it can be noticed that the model's forecast indicated a decrease, but somewhat lower than the realized.
Compared to 2016, the obtained results predict that, in 2017, the maize yield will decrease by 31.9% (i.e. from 7.303 t/ha to 4.97 t/ha), while the forecasted yield for 2018 is 7.01 t/ha, which represents an increase of 41%.
Although the Republic of Serbia is characterized by favorable conditions for the cultivation of a wide range of crops, without frequent occurrence of temperature extremes, in some years there are still dry periods or excessive amounts of precipitation. Due to dry period incidence, even drought-resistant hybrids do not fully demonstrate their genetic yield potential. Negative impact of adverse weather conditions is reflected in the reduction of yield and grain quality, which causes great damage to producers through decline of the production cost-effectiveness. However, it is possible to adapt to or mitigate the effect of adverse weather if a forecast of the expected weather can be had in time (Sodha, Saha, 2016).
Depending on the drought intensity, maize yields can be reduced by 50% or even by 80% in extremely drought years. For these reasons, according to Filipovic ( Unfortunately, considering the percentage of irrigated areas, The Republic of Serbia is ranked very poorly in Europe, and that small percentage of irrigated areas is mostly under the fruit and vegetable crops in greenhouses and not so much under the field crops, such as maize. Weather conditions, specifically high temperatures and lack of precipitation, contributed to a significant reduction in maize yield in 2017, as conducted forecast indicated. This is why, according to Ralević (1985), only the exact methods of forecasting that include a priori expert information, provide a solid basis for decision-making when it comes to agriculture.

Conclusions
The application of Box-Jenkins approach for identifying, estimating and selecting the model suitable for predicting future maize yields, provided relatively satisfying results, since the forecasted decreasing trend of maize yield in 2017 was actually realized. However, there is a difference of about 20% between the forecasted and realized maize yield, which is quite substantial taking into account the extent of the area covered with maize on an annual basis.
The realized decrease in maize yield is due to long-lasting dry periods, given the fact that the largest part of this production takes place under direct influence of the weather conditions and without irrigation. This kind of developments usually has a severely harmful effect on the supply, prices and exports of essential agricultural products in the following period, which particularly increases the significance of planning both yield and total production volume.
Forecasting by using time series analysis cannot provide the exact quantities of future yields, but it can and should be used as a framework for forecasting future trends in their fluctuations. Therefore, it is important to ensure an adequate input base in the process of planning the development of maize production, in order to gain solid results through the statistical forecasting methods and experts' forecasts.
In addition, as time passes, new values of the time series are generated, which is why it is possible to determine the deviation from their formerly predicted values, as well as the necessary corrections for subsequent periods. In other words, it is possible to perform a revision of the forecast.