REMOTE SENSING MACHINE LEARNING ALGORITHMS IN ENVIRONMENTAL STRESS DETECTION - CASE STUDY OF PAN-EUROPEAN SOUTH SECTION OF CORRIDOR 10 IN SERBIA

The construction of Pan-European Corridor 10 is one of the major projects in the Republic of Serbia, and it is entering its final phase. A vast natural area has changed significantly in order to complete the project, so those changes need to be monitored. Large construction projects of this kind inevitably cause environmental stress, which requires adequate and accurate detection. In contrast to traditional field monitoring of the environment, this paper presents a remote sensing method that uses the European Space Agency's Sentinel 2A optical satellite data processed with different Machine Learning algorithms. An accuracy assessment is performed on the resulting land cover maps, and change detection is carried out with the most accurate result.


INTRODUCTION
One of the major projects of the Republic of Serbia, funded by the World Bank (WB), the European Investment Bank (EIB), the Hellenic Plan for the Economic Reconstruction of the Balkans (HiPERB), and the Republic of Serbia, is the construction of the main branch of Pan-European Corridor 10. The corridor connects Salzburg in Austria and Thessaloniki in Greece through Ljubljana in Slovenia, Zagreb in Croatia, Belgrade and Niš in Serbia, and Skopje and Veles in Macedonia (Figure 1). In Serbia, the south part of Corridor 10 is called the "Highway Е75 project SOUTH", and at this point it is presented and being constructed as a motorway (Koridori Srbije, 2017). A construction zone of this scale indubitably has a significant impact on the environment. Proper monitoring is crucial to conserve nature and mitigate environmental stress. Given the advances in technology, we use remote sensing and its methods to monitor the changes that have occurred during the construction of Corridor 10. Change detection of the land cover is then performed to present the changes over the monitored period. The area of interest is selected within the zone that is under active construction and covers 1,095.4 sq. km (Figure 3).

MATERIALS AND METHODS
Remote sensing technology is employed to achieve the goal of this paper, with a contemporary methodology based on Machine Learning (ML) algorithms (Canziani et al., 2008; Mas & Flores, 2008; Jensen et al., 2009; Duro et al., 2012; Lary et al., 2016).
Sentinel 2 satellite imagery was obtained from the Copernicus Sci Hub (Copernicus Open Access Hub, 2017) as the starting data for the analysis. A Sentinel 2 product consists of granules, each representing a particular region. A granule comes with 13 bands at three different ground resolutions: 10 m, 20 m, and 60 m. The 10 m bands are visible Blue (B), Green (G), Red (R), and Near InfraRed (NIR). The 20 m bands are three Vegetation Red Edge bands, Narrow NIR, and two Short Wave InfraRed (SWIR) bands. The 60 m bands are Coastal Aerosol, Water Vapour, and SWIR Cirrus (Sentinel 2 MSI, 2017).
Two different Sentinel 2 Level-2A products were downloaded for 2017. Since parts of the research area were covered by clouds, a mosaic was made from two different T34TEN granules acquired between 01.07.2017 and 31.07.2017. A remote sensing/raster processing plugin for QGIS was used to perform the mosaicking.
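The idea behind such a cloud-filling mosaic can be illustrated with a minimal Python sketch. It is not the implementation of the QGIS plugin used here; the function name `fill_clouds`, the tiny single-band arrays, and the boolean cloud mask are hypothetical and only show how each pixel is taken from the first acquisition unless it is flagged as cloudy.

```python
def fill_clouds(primary, secondary, cloud_mask):
    """Keep pixels of `primary`, except where `cloud_mask` is True,
    in which case the pixel is taken from `secondary`."""
    mosaic = []
    for row_p, row_s, row_m in zip(primary, secondary, cloud_mask):
        mosaic.append(
            [s if cloudy else p for p, s, cloudy in zip(row_p, row_s, row_m)]
        )
    return mosaic


# Hypothetical single-band reflectance values for two acquisitions (2 x 3 pixels).
primary = [[0.10, 0.90, 0.12], [0.11, 0.95, 0.13]]
secondary = [[0.09, 0.14, 0.12], [0.10, 0.15, 0.14]]
# Hypothetical cloud mask for the primary acquisition (True = cloudy).
cloud_mask = [[False, True, False], [False, True, False]]

mosaic = fill_clouds(primary, secondary, cloud_mask)
```

In practice the mask could come from the cloud probability layers delivered with the Level-2A product, and both acquisitions must already share the same grid, which holds for two images of the same granule.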
To perform the change detection for the research area, imagery of the same area from August 2016 was downloaded from the Copernicus Sci Hub, and a sub-scene was created. The image was cloud-free, so there was no need for mosaicking. The product was Level-1C, so the data were processed to Level-2A using the SNAP (Sentinel Application Platform) toolbox software (ESA STEP, 2017), which took more than 13 hours to complete. Sentinel 2 products go through multiple processing phases:
- Level-0 and Level-1A&B products belong to the preprocessing phase and are not available to users;
- Level-1C processing uses the Level-1B product and applies radiometric and geometric corrections (including orthorectification and spatial registration);
- Level-2A processing applies atmospheric correction to the Top-Of-Atmosphere (TOA) Level-1C orthoimage products and produces a scene classification.
The main output of Level-2A is a Bottom-Of-Atmosphere (BOA) corrected reflectance orthoimage. Additional outputs are an Aerosol Optical Thickness (AOT) map, a Water Vapour (WV) map, and a Scene Classification Map (SCM), together with Quality Indicators (QI) for cloud and snow probabilities at 60 m resolution (Sentinel 2 MSI, 2017). The Sentinel 2 bands used to complete the analysis are the Red, Green, Blue, and Near Infra-Red bands with 10 m ground resolution.
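The band selection can be written down as a small lookup table. The sketch below is illustrative only: the band identifiers (B01 to B12, B8A) follow the usual Sentinel 2 naming and do not appear in this paper, and the helper `bands_at_resolution` is a hypothetical name.

```python
# Sentinel 2 MSI bands: identifier -> (description, ground resolution in metres).
# Identifiers follow the usual Sentinel 2 naming (an assumption, not from the paper).
SENTINEL2_BANDS = {
    "B01": ("Coastal Aerosol", 60),
    "B02": ("Blue", 10),
    "B03": ("Green", 10),
    "B04": ("Red", 10),
    "B05": ("Vegetation Red Edge 1", 20),
    "B06": ("Vegetation Red Edge 2", 20),
    "B07": ("Vegetation Red Edge 3", 20),
    "B08": ("NIR", 10),
    "B8A": ("Narrow NIR", 20),
    "B09": ("Water Vapour", 60),
    "B10": ("SWIR Cirrus", 60),
    "B11": ("SWIR 1", 20),
    "B12": ("SWIR 2", 20),
}


def bands_at_resolution(resolution_m):
    """Return the identifiers of all bands with the given ground resolution."""
    return [
        band_id
        for band_id, (_, resolution) in SENTINEL2_BANDS.items()
        if resolution == resolution_m
    ]


# The four 10 m bands (Blue, Green, Red, NIR) used in the analysis.
analysis_bands = bands_at_resolution(10)
```

Restricting the analysis to the 10 m bands avoids resampling the coarser 20 m and 60 m bands to a common grid.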
Pixel-based Machine Learning (ML) algorithms were used to produce the land cover map of the area of interest. The three most common ML tasks are Regression, Classification, and Clustering.
Regression is a supervised learning task for modeling and predicting numeric variables, used where numeric ground truth values are available for the research area. There are different regression algorithms, such as:
- Linear Regression, which works when there are linear relationships between the dataset variables;
- Regression Trees (Decision Trees), which repeatedly split the dataset into separate branches that maximize the information gain, allowing the algorithm to learn nonlinear relationships;
- Deep Learning, which applies multi-layer neural networks to learn extremely complex patterns using convolutions, drop-out, and other mechanisms;
- Nearest Neighbors, which store each training observation and make predictions for new observations by searching for the most similar training observations and pooling their values (Elite Data Science, 2017).
Classification, as a supervised learning task, is used in this paper to model and predict land cover categories, since these ML algorithms predict a class. Different classifiers were used in this article to obtain the best possible accuracy:
- Classification Trees, as employed in Random Forest;
- Gaussian Mixture Model (GMM), which assumes that the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters (Scikit learn, 2014);
- K-Neighbors Classifier, where the learning is based on the k nearest neighbors of each query point, and k is an integer value specified by the user (Scikit learn, 2014).
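As an illustration of the k-nearest-neighbors principle, the following minimal Python sketch classifies a single pixel by majority vote among its k closest training pixels in spectral space. It is not the scikit-learn or dzetsaka implementation; the function name `knn_classify` and the reflectance values are hypothetical, and only the class names are taken from this paper.

```python
import math
from collections import Counter


def knn_classify(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training pixels.

    `training` is a list of (feature_vector, label) pairs; distances are
    Euclidean distances between the spectral feature vectors.
    """
    nearest = sorted(training, key=lambda sample: math.dist(sample[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (Blue, Green, Red, NIR) reflectances for two land cover classes.
training = [
    ((0.03, 0.05, 0.04, 0.40), "Forest"),
    ((0.04, 0.06, 0.05, 0.45), "Forest"),
    ((0.03, 0.06, 0.04, 0.42), "Forest"),
    ((0.12, 0.15, 0.20, 0.25), "Bare Soil"),
    ((0.13, 0.16, 0.22, 0.27), "Bare Soil"),
    ((0.11, 0.14, 0.19, 0.24), "Bare Soil"),
]

# A pixel with low visible and high NIR reflectance falls among the Forest samples.
predicted = knn_classify(training, (0.04, 0.05, 0.05, 0.41), k=3)
```

The value of k trades noise sensitivity (small k) against blurring of class boundaries (large k); an odd k avoids ties when only two classes compete.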
Creating a land cover map from the BOA-processed Sentinel 2 data required ground training samples. To obtain such areas and create the necessary vector file as training material, historical Google Maps imagery was used through different sources and plugins for QGIS. Seven different classes were recognized for both 2016 and 2017, represented by 175 and 164 polygons respectively. Two attributes were created, one integer and one text. The prepared sub-scene for each year was then processed using the dzetsaka ML plugin for QGIS.
The accuracy assessment was performed using the training sample polygons in the dzetsaka and SCP plugins for QGIS.
A confusion matrix was created, from which the overall accuracy and kappa hat are reported.
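Both measures follow directly from the confusion matrix: the overall accuracy is the share of correctly classified samples, p_o, and kappa hat is (p_o - p_e) / (1 - p_e), where p_e is the agreement expected by chance from the row and column totals. The sketch below is a minimal illustration of these standard formulas, not the code of the dzetsaka or SCP plugins; the function names and the ten sample labels are hypothetical.

```python
def confusion_matrix(reference, predicted, classes):
    """Rows are reference classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for ref, pred in zip(reference, predicted):
        matrix[index[ref]][index[pred]] += 1
    return matrix


def overall_accuracy(matrix):
    """Share of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def kappa_hat(matrix):
    """Cohen's kappa estimate; undefined when the chance agreement equals 1."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / total**2
    return (observed - expected) / (1 - expected)


# Hypothetical reference and predicted labels for ten samples.
classes = ["Forest", "Agriculture"]
reference = ["Forest"] * 5 + ["Agriculture"] * 5
predicted = ["Forest"] * 4 + ["Agriculture"] + ["Agriculture"] * 4 + ["Forest"]

matrix = confusion_matrix(reference, predicted, classes)
accuracy = overall_accuracy(matrix)
kappa = kappa_hat(matrix)
```

For this toy example the matrix is [[4, 1], [1, 4]], so the overall accuracy is 0.8, the chance agreement is 0.5, and kappa hat is about 0.6.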
The land cover change detection was performed using the SCP plugin in QGIS.
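Land cover change between two classified maps is commonly derived by post-classification comparison, that is, by counting for every pixel the pair of classes it had before and after. The sketch below illustrates that idea under the assumption that both maps share the same grid; it does not reproduce the SCP plugin, and the function name `change_matrix` and the 2 x 3 pixel maps are hypothetical.

```python
from collections import Counter


def change_matrix(map_before, map_after):
    """Count pixels for every (class before, class after) pair."""
    transitions = Counter()
    for row_before, row_after in zip(map_before, map_after):
        transitions.update(zip(row_before, row_after))
    return transitions


# Hypothetical classified maps on the same grid (2 x 3 pixels).
map_2016 = [
    ["Forest", "Forest", "Agriculture"],
    ["Forest", "Agriculture", "Agriculture"],
]
map_2017 = [
    ["Forest", "Artificial bare soil", "Agriculture"],
    ["Forest", "Artificial bare soil", "Agriculture"],
]

transitions = change_matrix(map_2016, map_2017)
# Keep only the pixels whose class differs between the two dates.
changed = {pair: count for pair, count in transitions.items() if pair[0] != pair[1]}
```

Multiplying each count by the pixel area (100 sq. m for 10 m pixels) converts the transitions into changed area per class pair.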

NUMERICAL RESULTS
After applying the algorithms, three different land cover maps for each year were created (Figure 4).
The accuracy assessment for the created land cover maps is presented in Tables 1-3. The ML algorithms gave good results, with Random Forest reaching an accuracy of up to 100%. The accuracy assessment results show how well each ML algorithm performs the classification. The best result is given by the Random Forest algorithm, with an accuracy of 100% for 2016 and 96.35% for 2017. The Random Forest land cover maps are therefore used for the final analysis in the next part of this research. Classification results are presented in Table 4. The results show that two classes are dominant and together cover more than 90% of the research area: Forest with 65.9% in 2016 and 67.6% in 2017, and Agriculture with 27.9% and 25.1% respectively. The percentage of change is presented in Table 5. The change detection data in Table 5 confirm the Table 4 data and show how much each class has changed. The highest increases are in the Artificial bare soil class (to which the primary subject of this work, Corridor 10 under construction, belongs) and the Pasture class, whereas the Bare Soil, Agriculture, and Artificial classes decrease in area percentage cover. Figure 5 shows the difference in the northern part of the research area, where the construction of Corridor 10 is in full swing.
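To give a sense of scale, the reported class shares can be converted into approximate areas. The sketch below assumes that the percentages refer to the whole 1,095.4 sq. km area of interest stated in the Introduction; the function name `share_change` is hypothetical, and the way Table 5 defines the percentage of change may differ from the two measures computed here.

```python
AREA_KM2 = 1095.4  # area of interest stated in the Introduction

# Class shares (percent of the research area) reported for 2016 and 2017.
shares = {
    "Forest": (65.9, 67.6),
    "Agriculture": (27.9, 25.1),
}


def share_change(before_pct, after_pct, total_area_km2=AREA_KM2):
    """Return the change in percentage points, in sq. km, and relative to 2016."""
    point_change = after_pct - before_pct
    area_change_km2 = total_area_km2 * point_change / 100
    relative_change_pct = 100 * point_change / before_pct
    return point_change, area_change_km2, relative_change_pct


changes = {name: share_change(*pcts) for name, pcts in shares.items()}
for name, (points, area, relative) in changes.items():
    print(f"{name}: {points:+.1f} points, {area:+.1f} sq. km, {relative:+.1f}% relative")
```

Under these assumptions, Forest gains roughly 18.6 sq. km and Agriculture loses roughly 30.7 sq. km between the two dates.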

CONCLUSION
As Table 5 shows, the class of interest in this research is Artificial bare soil, which represents the construction area of the new Corridor 10. The area covered by this class has increased, which indicates that the environment changed within one year. Since the land cover is still assigned to the same class and has not changed into the Artificial class, to which a constructed, paved highway belongs, we can conclude that the motorway is still under construction. Data acquired through remote sensing analysis of Sentinel 2 satellite imagery can be of great help in monitoring environmental change and large construction projects. Since satellite data are widely accessible and offer satisfactory ground resolution at low or no cost, remote sensing techniques should not be excluded from environmental research; rather, the knowledge and capabilities they provide should be expanded. The Random Forest machine learning algorithm used in this paper confirms that classification algorithms have advanced to a level where they can be of great help to environmental analysts. The high accuracy of the classified data obtained using the Classification Tree algorithm gives a new perspective to remote sensing. Furthermore, different machine learning algorithms (Random Forest, Gaussian Mixture Model, K-Neighbors Classifier, and others), along with Artificial Neural Networks and Object Based Image Analysis (OBIA) classification, are in the focus of remote sensing professionals and researchers, while the rapid development and improvement of the algorithms continues.
With this methodology, it is possible to perform a broad spectrum of analyses, such as environmental stress detection (landslides, wildfires, flooding, etc.) or land cover map creation, with very high accuracy, while saving time and money in a process that used to last much longer.