Accuracy of Wind Speed Predictability with Heights using Recurrent Neural Networks

Accurate prediction of future wind speed is critical for integrating wind power into the grid. Wind speed is usually measured at low heights, whereas the hub heights of modern wind turbines are much higher, in the range of 80-120 m. This study attempts to better understand how the predictability of wind speed varies with height. To achieve this, wind data were collected using a Laser Illuminated Detection and Ranging (LiDAR) system at heights of 20, 40, 50, 60, 80, 100, 120, 140, 160, and 180 m. The hourly averaged data are used to train and test a Recurrent Neural Network (RNN) that predicts wind speed for each of the next 12 hours from the 48 previous hourly values. Detailed analyses of short-term wind speed prediction at different heights and future hours show that wind speed is predicted more accurately at greater heights. For example, for the 12-hour-ahead prediction, the mean absolute percent error decreases from 0.19 to 0.16 as the height increases from 20 m to 180 m. The performance of the proposed method is compared with that of a Multilayer Perceptron (MLP). The results show that the RNN performed better than the MLP in most of the cases presented here for the 6th future hour.


INTRODUCTION
An exponentially growing population and power demand that is rising at the same or an even higher pace are concerns for people from all walks of life. The share of renewable energy in the energy mix, and of wind in particular, is increasing globally owing to its environmentally friendly nature, fast technological development, commercial acceptance, ease of operation and maintenance, and competitive cost. Additionally, the deployment of wind power projects reduces dependence on fossil fuels and consequently cuts greenhouse gas (GHG) emissions into the atmosphere. As a sign of progress in the wind power sector, the cumulative global installed wind power capacity reached 743 GW in 2020, with 93 GW of new additions [1,2]. At present, more than 90 countries are contributing to the build-up of wind power capacity, including 9 countries with more than 10 GW and 29 with more than 1 GW of installed capacity.
Wind speed, among all the meteorological parameters, fluctuates strongly in both the temporal and spatial domains. It changes with the time of day, the day of the year, and the month of the year. This fluctuating nature of wind speed creates uncertainty in the availability of continuous power and, more importantly, in the stability of the grid. Hence, understanding the variability of the wind speed at a location over time is critical for the quality and magnitude of the wind energy yield, which is directly proportional to the cube of the wind speed. This means that a proper understanding of wind speed variations, based on long-term historical wind data and its future trends, is a prerequisite for the success of large investments. Hence, when planning the deployment of a wind farm at a site, an indispensable task is to conduct on-site wind speed measurements for at least one complete year (the longer the better) and to analyse them to extract information on the variability of the wind [3][4][5][6][7][8][9][10][11][12]. The variability of wind covers a wide spectrum of timescales, from seconds, hours, days, and months to one or several years. Predicting the wind speed accurately a few hours to days ahead is therefore important for power producers, grid operators, energy managers, and ultimately consumers [13][14][15]. Accurate advance knowledge of wind speed can be used in applications such as wind power dispatch planning, power quality management, grid operation, reserve allocation, and generation scheduling.
Artificial intelligence techniques such as Artificial Neural Networks (ANN) [9], convolutional LSTM networks [16], neuro-fuzzy systems [17], support vector machines [18], long short-term memory networks [19], Particle Swarm Optimization (PSO) [20], mode-decomposition-based low-rank multi-kernel ridge regression [21], Gaussian process regression combined with numerical weather predictions [22], Singular Spectrum Analysis and Adaptive Neuro-Fuzzy Inference System [23], optimal feature selection and a modified bat algorithm with the cognition strategy [24], and a spatial model [25] have been applied to capture the nonlinear trend of wind speed data series. Since the early 2000s, hybrid methodologies have emerged in the literature, in which more than one model is combined to achieve better forecasts of wind speed in the time and spatial domains [26][27][28]. These modern machine learning methods are very useful and provide relatively better estimates in both the time and spatial domains, as can be seen from wide-ranging applications such as performance prediction of thermosiphon solar water heaters [29], analysis of absorption systems [30], sizing of photovoltaic systems [31,32], ground conductivity map generation [33], and solar radiation forecasting [34].
Akçay and Filik [35] developed a framework for wind speed prediction based on data de-trending, covariance factorization via a subspace method, and a one-step and/or multi-step-ahead Kalman filter. Numerical experiments on real data sets showed that the wind speed forecasts, particularly those using the multi-step-ahead filter, outperformed persistence-model-based predictions. In another study, Filik and Filik [36] used ANN-based models in conjunction with weather parameters such as ambient temperature and pressure and found good agreement between the measured and predicted values of wind speed. Santamaría-Bonfil et al. [37] predicted wind speeds 1-24 hours ahead using a hybrid methodology based on Support Vector Regression and obtained better forecasts than persistence and autoregressive models. Hu et al. [38] introduced a deep learning neural network technique to predict the wind speed and showed that the proposed approach significantly reduced the error between the predicted and measured values.
Kang et al. [39] proposed a hybrid Ensemble Empirical Mode Decomposition (EEMD) and Least Square Support Vector Machine (LSSVM) model to improve the precision of short-term wind speed forecasting. The results showed that the proposed hybrid model outperformed some of the existing methods, such as Back Propagation Neural Networks (BP), Auto-Regressive Integrated Moving Average (ARIMA), and combinations involving Empirical Mode Decomposition (EMD). Liu et al. [40] used a combination of a Secondary Decomposition Algorithm (SDA) and Elman neural networks and showed that the hybrid model performed better in multi-step wind speed prediction. Wang et al. [41] showed that a hybrid of the Least Square Support Vector Machine and a Markov model performed better than other models for wind speed prediction. Marović et al. [42] proposed an ANN-based wind speed prediction model for use in an early warning system that announces the possible occurrence of harmful wind-related phenomena; the model proved accurate in alerting the community to bad wind conditions ahead of time. Shukur and Lee [43] used a hybrid of an artificial neural network and a Kalman filter to address nonlinearity and uncertainty issues and reported accurate results in comparison with measured values. Jianzhou et al. [44] used support vector regression combined with seasonal index adjustment and Elman recurrent neural network techniques and obtained relatively better forecasts than other models.
Koo et al. [45] evaluated the accuracy of wind speed prediction using artificial neural networks in terms of the correlation coefficients between actual and simulated wind speed data for plain, coastal, and mountainous areas. The study concluded that geographical location plays an important role in the prediction accuracy of wind speed. For hourly prediction, Wu et al. [46] integrated a single multiplicative neuron model with iterated nonlinear filters to update the wind speed sequence accurately. The results indicated better performance of the proposed model compared to autoregressive moving average, artificial neural network, kernel ridge regression based residual active learning, and single multiplicative neuron models. Zhang et al. [47] used hybrid models (a combination of empirical mode decomposition, feature selection with an artificial neural network, and a support vector machine) for short-term wind speed prediction and found better results than with single ANN, SVM, traditional EMD-based ANN, and EMD-based SVM models. Doucoure et al. [48] used multi-resolution analysis of the time series by means of wavelet decomposition and artificial neural networks and achieved a reduction of around 29% in resources without affecting the predictability. Based on wavelet, wavelet packet, time series analysis, and artificial neural networks, Liu et al. [49] developed three hybrid models (Wavelet Packet-BFGS, Wavelet Packet-ARIMA-BFGS, and Wavelet-BFGS) and compared their performance with Neuro-Fuzzy, ANFIS (Adaptive Neuro-Fuzzy Inference Systems), Wavelet Packet-RBF (Radial Basis Function), and PM (Persistent Model). The results showed that the proposed hybrid models produced better results than the other models.
Most of the above methods use wind speed measurements and predictions at low heights, while in reality wind energy is generated at hub height. At low heights, the wind is influenced by ground effects such as surface heating, near-surface turbulence, and human activities. At greater heights (more than 80 m), however, these effects are reduced and better predictions can be obtained. This paper uses a machine learning method to predict wind speeds at different heights and analyzes how the predictability of wind speed varies with height. The paper is organized as follows: Section II discusses the methodology, while Section III is devoted to the numerical experimental results. Section IV concludes the paper.

Recurrent Neural Networks Model
The main purpose of this paper is to analyze the relationship between wind speed (WS) predictability and measurement height. Therefore, a proven WS prediction technique, namely the Recurrent Neural Network (RNN), is used to assess WS predictability with respect to the measurement height. The RNN [50] is a neural network (NN) architecture that mimics the information processing performed by the human brain by connecting layers of input and output variables through hidden units. In addition, the RNN feeds back the values of one or more hidden units as inputs when calculating the next output. Figure 1 shows the architecture of the RNN. The input layer (X) is a vector of past WS values. This paper uses the Elman model, in which the hidden layer (H = {h_0, …, h_{N-1}}) is computed as a non-linear function of the weighted sum of the input layer and the value of the hidden layer at the previous sample. Mathematically, the output of the hidden layer at the k-th sample, h(k), whose n-th element is h_n(k), is given by:

$$h(k) = f\big(U\,x(k) + W\,h(k-1) + b\big)$$

where U denotes the matrix that connects the input and hidden layers and b is the bias vector. Matrix W connects the hidden units to the values of the hidden units at the previous input sample. The output layer represents the predicted future WS values. The output at the k-th sample, y(k), is given by:

$$y(k) = V\,h(k)$$

where V represents the output layer weights. This paper uses the hyperbolic tangent activation function, given by:

$$f(z) = \tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$$
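To make the recurrence concrete, the following is a minimal NumPy sketch of an Elman forward pass as described above. It is an illustration under stated assumptions rather than the implementation used in this paper: the function name `elman_forward`, the symbol `V` for the output-layer weights, the zero initial hidden state, and the linear output layer are all choices made here for the example.

```python
import numpy as np

def elman_forward(x_seq, U, W, b, V, h0=None):
    """Run an Elman RNN over a sequence of input vectors.

    x_seq : (T, n_in) array, one input vector x(k) per sample
    U     : (n_hidden, n_in) input-to-hidden weights
    W     : (n_hidden, n_hidden) recurrent (hidden-to-hidden) weights
    b     : (n_hidden,) hidden bias
    V     : (n_out, n_hidden) output weights (name assumed for this sketch)
    Returns the outputs (T, n_out) and the hidden states (T, n_hidden).
    """
    # Assumption: the hidden state starts at zero unless h0 is supplied.
    h = np.zeros(U.shape[0]) if h0 is None else np.asarray(h0, dtype=float)
    hidden_states, outputs = [], []
    for x in x_seq:
        # h(k) = tanh(U x(k) + W h(k-1) + b)
        h = np.tanh(U @ x + W @ h + b)
        hidden_states.append(h)
        # y(k) = V h(k); a linear output layer is assumed here.
        outputs.append(V @ h)
    return np.array(outputs), np.array(hidden_states)
```

With the sizes reported later in the paper (48 inputs, 30 hidden units, one output), `U` would have shape (30, 48), `W` shape (30, 30), and `V` shape (1, 30).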

Levenberg-Marquardt Algorithm for RNN
The Levenberg-Marquardt (LM) algorithm [51] is commonly used to train neural networks because of its fast and stable convergence. For neural networks with a moderate number of units and layers, the LM algorithm is therefore a strong candidate for training the RNN. The LM weight update (Δw) is given by:

$$\Delta w = -\left(J^{T}J + \lambda I\right)^{-1} J^{T} e$$

where J denotes the Jacobian matrix of the error function with respect to the weight vector of the RNN, e is the vector of errors, and I is the identity matrix.
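As an illustration of the LM update, the sketch below applies the damped step Δw = -(JᵀJ + λI)⁻¹Jᵀe to a toy linear least-squares problem instead of the RNN. The helper names `lm_step` and `lm_fit`, the factor of 10 used to adapt λ, and the toy data are assumptions made for this example; the starting value λ = 3 and the limit of 30 iterations follow the settings reported in the experimental section.

```python
import numpy as np

def lm_step(J, e, lam):
    """One Levenberg-Marquardt update: dw = -(J^T J + lam*I)^(-1) J^T e."""
    n_weights = J.shape[1]
    A = J.T @ J + lam * np.eye(n_weights)
    return -np.linalg.solve(A, J.T @ e)

def lm_fit(residual_fn, jacobian_fn, w0, lam=3.0, max_iter=30, factor=10.0):
    """Minimise 0.5*||e(w)||^2 with a basic LM loop.

    residual_fn(w) returns the error vector e = y - y_hat.
    jacobian_fn(w) returns J = de/dw.
    lam is decreased after a successful step and increased otherwise
    (the adaptation factor of 10 is an assumption of this sketch).
    """
    w = np.asarray(w0, dtype=float)
    e = residual_fn(w)
    cost = 0.5 * (e @ e)
    for _ in range(max_iter):
        w_new = w + lm_step(jacobian_fn(w), e, lam)
        e_new = residual_fn(w_new)
        cost_new = 0.5 * (e_new @ e_new)
        if cost_new < cost:
            # Accept the step and trust the quadratic model more.
            w, e, cost = w_new, e_new, cost_new
            lam /= factor
        else:
            # Reject the step and increase the damping.
            lam *= factor
    return w, cost

# Toy usage: recover the weights of a noiseless linear model y = X w.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true
w_fit, final_cost = lm_fit(lambda w: y - X @ w, lambda w: -X, np.zeros(3))
```

For an RNN, `residual_fn` and `jacobian_fn` would have to evaluate the network and the derivatives of its errors with respect to all weights, which is the expensive part of LM training and is not shown here.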
The scalar damping factor λ governs the step size; it is decreased if an update successfully reduces the error function and increased if the update fails to reduce it. The error function is calculated from the difference between the actual values (y) and the predicted outputs (ŷ). In this paper, three performance measures are employed: the mean absolute percent error (MAPE), the root mean squared error (RMSE), and the adjusted coefficient of determination ($R^2_{adj}$). These performance measures are calculated using the following equations:

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$

$$R^2_{adj} = 1 - \left(1 - R^2\right)\frac{N - 1}{N - p - 1}$$

where N denotes the number of samples. The number of inputs is denoted by p, and the regular coefficient of determination ($R^2$) is given by

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$

where $\bar{y}$ is the mean of the actual data. The regular $R^2$ never decreases when more inputs are included, whereas $R^2_{adj}$ corrects for this and provides a more reliable measure.
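The three performance measures can be computed with a few lines of NumPy. The sketch below uses the standard definitions, with MAPE expressed as a fraction (consistent with values such as 0.11 to 0.19 reported in this paper); the function names and the small example arrays are illustrative assumptions, not data from the study.

```python
import numpy as np

def mape(y, y_hat):
    """Mean absolute percent error as a fraction (undefined if any y is 0)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.mean(np.abs((y - y_hat) / y))

def rmse(y, y_hat):
    """Root mean squared error, in the units of y."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.sqrt(np.mean((y - y_hat) ** 2))

def r2(y, y_hat):
    """Regular coefficient of determination."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r2(y, y_hat, n_inputs):
    """R^2 adjusted for the number of model inputs (n_inputs)."""
    n = len(y)
    return 1.0 - (1.0 - r2(y, y_hat)) * (n - 1) / (n - n_inputs - 1)

# Illustrative values only (not measurements from this study).
y_true = [2.0, 4.0, 5.0, 8.0]
y_pred = [2.2, 3.6, 5.5, 7.2]
print(mape(y_true, y_pred), rmse(y_true, y_pred), adjusted_r2(y_true, y_pred, 1))
```

Because MAPE divides by the measured value, hours with zero or near-zero wind speed would need special handling; how such samples were treated is not stated in the paper.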

EXPERIMENTAL RESULTS
This paper uses hourly averaged WS data measured by a LiDAR system over 90 days, from 2 April to 30 June 2017, at ten different heights, namely 20, 40, 50, 60, 80, 100, 120, 140, 160, and 180 m above ground level (AGL). As shown in Figure 2, the hourly averaged WS measured closer to ground level tends to be lower owing to friction from the surrounding terrain. The acquired dataset is divided so that the WS data from the first 80 days are used for training and the remaining data for testing. This paper provides WS predictions up to 12 hours ahead. For each height, 12 different RNN models were trained as functions of the WS values at previous hours to exploit their temporal correlation. Initial experiments showed that using the WS values from the 48 previous hours yields more accurate predictions for the future hours. Therefore, all models at each height use the same 48 previous hourly WS values as inputs to estimate the WS at the 1st to 12th future hours. Each RNN model uses 48 inputs, 30 hidden units, and a single output. The LM algorithm with scalar factor λ = 3 and a maximum of 30 iterations exhibited the best performance in the cross-validation analysis. Training is also terminated when the MSE, used as the cost function, fails to improve by more than $10^{-7}$ over six iterations, which is taken as an indication of convergence. As a comparative method, separate Multilayer Perceptron (MLP) models were built to predict the WS at the ten heights for the 6th hour ahead.
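The data preparation described above (48 lagged hourly values as inputs, one target at a given future hour, first 80 days for training) can be sketched as follows. This is a hedged illustration: the helper names `make_windows` and `split_and_window` are hypothetical, the series is a synthetic placeholder, and splitting the series before building the windows is an assumption, since the paper does not specify how windows at the train/test boundary were handled.

```python
import numpy as np

HOURS_PER_DAY = 24

def make_windows(series, n_lags=48, horizon=1):
    """Build (inputs, target) pairs from an hourly series.

    Each input row holds n_lags consecutive values; the target is the value
    `horizon` hours after the last input value.
    """
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - n_lags - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for the requested lags and horizon")
    X = np.stack([series[i:i + n_lags] for i in range(n_samples)])
    first_target = n_lags + horizon - 1
    y = series[first_target:first_target + n_samples]
    return X, y

def split_and_window(series, train_days=80, n_lags=48, horizon=1):
    """Chronological split (assumed to happen before windowing)."""
    cut = train_days * HOURS_PER_DAY
    train, test = series[:cut], series[cut:]
    return make_windows(train, n_lags, horizon), make_windows(test, n_lags, horizon)

# Placeholder series: 90 days of hourly indices standing in for measured WS.
series = np.arange(90 * HOURS_PER_DAY, dtype=float)
(X_train, y_train), (X_test, y_test) = split_and_window(series, horizon=12)
```

One model per horizon, as in the paper, would correspond to calling `split_and_window` with `horizon` set from 1 to 12 and training a separate network on each resulting dataset.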

Results and Discussions
The resulting values of the performance measures MAPE, RMSE, and $R^2_{adj}$ are summarized in Tables 1, 2, and 3, respectively, for each hour and height. In general, the MAPE values tend to decrease as the prediction height increases (Table 1). For example, the MAPE was 0.15 at 20 m for the 1-hour-ahead prediction but decreased to 0.11 at 180 m. Conversely, the MAPE values increased with increasing prediction horizon, as can be observed from column 2 of Table 1. In the early hours the MAPE was around 0.15, while for the 12-hour horizon it increased to 0.19. The RMSE (Table 2) and the adjusted coefficient of determination (Table 3) show trends consistent with those of the MAPE values presented in Table 1.
The measured and the predicted WS at 20m and 1 hour ahead are compared in Figure 3(a). The predicted WS are found to be in close agreement with the measured values and follow the trend quite closely.
The corresponding scatter plot of the measured and predicted WS at hour 1, Figure 3(b), shows an adjusted coefficient of determination of 0.91. The predicted and measured WS at 12 hours ahead agree less well than those at 1 hour ahead, Figure 3(c), but the predictions still follow the trend quite closely. The scatter diagram, Figure 3(d), resulted in an $R^2_{adj}$ value of 0.84. At 100 m, the agreement between the predicted and measured WS values at 1 hour (Figure 4(a)) is much better than that at 12 hours (Figure 4(c)). However, in both cases the predicted WS follow the trends of the measured values closely. The scatter plot for the 1-hour-ahead predictions (Figure 4(b)) shows less scatter, with an $R^2_{adj}$ value of 0.88, than the plot for the 12-hour-ahead predictions, with an $R^2_{adj}$ value of 0.77. Similar comparisons are made at 140 m and 180 m in Figures 5 and 6, respectively. In all of these cases, the agreement between the predicted and measured values at hour 1 is much better than that at 12 hours ahead. This observation is further supported by the higher values of $R^2_{adj}$ for the 1-hour-ahead predictions than for the 12-hour-ahead predictions.

RNN and MLP Models Performance Comparison
In this section, the performance of the RNN and MLP methods is compared based on the predictions of WS at the 6th future hour and at different heights. The predicted WS values from the two methods are compared with the measurements.
The values of the performance measures RMSE, MAPE, and $R^2_{adj}$ for both methods are summarized in Table 4. The WS values predicted using the two methods are compared with the measured ones at 20, 80, 140, and 180 m for the 6th hour and are shown in Figure 7. The corresponding scatter plots are also provided in this figure. It is evident from Figure 7(a, c, e, and g) that the agreement between the predicted and measured WS values keeps improving with increasing height. The corresponding scatter plots shown in Figure 7(b, d, f, and h) confirm this. In these scatter plots, the $R^2_{adj}$ values are larger for the RNN-based predictions than for those based on the MLP. Furthermore, the adjusted coefficient of determination tends to increase with height, which indicates better predictions at greater heights.

Predictability Analysis of WS with Heights
This sub-section is devoted to the analysis of the predictability of WS with height. The MAPE and $R^2_{adj}$ values of the RNN-based WS predictions for each of the 1 to 12 hours ahead at each height are compared in Figure 8. As the prediction horizon increases, the MAPE values also increase (Figure 8(a)). In general, the MAPE values increase slowly up to hour 6 and somewhat faster at longer horizons. It is also worth mentioning that the MAPE value decreases as the prediction height increases. The $R^2_{adj}$ values remained above 0.85 for the 180 m predictions at all future hours (Figure 8(b)). At heights of 20 to 120 m, the $R^2_{adj}$ values between the predicted and measured WS lie between 0.8 and 0.9 up to hour 6 and decrease faster beyond that (Figure 8(b)). Figure 9 shows the variation of MAPE and $R^2_{adj}$ with height using the RNN. It can be observed that both performance measures improve with height.

CONCLUSION
Accurate knowledge of future wind speed is critical for estimating the available wind power, which is important for utility grid planning and operation. Typically, wind speed measurements are performed at low heights (20-40 m), whereas modern wind turbines operate at hub heights of 80 m to 120 m. For the first time, to the best of the authors' knowledge, this paper assesses the predictability of wind speed relative to height. A LiDAR device was deployed to collect hourly averaged wind speed data, and a machine learning method was used for short-term prediction of wind speed. Recurrent neural networks (RNN) are used to predict the wind speed at each of the next 12 hours based on the values from the previous 48 hours. The predictions for the 1st to the 6th future hours deviated less (at a height of 120 m, the MAPE ranged between 0.13 and 0.16) than those for the 7th to the 12th future hours (the MAPE increased up to 0.18).