Short-Term Prediction of Wind Power Density Using Convolutional LSTM Network

Efficient extraction of renewable energy from wind depends on the reliable estimation of wind characteristics and optimization of wind farm installation and operation conditions. The variability in wind behavior introduces uncertainty into the prediction of wind energy tapping potential. Thus, the estimation of wind power density based on empirical models demands subsequent data processing to ensure accuracy and reliability in energy computations. The present study analyses the reliability of the ANN-based machine learning approach in predicting wind power density for five stations (Chennai, Coimbatore, Madurai, Salem, and Tirunelveli) in the state of Tamil Nadu, India, using five different models. The selected models, namely Convolutional Neural Network (CNN), Dense Neural Network (DNN), Recurrent Neural Network (RNN), Bidirectional Long Short Term Memory (LSTM) Network, and linear regression, are compared using data for the period from Jan 1980 to May 2018. Based on the results, it was found that the (1->Conv1D|2->LSTM|1-dense) model performs better than the other models in estimating wind power density, with minimum error values (based on mean absolute error and root mean squared error).


INTRODUCTION
The current status of fossil fuel consumption and the threatening impacts of conventional energy sources on the environment have motivated researchers to focus on increased extraction of energy from renewable resources. Renewable energy sources like tidal, hydropower, wind, solar, geothermal, etc. are being widely employed as alternate sources of energy. Among them, wind energy promises to be suitable for large-scale applications due to its ease of availability, installation, environmental friendliness, and inexhaustibility [1,2]. Exploration of wind energy from offshore and onshore sites has attracted the attention of both researchers and capitalists as a potential area of investment. This is fuelled by the ever-increasing demand for energy in various production and service sectors [3]. As a rapidly growing source of energy, the current trend in wind energy extraction is expected to expand in the near future. Although this is one of the cheapest forms of energy, inappropriate placement of wind turbines may result in under-utilization of their capacity and can lead to a huge loss of revenue. Thus, scientific evaluation of wind resources plays a vital role in providing secure wind energy utilisation and enhancing the efficiency of wind energy markets [4,5]. Some researchers have adopted novel techniques for determining the optimal location of wind farms [6][7][8].
As a basic pre-requisite, the knowledge of wind power density is critical since the variation in wind frequency distribution can result in varying wind power densities for the same wind speed. Although various predictive models are available to estimate the wind energy-associated parameters, they have a few inherent limitations from a computational perspective. As an obvious response to the recent trends in artificial intelligence-based modelling techniques, there have been various attempts to employ them for simulation, estimation and optimisation of wind energy parameters. Taylor et al. [9] constructed density forecasts from weather ensemble predictions and statistical time series models. Weibull distribution-based Particle Swarm Optimisation (PSO) was reported as a useful tool in assessing the wind energy potential in Taiwan [10]. In another approach, Monte Carlo simulation was used by Jeon and Taylor [11] to compute conditional kernel density for the VARMA-GARCH model. It is to be noted that most of these models predict the wind power density based on the prediction of wind speed and direction functions. Bigdeli et al. [12] developed hybrid models to predict the wind power time series by combining Neural Network (NN) with Imperialistic Competitive Algorithm (ICA), Genetic Algorithm (GA), and PSO techniques. Using the PSO, Abductory Induction Mechanism (AIM), and the persistence (PER) model, Mohandes and Rehman [13] predicted the distribution of 12-hour wind speed in Saudi Arabia.
Towards deriving appropriate strategies for evolving hybrid solution systems, Shamshirband et al. [14] employed an Adaptive Neuro-Fuzzy Inference System (ANFIS) to study the distribution of probability density functions (PDF) of wind speed and direction. The results compared well with conventional wind distribution approaches such as Weibull, Frechet, Gumbel, and joint probability functions. Similar adaptive models, namely the Adaptive Wavelet Neural Network (AWNN) and the Feed-Forward Neural Network (FFNN), were designed by Baskar and Singh [15] to successfully predict wind speed distributions. A combination of Support Vector Machine (SVM), Artificial Neural Network (ANN), and Genetic Programming (GP) was used by Mohammadi et al. [16] to predict the monthly values of wind power density, with results comparable to the prediction accuracy of the Extreme Learning Machine (ELM). In another hybrid approach, the Beveridge-Nelson decomposition method was used to predict the wind power potential in the Xinjiang Uygur Autonomous Region in China [17]. Based on the relevance of the SVM in optimization, the authors also employed an ant-lion optimizer for predicting the wind power based on hourly wind speed data. A combination of GA and ELM was used to predict the wind power by Wang et al. [18]. Another recent hybrid approach using ANFIS models such as ANFIS-PSO, ANFIS-GA, and ANFIS-DE (differential evolution) was attempted by Hossain et al. [19] for four different locations in Malaysia using monthly and weekly wind power density values. An improved teaching-learning-based optimisation (iTLBO) with ELM was proposed by Xue et al. [20] to predict the wind power by incorporating the recursive feature elimination (RFE) method for feature selection.
As far as deep learning methods are concerned, many researchers have proposed deep learning techniques for forecasting wind speed [21][22][23][24][25][26][27]. However, only limited studies are available on forecasting wind power density using deep learning. Lu et al. [25] used an encoder-decoder LSTM to predict short-term wind power. Xu et al. [28] employed an adaptive LSTM for short-term prediction of wind power. Yu et al. [29] adopted an LSTM-EFG model for wind power prediction based on sequential correlation features. However, studies comparing the statistical performance of the deep learning techniques are lacking in the literature. Therefore, in the present study, we have compared five different techniques, namely, convolutional neural network, dense neural network, recurrent neural network, bidirectional LSTM network, and linear regression, in terms of their prediction accuracy so as to determine the most efficient deep learning method for forecasting wind power density.

Related works
Generally, there are three major types of forecasting models used for wind speed calculations, namely physical-based models, statistical models, and hybrid models. Physical models usually consider several parameters such as temperature, pressure, surface roughness, obstacles, etc. in the lower atmosphere for creating mathematical models of the atmosphere to predict the wind speed. Statistical models are based on the previously recorded data for forecasting the wind speed without the consideration of meteorological conditions. Hybrid models consist of both physical and statistical approaches in forecasting wind speed. We have focused on the statistical approach in our study to forecast the wind power density. Non-linear statistical models are applied to the dataset to evaluate and compare the respective performances.

NONLINEAR MODELS
Recently, ANN models have been extensively applied to real time series problems to capture the nonlinearities prevailing in the dataset. ANN can handle such non-linear temporal correlation and can approximate a large class of functions with a high degree of accuracy. However, though acting as a good approximator, ANN has the problem of overfitting and may provide misleading results if not properly monitored. A single hidden layer feedforward network can be mathematically modeled as:
$$y_t = \alpha_0 + \sum_{k=1}^{m} \alpha_k \, g\!\left(\beta_{0k} + \sum_{l=1}^{n} \beta_{lk}\, y_{t-l}\right) + \varepsilon_t \qquad (1)$$
where $\alpha_k$ ($k = 0, 1, 2, \ldots, m$) and $\beta_{lk}$ ($l = 0, 1, 2, \ldots, n$; $k = 1, 2, \ldots, m$) are connection weights, $g$ is the activation function of the hidden layer, $n$ is the number of input nodes, $m$ is the number of hidden nodes, $\varepsilon_t$ is the random error, $y_t$ is the output, and the inputs are given by $y_{t-1}, y_{t-2}, \ldots, y_{t-n}$.
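As an illustration of the single hidden layer network described above, the following minimal sketch evaluates one forecast from n lagged values. It assumes the common form in which the output is a bias plus a weighted sum of hidden-node activations, and it assumes a logistic activation, which the text does not specify. The function names `logistic` and `ffnn_one_step` are hypothetical, and the random error term is omitted because it is not part of the deterministic forecast.

```python
import math


def logistic(z):
    """Logistic activation (an assumption; the text does not name one)."""
    return 1.0 / (1.0 + math.exp(-z))


def ffnn_one_step(lags, alpha, beta):
    """One-step output of a single hidden layer feedforward network.

    lags  : [y_{t-1}, ..., y_{t-n}], the n input values
    alpha : [alpha_0, alpha_1, ..., alpha_m], output bias and output weights
    beta  : m rows, one per hidden node, each [beta_0k, beta_1k, ..., beta_nk]
            (hidden bias followed by the n input weights)
    """
    output = alpha[0]
    for alpha_k, row in zip(alpha[1:], beta):
        # Weighted sum of the lagged inputs plus the hidden-node bias
        z = row[0] + sum(b * y for b, y in zip(row[1:], lags))
        output += alpha_k * logistic(z)
    return output


# Hypothetical weights for a network with n = 3 inputs and m = 2 hidden nodes
example = ffnn_one_step(
    lags=[0.4, 0.5, 0.3],
    alpha=[0.1, 0.8, -0.2],
    beta=[[0.0, 0.5, 0.2, 0.1], [0.1, -0.3, 0.4, 0.2]],
)
```

In practice the weights are learned by minimizing a loss over the training windows; the values above are placeholders only.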

Convolutional Neural Networks
The central idea in a convolutional neural network is a mathematical operation called convolution, which is used to detect specific features (for example, among image pixels) in pattern recognition problems. A kernel matrix is slid through the input image matrix to create feature maps for the next layer. If the image is denoted by $f$ and the kernel by $h$, then:
$$G[q, r] = (f * h)[q, r] = \sum_{i} \sum_{j} h[i, j]\, f[q - i,\, r - j] \qquad (2)$$
where $f$ denotes the object, $h$ denotes the kernel, $i$ and $j$ denote the relative positions, and $q$ and $r$ represent the indexes of the resultant row and column. After the convolution operation, activation functions are applied to introduce a non-linear transformation, followed by max-pooling layers. Max-pooling layers downsample the feature map output to make the representation approximately invariant to small translations. The nodes after the pooling layers are flattened into a fully connected layer to make predictions with subsequent layers.
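The sliding-kernel idea carries over directly to one-dimensional time series. The sketch below is a plain Python illustration, not the authors' implementation: it slides a kernel over a sequence without flipping it (the cross-correlation that deep learning libraries usually compute under the name convolution; reversing the kernel gives the textbook convolution), then applies a ReLU activation and max-pooling. The function names and the choice of ReLU are assumptions made for illustration.

```python
def conv1d_valid(signal, kernel, stride=1):
    """Slide the kernel over the signal without padding ('valid' mode)."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def relu(values):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2):
    """Non-overlapping max-pooling; a trailing partial window is dropped."""
    return [
        max(values[i:i + size])
        for i in range(0, len(values) - size + 1, size)
    ]


# A kernel that responds to a rising trend across three consecutive samples
feature_map = conv1d_valid([1.0, 3.0, 2.0, 5.0, 4.0], [-1.0, 0.0, 1.0])
pooled = max_pool1d(relu(feature_map), size=2)
```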

Dense Neural Networks
Dense neural networks (DNN) have fully connected linear layers where the result of each node is passed through a non-linear activation function. In other words, each node of a layer receives an input from the previous layers in a densely connected way. The DNN thus adds non-linearity in the operation and can approximate complex mathematical functions [30].

Recurrent Neural Networks
The RNN is a widely used ANN for solving problems that have temporal correlations and exhibit temporal dynamic behavior [31]. It connects the hidden layer with the former ones in a circular way. These recurrent units have the ability to save historical information from the sequence, thus making them fit for problems whose output depends on previous values. Unlike in a conventional ANN, overfitting issues can be prevented by dropout, i.e. randomly ignoring a certain proportion of neurons in the network so that the corresponding weights are not updated during the forward or backward pass of the training phase [32,33].

Long Short Term Memory Network
This particular approach was developed to overcome the drawbacks of the simple RNN, such as the vanishing gradient and exploding gradient problems [34,35]. Here, the set of recurrently connected memory blocks is defined in terms of memory cells and multiplicative units such as the forget, input, and output gates, each with a different functionality [34,36]. Bidirectional LSTM emphasizes learning the most out of the input sequence by unrolling the network in both forward and backward directions. It is commonly used for sequence prediction and sequence generation. The CNN-LSTM uses a CNN layer for feature extraction from the input sequence, while the LSTM layer accounts for interpretation and sequence prediction over time [37].

STUDY AREA
The state of Tamil Nadu (TN) is situated in the southernmost region of the Indian peninsula. In this study, we have considered five stations, namely, Chennai, Coimbatore, Madurai, Salem, and Tirunelveli. The geographical location of these stations in the state is provided in Figure 1. Table 1 provides the geographical details of the latitude, longitude, and altitude of the chosen stations. The table also provides the mean wind speed data collected from the MERRA-2 reanalysis database (NASA) over a long-term duration at each site location. Hourly mean wind speed data recorded at a height of 50 m above ground level from Jan 1980 to May 2018 was considered in this study. It is to be noted that the hourly wind speed data has been averaged to obtain the daily wind speed data for the purpose of calculating the wind power density values.

WIND POWER DENSITY ESTIMATION
While the term wind speed indicates the average velocity of movement of the wind, the wind power density refers to the capacity of the wind to generate power from a given mass of air for a given region [38]. Wind power density estimation is essential to assess the reliable wind resource potential. The wind power can be computed from the measured wind speed values using the probability distribution function. The following expression can be used to calculate the wind power density based on the Weibull probability density function [39]:
$$\mathrm{WPD} = \frac{1}{2}\,\rho \int_{0}^{\infty} v^{3} f_w(v)\, dv$$
where $\rho$ is the density of air [M L$^{-3}$] and $v$ is the wind speed [L T$^{-1}$]. The expression for $f_w(v)$ is:
$$f_w(v) = \frac{k}{c}\left(\frac{v}{c}\right)^{k-1} \exp\!\left[-\left(\frac{v}{c}\right)^{k}\right]$$
where $v$ is the wind speed [L T$^{-1}$], $k$ is the shape factor [-] and $c$ is the scale factor [L T$^{-1}$]. The values of $k$ and $c$ are further determined using the standard deviation method and the power density method.
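For a Weibull distribution, the integral of the cubed wind speed weighted by the density has a known closed form, so the wind power density can be evaluated as 0.5 ρ c³ Γ(1 + 3/k). The sketch below computes this closed form and cross-checks it against a simple trapezoidal integration. The air density of 1.225 kg/m³ (a standard sea-level value), the function names, and the example values of k and c are assumptions for illustration and are not taken from the study.

```python
import math


def weibull_pdf(v, k, c):
    """Two-parameter Weibull density with shape k and scale c."""
    if v <= 0:
        # Simplification: the value at v = 0 does not affect the
        # v**3-weighted integral evaluated below.
        return 0.0
    return (k / c) * (v / c) ** (k - 1) * math.exp(-((v / c) ** k))


def wpd_closed_form(k, c, rho=1.225):
    """Wind power density in W/m^2 from the third Weibull moment."""
    return 0.5 * rho * c ** 3 * math.gamma(1.0 + 3.0 / k)


def wpd_numeric(k, c, rho=1.225, v_max=60.0, steps=20000):
    """Trapezoidal integration of 0.5 * rho * v^3 * f_w(v) over [0, v_max]."""
    dv = v_max / steps
    total = 0.0
    for i in range(steps + 1):
        v = i * dv
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * 0.5 * rho * v ** 3 * weibull_pdf(v, k, c)
    return total * dv


# Hypothetical Weibull parameters: k = 2 (dimensionless), c = 6 m/s
closed = wpd_closed_form(2.0, 6.0)
numeric = wpd_numeric(2.0, 6.0)
```

The two values should agree closely whenever `v_max` is large enough for the Weibull tail to be negligible.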

Standard deviation method
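The values of $k$ and $c$ are estimated using the following equations:
$$k = \left(\frac{\sigma}{\bar{v}}\right)^{-1.086}, \qquad c = \frac{\bar{v}}{\Gamma\!\left(1 + \frac{1}{k}\right)}$$
where $\bar{v}$ and $\sigma$ are the mean wind speed and the standard deviation of the wind speed, respectively. $\Gamma(x)$ is the gamma function, which is commonly used in various calculus models, including probability distribution integrals, and is defined by the following equation:
$$\Gamma(x) = \int_{0}^{\infty} t^{x-1} e^{-t}\, dt$$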
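The standard deviation method can be sketched in a few lines, assuming the widely used empirical relations k = (σ / mean speed)^(-1.086) and c = mean speed / Γ(1 + 1/k). The function name `weibull_std_method` and the sample wind speeds are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was applied.

```python
import math
import statistics


def weibull_std_method(speeds):
    """Estimate Weibull shape k and scale c from the mean and standard deviation."""
    mean_v = statistics.mean(speeds)
    sigma = statistics.stdev(speeds)  # sample standard deviation (assumption)
    k = (sigma / mean_v) ** -1.086
    c = mean_v / math.gamma(1.0 + 1.0 / k)
    return k, c


# Hypothetical daily mean wind speeds in m/s
k_est, c_est = weibull_std_method([4.0, 5.0, 6.0, 7.0, 8.0])
```

By construction, the fitted distribution reproduces the sample mean, since the mean of a Weibull distribution is c Γ(1 + 1/k).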

Power density method
To estimate the shape and scale factors via the power density method, the energy pattern factor ($E_{pf}$) has to be calculated. The $E_{pf}$ is a parameter defined by Akdag and Dinler [40]:
$$E_{pf} = \frac{\overline{v^{3}}}{\bar{v}^{3}}$$
where $\overline{v^{3}}$ is the mean of the cube of the wind speed and $\bar{v}^{3}$ is the cube of the mean wind speed. The scale factor is computed using Eqn. (4) and the shape factor is computed as per the equation given below [39]:
$$k = 1 + \frac{3.69}{E_{pf}^{2}}$$
The wind power density values calculated using the above methods for the chosen five cities are presented in Table 1.
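A minimal sketch of the power density method follows. It assumes the commonly cited form of the method, in which the energy pattern factor is the mean of the cubed speeds divided by the cube of the mean speed, the shape factor is k = 1 + 3.69 / E_pf², and the scale factor is c = mean speed / Γ(1 + 1/k). The function names and sample values are hypothetical.

```python
import math


def energy_pattern_factor(speeds):
    """Mean of the cubed speeds divided by the cube of the mean speed."""
    n = len(speeds)
    mean_v = sum(speeds) / n
    mean_cube = sum(v ** 3 for v in speeds) / n
    return mean_cube / mean_v ** 3


def weibull_power_density_method(speeds):
    """Estimate Weibull shape k and scale c from the energy pattern factor."""
    e_pf = energy_pattern_factor(speeds)
    k = 1.0 + 3.69 / e_pf ** 2
    mean_v = sum(speeds) / len(speeds)
    c = mean_v / math.gamma(1.0 + 1.0 / k)
    return k, c


# Hypothetical daily mean wind speeds in m/s
k_pd, c_pd = weibull_power_density_method([4.0, 5.0, 6.0, 7.0, 8.0])
```

Because the mean of cubes is never smaller than the cube of the mean for non-negative speeds, E_pf is at least 1 and this estimate of k cannot exceed 4.69.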

EXPERIMENTAL SETUP
The experiments were performed on Google Colaboratory with the following specifications:
• Intel Xeon @2.20GHz with two cores
• Nvidia Tesla P100 GPU
• 13 GB RAM
For training and inference of the proposed approach, the popular software library TensorFlow is used along with the Keras framework. For preprocessing and normalization of the dataset, functions from scikit-learn are used. All the code is written in Python 3.

PROPOSED MODELLING APPROACH
In this study, five different models are evaluated: four kinds of ANNs (Convolutional Neural Network, Dense Neural Network, Recurrent Neural Network, and Bidirectional LSTM network) and linear regression (ARIMA). We have compared the performance of the traditional statistical model against the neural networks and linear regression for the prediction of wind power density for the selected stations. The schematic diagram of our proposed methodology, with the systematic steps required for the analysis, is presented in Figure 2. The modeling framework consists of loading the dataset, pre-processing, splitting the dataset into training and test sets, preparing the windowed dataset, building the model, training the model, forecasting, and error calculation. The structure of our proposed methodology may be described as:
• Loading the dataset: We have considered five stations, namely, Chennai, Coimbatore, Madurai, Salem, and Tirunelveli, for the period from January 1980 to May 2018 (about 38 years and 5 months) with hourly data.
• Data cleaning and Pre-processing: Data cleaning is performed to remove noise from the input dataset after which it is converted from hourly data to daily data.
• Splitting the dataset: The given datasets were split into training data (from January 1980 to December 2009, i.e. about 78.097%) and test data (from January 2010 to May 2018, i.e. 21.903%).
• Preparation of windowed dataset: By windowing, the sequence of time series data was restructured into a supervised learning problem. In this context, we have used the 30 previous time-step entries (window size) as the input vector (X) and the next entry (i.e. the 31st time step) as the output (Y). Thus, the windowed dataset is defined with a window size of 30 and a batch size of 32.
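The windowing step in the list above can be sketched in plain Python as follows. This is an illustration of the sliding-window idea only, not the authors' TensorFlow pipeline; the function names `make_windows` and `batches` and the toy series are hypothetical, while the window size of 30 and batch size of 32 follow the text.

```python
def make_windows(series, window_size=30):
    """Turn a series into (X, Y) pairs: window_size past values, then the next value."""
    inputs, targets = [], []
    for start in range(len(series) - window_size):
        inputs.append(series[start:start + window_size])
        targets.append(series[start + window_size])
    return inputs, targets


def batches(inputs, targets, batch_size=32):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for i in range(0, len(inputs), batch_size):
        yield inputs[i:i + batch_size], targets[i:i + batch_size]


# Toy series standing in for daily wind power density values
toy_series = [float(day) for day in range(100)]
X, Y = make_windows(toy_series, window_size=30)
batch_sizes = [len(xb) for xb, _ in batches(X, Y, batch_size=32)]
```

A series of length N yields N − 30 training pairs, since the first 30 values have no complete preceding window.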

Feeding inputs to the models
A sequential modeling approach is applied to the given data. The models were trained using stochastic gradient descent as the optimizer with a momentum value of 0.9. The Conv1D layer of the CNN-LSTM uses 32 filters with a kernel size of 3 and a stride of 1. The details of the remaining parameters are provided in Table 2.
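The stated Conv1D settings determine the size of the first layer. The sketch below works through the arithmetic for a window of 30 time steps, a kernel size of 3, a stride of 1, and 32 filters. It assumes a single input channel (a univariate series), and because the padding mode is not stated in the text, both the unpadded ("valid") and the length-preserving ("same" or "causal") cases are shown. The function names are hypothetical.

```python
import math


def conv1d_output_length(input_length, kernel_size, stride=1, padding="valid"):
    """Number of time steps produced by a 1D convolution layer."""
    if padding == "valid":
        return (input_length - kernel_size) // stride + 1
    if padding in ("same", "causal"):
        return math.ceil(input_length / stride)
    raise ValueError("unknown padding mode: " + padding)


def conv1d_param_count(kernel_size, in_channels, filters, use_bias=True):
    """Trainable parameters: one kernel (plus optional bias) per filter."""
    per_filter = kernel_size * in_channels + (1 if use_bias else 0)
    return per_filter * filters


steps_valid = conv1d_output_length(30, 3, stride=1, padding="valid")
steps_same = conv1d_output_length(30, 3, stride=1, padding="same")
params = conv1d_param_count(kernel_size=3, in_channels=1, filters=32)
```

Under these assumptions the layer maps each 30-step window to 28 or 30 steps of 32 features each, which the LSTM layers then process as a sequence.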
The next stage consists of training the model for a fixed number of epochs until the loss saturates to a minimum value. Here we have used 100 epochs for Bidirectional LSTM and simple regression, whereas in simple RNN, Dense Neural network, and CNN-LSTM, we have used 200 epochs to train the model.
During the final stage of data processing, the daily expected results of wind power density were forecasted for the period January 2010 to May 2018 using the trained model. Further, the accuracy of the model predictions was assessed by comparing the available data with the predicted data using statistical error estimates.
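The statistical error estimates named in the Results section (MSE, RMSE, MAE, MAPE, SMAPE, and the Index of Agreement) can be sketched as below. The definitions used here are the standard ones and are an assumption about the exact forms used in the study; in particular, SMAPE has several variants, and the Index of Agreement follows Willmott's original formulation. The function names and sample values are hypothetical.

```python
import math


def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Percentage error; undefined when an actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric variant using the mean of |actual| and |predicted| as the scale."""
    total = sum(
        abs(a - p) / ((abs(a) + abs(p)) / 2.0)
        for a, p in zip(actual, predicted)
    )
    return 100.0 * total / len(actual)


def index_of_agreement(actual, predicted):
    """Willmott's index of agreement; 1.0 indicates a perfect match."""
    mean_a = sum(actual) / len(actual)
    num = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    den = sum((abs(p - mean_a) + abs(a - mean_a)) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - num / den


# Hypothetical actual and predicted wind power density values
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [1.0, 5.0, 6.0, 10.0]
scores = {
    "MSE": mse(y_true, y_pred),
    "RMSE": rmse(y_true, y_pred),
    "MAE": mae(y_true, y_pred),
    "MAPE": mape(y_true, y_pred),
    "SMAPE": smape(y_true, y_pred),
    "IA": index_of_agreement(y_true, y_pred),
}
```

MAPE divides by the actual value, so it grows without bound as actual values approach zero, whereas MAE and RMSE stay in the units of the data.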

RESULTS AND DISCUSSION
Different network architectures and structures are used in this research to compare the forecasting accuracy based on the calculated error. Statistically, there are various expressions for representing the error values, depending on the data type and the required data analysis. The Mean Square Error (MSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Index of Agreement (IA), Root Mean Square Error (RMSE), and Symmetric Mean Absolute Percentage Error (SMAPE) were used in this study to evaluate the prediction accuracy of wind power density. In their standard forms, with $y_i$ the actual value, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the actual values, and $N$ the number of samples, they are defined as:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2}$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$$
$$\mathrm{MAPE} = \frac{100}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$
$$\mathrm{IA} = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{N}\left(\left|\hat{y}_i - \bar{y}\right| + \left|y_i - \bar{y}\right|\right)^{2}}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2}}$$
$$\mathrm{SMAPE} = \frac{100}{N}\sum_{i=1}^{N}\frac{\left|y_i - \hat{y}_i\right|}{\left(\left|y_i\right| + \left|\hat{y}_i\right|\right)/2}$$
The performance evaluation measures for the different models at the selected stations are shown in Tables 3-7. Two metrics, namely Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), were identified as the most significant for measuring the deviation of the predicted value from the actual value. They highlight the overall accuracy in the prediction of wind power density based on the training experience acquired by the selected algorithms. The highest error in prediction was, however, observed with MAPE for all the stations, because MAPE values can reach extremes when the absolute actual values are small. Further, an overall comparison of the performance evaluation measures for all models applied to all stations is summarized in Table 8. From the results, we can clearly observe the robustness of the 1->Conv|2->LSTM network model across all the time-series data considered. It is inferred that restructuring the time-series problem into a supervised learning problem using a sliding window concept has facilitated the application of many supervised neural network algorithms and comparisons between them. The 1->Conv|2->LSTM outperformed all other models on a 6/7 ratio for the metrics considered.
The specific advantages of this model are attributed to the robustness in addressing multiple issues simultaneously due to the hybrid configuration.
Next to 1->Conv|2->LSTM, Bidirectional LSTM performed satisfactorily in the prediction of the time series data for the selected locations. Regarding the robustness of the 1->Conv|2->LSTM, we hypothesize that the convolutional layers cause a denoising effect in the time series and that the LSTM layer, being one of the superior variants of RNN, provides better sequence-dependent handling and thus achieves promising results. This also supports the notion that the performance of hybrid models is superior to that of the basic NN models, thus signifying their flexibility and adaptability in capturing different forms of relationships in time series data.
From the prediction plots (Figures 3-7), it is evident that the applied NN variants effectively capture the non-linearities present in the given time series data. There is an overall similarity in capturing the generic trend of variation in average wind power density values during the selected testing period at all five stations. It is also observed that windowing of the dataset has reduced the complexity of the model and enabled the model to better capture the non-linearities. On a rank-wise comparison, the two poorest-performing models are Simple RNN and Simple LR, as shown in Table 8. This may be because Simple RNNs rely only on loops in their recurrent layers to maintain memory over time. Hence, Simple RNNs are less powerful in solving problems that require learning of long-term temporal dependencies, as opposed to LSTMs, which have special units for memory in addition to standard units, thereby preserving long-term information. On a similar comparison, the suboptimal performance of DNN on the time series dataset comes from the fact that it cannot detect repetition in time series and may produce different results on the same input.
Another important aspect causing the deviation in the accuracy of predictions among the various modelling approaches is the difference in the selected window size. It is evident from the results that the window size plays an important role in model performance. We have observed that using a window size of 30 gives superior performance compared with using 10 or 20. This can be explained by the monthly periodicity of the dataset distribution. It is also important to note that commonly used linear models such as ARIMA are incapable of capturing non-linearity in long-term time series. Accordingly, the results obtained from the ARIMA and AR models were not promising for efficient prediction.
The empirical results obtained for the selected stations, namely, Chennai, Coimbatore, Madurai, Salem, and Tirunelveli in the state of Tamil Nadu, emphasize the poor performance of ARIMA compared with the five models that have been presented in this study. A comparison of the performance of those five models undoubtedly establishes the need for a comprehensive approach when using any long-term data for predictive simulation studies. The performances of the Simple RNN and LR models are the least significant among the attempted models. The next in the ranking sequence are the dense network and CNN-LSTM. Hence it can be inferred from the present study that the LSTM-based approach performs better in the prediction of wind power density than the other predictive models considered.

CONCLUSION
In this study, a prediction model framework is proposed using five different neural network models to present and compare different forms of relationships in estimating wind power density based on about 38 years of data from five different stations in Tamil Nadu, India. Among the five models used for the forecasting, it was found that the 1->Conv1D|2->LSTM|1-dense model performed better than the other models. The performance of the simple RNN and linear regression models is poor compared to the other models considered in this study. The accuracy of the prediction was confirmed by ranking the various statistical error measures, with the estimated error values expressed as an average over the individual stations. The present model is capable of forecasting wind speed and power density for other offshore locations as well, subject to the availability of climatological and meteorological datasets. Similar studies may be performed to forecast the wind power density of other stations. In future studies, other models like wavelet threshold denoising (WTD), adaptive neuro-fuzzy inference system (ANFIS), and the WTD-RNN-ANFIS model can be used for probabilistic wind power forecasting. Such studies will help establish the efficiency and reliability of different models for wind power forecasting.