Ensemble of Radial Basis Neural Networks With K-means Clustering for Heating Energy Consumption Prediction



INTRODUCTION
The study of building energy demand has become a topic of great importance because of the significant increase of interest in energy sustainability, especially after the adoption of the European EPB Directive. In Europe, buildings account for 40% of total energy use and 36% of total CO2 emissions [1]. According to [2], 66% of the total energy consumption of residential buildings in Norway occurs in the space heating sector. Therefore, the estimation or prediction of building energy consumption plays a very important role in building energy management: provided that enough historical data have been gathered, it can help to indicate above-normal energy use and to diagnose its possible causes. Scientists and engineers have lately been moving from calculating energy consumption toward analyzing the real energy use of buildings. One of the reasons is that, due to the complexity of building energy systems, non-calibrated models cannot predict building energy consumption well, so there is a need for a real-time picture of energy use based on measured and analyzed data. The classic approach to estimating building energy use is based on the application of a model with known system structure and properties as well as forcing variables (the forward approach). Using software tools such as EnergyPlus, TRNSYS, BLAST, ESP-r, HAP and APACHE requires detailed knowledge of numerous building parameters (constructions, systems) and of building behavior, which is usually not available.
In recent years, considerable attention has been given to a different approach to building energy analysis, which is based on so-called "inverse" or data-driven models [3]. In a data-driven approach, the input and output variables must be known and measured, and the development of the "inverse" model consists of determining a mathematical description of the relationship between the independent variables and the dependent one. The data-driven approach is useful when the building (or a system) is already built and actual consumption (or performance) data are measured and available. For this approach, different statistical methods can be used.
Artificial neural networks (ANN) are the most widely used artificial intelligence models for different types of prediction. The main advantages of an ANN model are its self-learning capability and the fact that it can approximate a nonlinear relationship between the input variables and the output of a complicated system. Feedforward neural networks are the most widely used in energy consumption prediction. Ekici et al. [4] proposed a backpropagation three-layered ANN for the prediction of the heating energy requirements of different building samples. Dombayci [5] used hourly heating energy consumption for a model house, calculated by the degree-hour method, for training and testing the ANN model. In [6], actual recorded input and output data that influence Greek long-term energy consumption were used in the training, validation and testing process. In [7], Li et al. proposed a hybrid genetic algorithm-adaptive network-based fuzzy inference system (ANFIS), which combined fuzzy if-then rules into a neural network-like structure, for the prediction of energy consumption in a library building. An excellent review of the different neural network models used for building energy use prediction was done by Kumar [8]. The ensemble of neural networks is a very successful technique in which the outputs of a set of separately trained neural networks are combined to form one unified prediction [9]. Since an ensemble is often more accurate than its members, this paradigm has become a hot topic in recent years and has already been successfully applied to time series prediction [10], weather forecasting [11] and load prediction in a power system [12]. The main idea of this paper is to propose an ensemble of radial basis neural networks for the prediction of heating energy use.

ARTIFICIAL NEURAL NETWORK ENSEMBLES
Many engineering problems, especially in energy use prediction, have proved too complex for a single neural network. Researchers have shown that simply combining the outputs of many neural networks can generate more accurate predictions and significantly better generalization than any of the individual networks [13]. Theoretical and empirical work has shown that a good ensemble is one where the individual networks are both accurate and diverse, namely where the individual networks make their errors on different parts of the input space [14]. An important problem is, then, how to select the ensemble members in order to reach an optimal compromise between these two conflicting conditions [15]. Accuracy can be described by the mean square error (or some other prediction indicator) and is achieved by proper training of the neural networks. Diverse individual predictors (members) can be obtained in several ways. The most widely used approaches [16], [17] can be divided into three groups.
The first group of methods refers to training individual networks on different, adequately chosen subsets of the dataset. It includes elaborations of several important "resampling" techniques: cross-validation, bagging and boosting. These methods rely on resampling algorithms to obtain different training sets for the component predictors [18]. In cross-validation, the dataset is split into roughly equal-sized parts, and each network is then trained independently on different parts. When the dataset is small and noisy, this technique can reduce the correlation between the networks better than training each network on the full dataset. When a larger set of independent networks is needed, splitting the training data into non-overlapping parts may leave each part too small to train a network if no more data are available. In this case, data reuse methods such as the bootstrap can be useful. Bagging, proposed by Breiman [19], is an acronym for "bootstrap aggregation". It consists of generating different datasets drawn at random with replacement from the original training set and then training the different networks in the ensemble on these datasets. Some examples from the original training set may be repeated in a resulting training set, while others may be left out. The boosting algorithm, proposed by Schapire [20], trains a set of learning machines sequentially on data that have been filtered by the previously trained learning machines. It can help to reduce the covariance among the different networks in an ensemble. In bagging, there is no relationship between the different training sets, so the networks can be trained at the same time. In boosting, each training set is chosen based on the performance of the earlier networks, so the networks must be trained consecutively.
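As an illustration of the resampling idea, the following minimal Python sketch builds bootstrap replicates (as in bagging) and disjoint parts (as in the cross-validation split). The function names and the toy dataset are assumptions made for this example only and do not come from the cited works.

```python
import random

def bootstrap_sample(dataset, rng):
    """Draw len(dataset) items at random with replacement (one bagging replicate)."""
    return [rng.choice(dataset) for _ in dataset]

def make_bagging_sets(dataset, n_members, seed=0):
    """Create one bootstrap replicate per ensemble member."""
    rng = random.Random(seed)
    return [bootstrap_sample(dataset, rng) for _ in range(n_members)]

def cross_validation_parts(dataset, n_parts):
    """Split the dataset into n_parts disjoint, roughly equal-sized parts."""
    return [dataset[i::n_parts] for i in range(n_parts)]

samples = list(range(10))  # stand-in for 10 training samples
bagging_sets = make_bagging_sets(samples, n_members=3)
cv_parts = cross_validation_parts(samples, n_parts=3)
```

Each bagging replicate has the size of the original set but may repeat some samples and omit others, whereas the cross-validation parts together cover every sample exactly once.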
The second group of methods achieves diversity by varying the network topology: the number of input and/or hidden nodes, the initial weight sets, the training algorithms, or even the type of network.
The third group is the selective approach, in which the diverse components are selected from a number of accurately trained networks. Opitz et al. [21] proposed a genetic algorithm to search for a highly diverse set of accurate networks. Other algorithms used for selecting ensemble components are a pruning algorithm to eliminate redundant classifiers [22], a selective algorithm based on bias/variance decomposition [23], the genetic algorithm proposed in [24] and the PSO-based approach proposed in [25].
In [26], diversity was achieved using different network architectures for the ensemble members: a feedforward neural network (FFNN), a radial basis function network (RBFN) and an adaptive neuro-fuzzy inference system (ANFIS). All networks were trained and tested on the same dataset. In order to create the ensemble, the members (prediction results of the individual networks) were then aggregated using simple, weighted and median-based averaging. The main idea of this paper is to investigate the possible improvement of prediction accuracy by creating diversity with a "resampling" technique that provides a different training set for each ensemble member. The k-means algorithm is used to create the different training subsets.

Radial basis function networks (RBFN)
A radial basis function network (RBFN), a type of feedforward neural network, consists of three layers: an input layer, a single hidden layer with a number of neurons, and an output layer. The input nodes are directly connected to the hidden layer neurons. The hidden layer transforms the data from the input space to the hidden space using a nonlinear function. The nonlinear function of the hidden neurons is symmetric in the input space, and the output of each hidden neuron depends only on the radial distance between the input vector and the center of the hidden neuron. For an RBFN with an n-dimensional input $x \in \mathbb{R}^n$, the output of the j-th hidden neuron is given by:
$$h_j(x) = \phi(\lVert x - c_j \rVert), \quad j = 1, \dots, m$$
where $c_j$ is the center (vector) of the j-th hidden neuron, m is the number of neurons in the hidden layer and $\phi(\cdot)$ is the radial basis function. The RBFN therefore uses a radially symmetric function as the activation function in the hidden layer; the Gaussian function, the most commonly used activation function, is adopted in this study. The neurons of the output layer have a linear transfer function. The k-th output of the network is obtained by the weighted summation of the outputs of all hidden neurons connected to that output neuron:
$$y_k(x) = \sum_{j=1}^{m} w_{kj} h_j(x) + w_{k0}$$
where $w_{kj}$ is the connecting weight between the j-th hidden neuron and the k-th output unit, $w_{k0}$ is the bias and m is the number of hidden layer neurons.
The standard technique used to train an RBF network is the hybrid approach, which is a two-stage learning strategy. In the first stage, unsupervised training procedures are used to adjust the parameters of the radial basis functions (centers and widths). For example, the centers can be chosen randomly, by a clustering algorithm or by the self-organizing map method. In the second stage, the weights of the output layer are adapted by supervised training algorithms, such as the gradient method, the least-squares method, orthogonal least squares or SVD. The majority of the learning algorithms mentioned above belong to batch learning, where the network parameters are updated after all training samples have been presented to the network. In the other type of learning, sequential learning, the network parameters are adjusted after the presentation of each training sample. The characteristics of both types of learning, with their advantages and disadvantages, can be found, for example, in [27] and [28].
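The forward pass of such a network can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Gaussian is parameterized as exp(-r²/(2σ²)) with one shared width σ for all hidden neurons, and the function names, centers and weights are hypothetical values chosen for the example, not the networks trained in this study.

```python
import math

def gaussian_rbf(r, sigma):
    """Gaussian radial basis function of the distance r (assumed parameterization)."""
    return math.exp(-(r ** 2) / (2.0 * sigma ** 2))

def rbfn_forward(x, centers, sigma, weights, biases):
    """Compute all network outputs for one input vector x.

    centers: list of m center vectors
    weights: one row of m weights per output neuron
    biases:  one bias per output neuron
    """
    # Hidden layer: each neuron responds to the distance between x and its center.
    hidden = [gaussian_rbf(math.dist(x, c), sigma) for c in centers]
    # Output layer: linear combination of the hidden outputs plus the bias.
    return [b + sum(w * h for w, h in zip(row, hidden))
            for row, b in zip(weights, biases)]

centers = [(0.0, 0.0), (1.0, 1.0)]   # hypothetical centers, m = 2
weights = [[1.0, -1.0]]              # hypothetical weights, one output
biases = [0.5]
y = rbfn_forward((0.0, 0.0), centers, 1.0, weights, biases)
```

For the input placed exactly on the first center, the first hidden neuron outputs 1 and the second outputs exp(-1), so the single network output is 0.5 + 1 - exp(-1).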

K-means clustering
K-means clustering, proposed by MacQueen [29], is a method commonly used to automatically partition a dataset into m groups. Even though k-means was first proposed over 50 years ago, it is still one of the most widely used clustering algorithms. Ease of implementation, simplicity, efficiency and empirical success are the main reasons for its popularity. The technique is based on a distance measure, using the Euclidean distance as the criterion. It starts with m initial cluster centers; for every data point, the Euclidean distance to each cluster center is calculated, and the point is assigned to the closest cluster center. Each cluster center is then recomputed as the mean of the points assigned to it. These steps are repeated until the squared error between the empirical mean of each cluster and the points in the cluster is minimized. The goal is to divide the entire dataset into m clusters, where the number of elements in the i-th cluster is $n_i$ and the center of the cluster is $c_i$. Clustering is therefore achieved by finding the centers $c_i$ that minimize
$$J = \sum_{i=1}^{m} \sum_{k=1}^{n_i} \lVert x_k^{(i)} - c_i \rVert^2$$
where $x_k^{(i)}$ denotes the k-th data point assigned to the i-th cluster.
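The assignment and update steps described above can be sketched as follows. This is a minimal standard-library illustration, not the implementation used in this study; picking the initial centers at random from the data, the iteration cap and the toy points are assumptions of the example.

```python
import math
import random

def assign(points, centers):
    """Assign every point to the cluster of its closest center (Euclidean distance)."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, m, max_iter=100, seed=0):
    """Alternate assignment and center update until the centers stop changing."""
    rng = random.Random(seed)
    centers = rng.sample(points, m)  # assumed initialization: m random data points
    for _ in range(max_iter):
        clusters = assign(points, centers)
        # New center = mean of the assigned points; an empty cluster keeps its center.
        new_centers = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

def squared_error(centers, clusters):
    """Sum of squared distances between each point and its cluster center."""
    return sum(math.dist(p, c) ** 2 for c, cl in zip(centers, clusters) for p in cl)

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters = kmeans(points, m=2)
sse = squared_error(centers, clusters)
```

On this toy set the two well-separated groups of three points are recovered, and the resulting squared error is the value of the criterion being minimized.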

CASE STUDY
University campuses are specific groups of diverse buildings with significant energy consumption [30]. They consist of many different buildings and each represents a small-scale town in itself. Therefore, they provide an excellent testbed to characterize and understand the energy consumption of a group of "mixed use" buildings. The Norwegian University of Science and Technology (NTNU) campus Gløshaugen consists of 35 buildings with a total area of approximately 300,000 m² (Figure 1).

Data pre-processing
All consumption and weather data were gathered from the web-based Energy Remote Monitoring System [31] and from the local meteorological station Skjetlein [32], respectively. The duration of the heating season in the Trondheim area is usually in the range of 251 to 280 days [33]. The heating season is defined as the period from the day the mean daily temperature falls below 11°C during the autumn until the day it rises above 9°C during the spring [34]. Considering that the outside temperature has the biggest influence on heating energy consumption, mean daily outside temperatures for the years 2006 to 2014 were investigated. More details about data pre-processing can be found in [26]. The database is divided into periods:
• Cold period: from January 1st until March 31st and from November 1st until December 31st
• Mild period: from April 1st until June 15th and from September 16th until October 31st
• Warm period (outside of the heating season): from June 15th until September 15th, excluded from the analysis
This implies that better prediction results can be obtained by using a separate network model for each period than by using one network for the whole year. In this paper, only the working days in the cold period (with the biggest heating energy consumption) are analyzed.
For training the networks, data for the working days in the cold period (from January 1st until March 31st and from November 1st until December 31st) of the years 2009, 2010 and 2011 were used (318 samples in total), and data for 2012 were used for testing (100 samples). Data with obvious errors and heat meter malfunctions were removed from the dataset. Because the numerical ranges of the input and output variables may be quite different, it is often useful to normalize them, so in the following experiments all input variables are normalized to values between 0 and 1. The prediction accuracy of the networks is measured by the coefficient of determination (R²), the root mean square error (RMSE) and the mean absolute percentage error (MAPE).
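The normalization and the three accuracy indices can be sketched as below. The paper does not state the formulas, so the usual textbook definitions are assumed here (min-max scaling, R² as one minus the ratio of residual to total sum of squares, MAPE in percent); the function names and the sample values are illustrative only.

```python
import math

def min_max_normalize(values):
    """Scale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def rmse(measured, predicted):
    """Root mean square error."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

def mape(measured, predicted):
    """Mean absolute percentage error, in percent (measured values must be nonzero)."""
    n = len(measured)
    return 100.0 / n * sum(abs((m - p) / m) for m, p in zip(measured, predicted))

def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

measured = [100.0, 200.0, 300.0]    # made-up values, not data from the study
predicted = [110.0, 190.0, 300.0]
scaled = min_max_normalize([50.0, 100.0, 300.0])
```

In practice the minimum and maximum used for scaling would normally be taken from the training data and reused for the testing data, so that both sets are mapped consistently.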

K-means for resampling training dataset
In [35], the authors proposed using the k-means algorithm to classify samples in order to obtain training datasets with the largest diversity and more information about the whole data space. Following that idea, the possible application of k-means for creating diversity among ensemble members by resampling the training dataset is investigated. The entire available training dataset (318 samples) is shown in Figure 3. It can be seen that there is a wide range of daily heating consumption, from 50,000 kWh to over 300,000 kWh. These differences suggest a possible benefit from clustering the training data. Let n be the number of input variables and p the total number of training samples. The procedure for creating training subsets, shown in Figure 3, is as follows:
1. The entire training dataset with p samples in total is divided into q clusters, where $p_j$ is the number of samples in the j-th cluster (j = 1, ..., q). Obviously, $p_1 + p_2 + \dots + p_q = p$.
2. Samples from the entire training dataset are then randomly chosen and added to each cluster until the number of samples in each cluster is equal to the total number of samples p.
The created subsets are used for training the individual ensemble members, in this case radial basis neural networks. This improved method of generating training subsets prevents any individual neural network from being trained on too small a data subset.
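Step 2 of the procedure can be sketched as follows. The text does not say whether the added samples are drawn with or without replacement, or whether samples already in the cluster are excluded; this sketch assumes drawing with replacement from the entire training set, and the function name and toy data are hypothetical.

```python
import random

def pad_clusters(training_set, clusters, seed=0):
    """Extend every cluster with randomly chosen training samples up to p samples."""
    rng = random.Random(seed)
    p = len(training_set)
    subsets = []
    for cluster in clusters:
        # Assumption: extra samples are drawn with replacement from the whole set.
        extra = [rng.choice(training_set) for _ in range(p - len(cluster))]
        subsets.append(list(cluster) + extra)
    return subsets

training_set = list(range(12))                         # stand-in for p = 12 samples
clusters = [[0, 1, 2], [3, 4, 5, 6, 7], [8, 9, 10, 11]]  # stand-in k-means result, q = 3
subsets = pad_clusters(training_set, clusters)
```

Every subset keeps all members of its own cluster, so each ensemble member still sees its own region of the data space in full while gaining samples from the rest of the set.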

RBFN models
A customized RBFN function available in MATLAB, based on the orthogonal least squares algorithm, which iteratively creates a radial basis network one neuron at a time, is used to develop the networks. The number of neurons in the hidden layer is increased automatically until the error goal is achieved or the maximum number of neurons in the hidden layer has been reached. The radius value (known as the spread) of the radial basis function was varied to obtain the best performance of the RBF network.

NEURAL NETWORK ENSEMBLE
The possible improvement of ensemble prediction accuracy by using k-means "resampling" to create diversity among ensemble members is examined. The application of the ensemble technique is divided into two steps. First, the training data are divided into clusters (training data subsets). The number of clusters is varied from 2 to 5. To avoid training subsets that are too small, samples from the original dataset are randomly chosen and added to each cluster, so that the total number of samples in each cluster is 318. The obtained datasets are used to train the individual RBFNs. The second step is the adequate combination of the outputs of the ensemble members to produce the most appropriate output. In order to improve ensemble efficiency, both the accuracy of the networks and the diversity between the individuals must be ensured. Accuracy is achieved by an appropriate choice of network parameters.
Diversity is achieved by using a different training dataset for each member. In previous work [26], diversity was achieved by using different network architectures (FFNN, RBFN and ANFIS), while the main idea in this paper is to apply k-means for resampling the dataset in order to create diversity between members. The individual networks are tested on the data for the year 2012 (100 samples in total). The outputs of the individual networks are aggregated using the simple average (Ensemble SAV), the weighted average (Ensemble WAV) and the median-based average (Ensemble MAV). Linear combination of the outputs of the ensemble members is one of the most popular approaches for combining selected network outputs.
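The three aggregation rules can be sketched as below. The source does not specify how the weights of Ensemble WAV are chosen or how the median-based average is computed, so this sketch assumes caller-supplied non-negative weights (for instance derived from each member's training error) and a plain per-sample median; the names and numbers are illustrative.

```python
import statistics

def simple_average(member_predictions):
    """Average the members' predictions sample by sample."""
    return [sum(col) / len(col) for col in zip(*member_predictions)]

def weighted_average(member_predictions, weights):
    """Weighted mean per sample; weights are normalized to sum to one."""
    total = sum(weights)
    return [sum(w * y for w, y in zip(weights, col)) / total
            for col in zip(*member_predictions)]

def median_aggregate(member_predictions):
    """Median of the members' predictions per sample (assumed form of MAV)."""
    return [statistics.median(col) for col in zip(*member_predictions)]

# Rows: ensemble members, columns: test samples (made-up values).
member_predictions = [
    [100.0, 200.0],
    [110.0, 190.0],
    [150.0, 210.0],
]
sav = simple_average(member_predictions)
wav = weighted_average(member_predictions, weights=[0.5, 0.25, 0.25])
mav = median_aggregate(member_predictions)
```

In the first sample the third member is an outlier: the simple average is pulled toward it, while the median and a weighting that favors the first member are less affected.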
The described procedure of resampling the training dataset and creating the ensemble is shown in Figure 4.
The accuracy of the proposed ensembles in terms of the coefficient of determination (R²) for a varying number of ensemble members can be seen in Figure 5. RMSE and MAPE for different numbers of clusters in the testing period are shown in Figures 6 and 7, respectively. The application of the k-means clustering technique and the creation of an ensemble resulted in an improvement in accuracy over the prediction of the best trained individual RBFN (Table 1).
In this paper we have demonstrated that neural network ensembles offer enhanced performance over the best trained single network. The best result for R² is 0.9839, achieved by Ensemble WAV with 4 clusters. The prediction results of this ensemble for the testing period, compared with the measured values, are shown in Figure 8. All observed prediction indices (R², RMSE and MAPE) are better than those of the single network. For all of the cluster numbers examined, the ensemble outperformed the best trained single RBFN.

Figure 1. District heating net in Gløshaugen

ANN MODEL DEVELOPMENT
The most important task in building an ANN prediction model is the selection of input variables. Many different studies dealing with the impact of various variables on energy consumption can be found in the literature. The input variables for the neural network model considered in this study are: mean daily outside temperature [°C], mean daily wind speed [m/s], total daily solar radiation [Wh/m²], minimum daily temperature [°C], maximum daily temperature [°C], relative humidity [%], day of the week and month of the year. Results of partial autocorrelation, which measures how a series is correlated with itself at different lags, indicate that the heating consumption of the previous day has the biggest influence on the heating consumption of the observed day (Figure 2). Therefore, the heating consumption of the previous day is selected as an additional input variable.
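Adding the previous day's consumption as an input can be sketched as follows. The example also computes the lag-1 autocorrelation, which coincides with the partial autocorrelation at lag 1; higher lags would need a dedicated partial autocorrelation routine. The function names and values are illustrative, and the rows are assumed to be consecutive days of one series.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var

def add_previous_day(weather_rows, consumption):
    """Pair each day's weather inputs with the previous day's consumption.

    Returns (inputs, target) tuples; the first day is dropped because it has
    no previous-day value.
    """
    samples = []
    for t in range(1, len(consumption)):
        inputs = list(weather_rows[t]) + [consumption[t - 1]]
        samples.append((inputs, consumption[t]))
    return samples

consumption = [100.0, 120.0, 140.0, 160.0]     # made-up daily values
weather_rows = [[-5.0], [-6.0], [-8.0], [-9.0]]  # made-up mean outside temperatures
samples = add_previous_day(weather_rows, consumption)
lag1 = autocorrelation(consumption, 1)
```

Because only working days are analyzed, the "previous day" in the real dataset may not be the previous calendar day; how such gaps are handled is not stated in the text.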

Figure 5. R² for different numbers of clusters for the ensembles and the single RBFN on the testing data