Towards the Determination of Driving Factors of Varying LST-LCZ Relationships – a Case Study over 25 Cities

The current study aimed to prove the existence of a significant relation between land surface temperature (LST) and local climate zones (LCZs) and its possibility to be generalized to all cities around the world with different climatic zones and characteristics. The further step in this regard was to find the effective climatic and geographical variables affecting this potential relation. For that, 25 cities all around the world with various climatic conditions were selected based on the availability of appropriate satellite images and level zero data on the World Urban Database and Portal Tool (WUDAPT). After acquiring both LST and LCZ maps the comparison between them was made with the Wilcoxon rank sum test indicating the existence of any meaningful pattern. Then, 8 climatic and geographical variables and all possible combinations thereof were assessed to determine the effective drivers on the LST-LCZ relationship. The results showed that the combination of the latitude, mean and maximum annual temperature affected this connection more than any other considered variables.


Introduction
Rapid urbanization has an intense impact on bio-physical surface conditions and on energy and material fluxes in the areas it affects. Urbanization typically replaces the existing natural surface cover with impervious materials and buildings, and increases the activities that release waste into the environment. Known as one of the main drivers of global environmental change, cities are also remarkably exposed to the consequences of this change, such as rising sea levels and increased air temperatures (Oke, 2004). Furthermore, cities generate distinct climatic conditions, which can cause discomfort, heat stress, and exposure to disease and pollution for their inhabitants. In particular, cities are warmer than their surroundings. This is referred to as the urban heat island (UHI), a phenomenon in which urban regions experience higher temperatures than their rural surroundings (Imhoff et al., 2010). Traditional ways of defining the UHI magnitude mostly relied on an urban-rural classification. As there were ambiguities in implementing such a classification, researchers tried a new approach to make heat island observations more comprehensive by introducing intra-urban climate classifications. Such classifications were defined to highlight the urban impact on local climate (Houet & Pigeon, 2011): Urban Terrain Zones (Ellefsen, 1990), Urban Climate Zones (UCZs) (Oke, 2004), and Local Climate Zones (LCZs) (Stewart & Oke, 2009a, b, 2010). Being more detailed and refined, the LCZ scheme is the main focus of many recent urban studies. The concept of the Local Climate Zone, which was introduced by Stewart and Oke (2009), has evolved to quantify the relation between urban configuration and the UHI (Cai et al., 2016). LCZs are specifically defined as "regions of uniform surface cover, structure, material, and human activity that span hundreds of meters to several kilometres in horizontal scale".
Furthermore, each LCZ is supposed to represent homogeneous air temperature (Cai et al., 2017). The first study evaluating the conceptual division of LCZs by temperature observations and simulations from surface-atmosphere models was conducted by Stewart et al. (2014). They showed that thermal contrasts exist among all LCZ classes over all their case study areas: Nagano (Japan), Vancouver (Canada), and Uppsala (Sweden). Following that research, many scientists have tried to assess this connection in different regions. Lehnert et al. (2015) evaluated this concept in Olomouc (Czech Republic) and tried to provide a classification of existing stations within local climate zones. Alexander and Mills (2014) clarified the air temperature difference between LCZs in Dublin (Ireland) by applying data from 6 fixed stations and additional mobile measurements. Using the same mobile air temperature measurement approach, Leconte et al. (2014) showed that air temperature varied between LCZs in Nancy (France). Fenner et al. (2014) demonstrated the connection between LCZs and air temperature in Berlin (Germany) on the basis of up to 19 fixed meteorological stations and around 400 additional citizen weather stations. Lee and Oh (2016) stated that UCZ and air temperature maps displayed an almost identical pattern. Gal et al. (2016) and Skarbit et al. (2015) investigated the same case in Szeged (Hungary) and found a great difference between the LCZs. Geletic et al. (2016) found a certain degree of agreement between air temperature and LCZs in Prague and Brno (Czech Republic). Cai et al. (2017) likewise showed for the Yangtze River Delta (China) that air temperature has a considerable connection with LCZs. Beck et al. (2018) investigated the relations between air temperature and LCZs in Augsburg (Germany) under various synoptic conditions by using a comprehensive logger network. Their results confirmed conformity between air temperature and LCZs.
Despite some remaining issues, such as the exact pattern of air temperature fluctuation among the zones, all of the previous studies demonstrated a considerable relation between standard or special-purpose near-surface air temperature and LCZs (Dobrovolny & Krahula, 2015). Given that, such air temperature maps can be used as one of the main tools to evaluate the performance of local climate zoning, along with the confusion matrix describing the accuracy of the classification. It is worth mentioning that the final measures of the confusion matrix reflect the robustness of the zones and their consistency, but they do not indicate whether the zones are semantically correct. With that in mind, for the many cities lacking adequate observational temperature records, an alternative variable closely related to air temperature should be considered.
Land surface temperature (LST) derived from various airborne or satellite remote sensing systems may be a suitable choice, as its spatial coverage is complete (Geletic et al., 2016). Although these two sorts of temperature differ in their physical meaning and in their responses to atmospheric conditions (Mutiibwa et al., 2015), several studies in other contexts (Xu et al., 2014; Florio et al., 2004; Oyler et al., 2015) have demonstrated not only a strong connection between LST and near-surface air temperature, but also the possibility of using satellite-based LST as a substitute for air temperature in regions with a sparse meteorological record.
There have been some remarkable studies focusing on quantifying the relations between LST and either LCZs (Cai et al., 2017; Geletic et al., 2016; Gemes et al., 2016) or UCZs (Houet & Pigeon, 2011). One of the most comprehensive studies in this context has recently been conducted. In that study, the connection between LST and LCZs derived from the World Urban Database and Portal Tool (WUDAPT) was evaluated over 50 cities around the world, and a considerable relation between them was concluded.
The present study, as complementary research, took further steps: besides establishing the relations between LST and LCZs in the considered cities, it explored whether these relations exhibit an evident clustering and how far this could be related to potential climatic and geographical driving factors. In this way, our study aimed to provide not only a more in-depth insight into the connection between LST and LCZs but also into its dependence on climatic and geographical drivers.
For that, after defining the connections for each city, the cities were clustered to identify groups featuring similar LST-LCZ relations and subsequently to identify the effective driving factors of this grouping. In this study, 8 climatic and geographical factors were considered: 1) Latitude, 2) Longitude, 3) Altitude, 4) Mean annual temperature, 5) Minimum annual temperature, 6) Maximum annual temperature, 7) Total annual precipitation and 8) Population density.

Study area
The study considered 25 cities well scattered around the world. Two main criteria guided the selection of these cities: 1) the availability and quality of the level zero product of the WUDAPT project, 2) the existence of Landsat 8 images from the United States Geological Survey (USGS) meeting specific conditions, e.g. concerning cloudiness. With that in mind, 8 cities were considered in the Americas, 8 cities in Europe, 3 in Africa and 6 in Asia and Oceania (see Figure 1). 11 of them were coastal cities and 14 were inland ones. As can be seen, this set of cities represents varying climatic conditions ranging from tropical to boreal.

Local climate zone classification
Local climate zoning for all case study cities was performed following the standardized "WUDAPT-workflow" (Bechtel & Daneke, 2012; Bechtel et al., 2015). This approach consists of two main steps (Beck et al., 2018): 1) generating so-called "training areas" (TA) representing the typical surface structure and morphology of each LCZ, 2) applying the properties of these TAs to assign each pixel of the selected Landsat images to its corresponding LCZ by a random forest algorithm implemented in the SAGA open source GIS software.
One of the main advantages of WUDAPT level zero products is the included threefold quality control: cross-validation, manual review and cross-comparison with other data. Therefore, generating the LCZ maps based on WUDAPT level zero products can be a promising way to classify LCZs with acceptable accuracy. WUDAPT applies two main accuracy measures for its products: the overall accuracy (OA) and the weighted accuracy (WA). The OA indicates the percentage of correctly classified pixels, while the WA takes the class similarity into account (Bechtel et al., 2017). In the present study, the training areas for each city were downloaded from WUDAPT. Then, four Landsat 8 images featuring less than 10% cloud cover during daytime were used to perform the LCZ estimation. In the end, the built-up and natural LCZ types were determined for 100 m × 100 m raster cells in each case study city (Appendix 1 shows LCZ maps for all cities). As can be seen from the appendix, there are 17 local climate zones from compact high-rise (1) to water (G) according to Stewart and Oke (2012).
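The two accuracy measures mentioned above can be illustrated with a minimal sketch. The confusion matrix and the similarity weights below are hypothetical toy values, and the weighting scheme is only an assumed illustration of how class similarity can earn partial credit; it is not the weight matrix used by WUDAPT.

```python
def overall_accuracy(confusion):
    """Share of correctly classified pixels: diagonal sum over total."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def weighted_accuracy(confusion, weights):
    """Like OA, but every cell is credited with a similarity weight in [0, 1]."""
    total = sum(sum(row) for row in confusion)
    credited = sum(
        confusion[i][j] * weights[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion))
    )
    return credited / total


# Hypothetical 3-class example (rows: reference class, columns: predicted class).
confusion = [
    [40, 10, 0],
    [5, 30, 5],
    [0, 0, 10],
]
# Hypothetical weights: full credit on the diagonal, half credit for
# confusing the two similar classes 0 and 1, no credit otherwise.
weights = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

oa = overall_accuracy(confusion)
wa = weighted_accuracy(confusion, weights)
print(f"OA = {oa:.3f}, WA = {wa:.3f}")
```

With these toy numbers the WA exceeds the OA, because part of the misclassification occurs between classes that are treated as similar.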

Land surface temperature retrieving
Land surface temperature (LST) is defined as the temperature which is felt when the surface is touched with the hands (Rajeshwari & Mani, 2014). In this context, the surface is whatever is exposed to the satellite sensor when it records data over the ground. It could be snow and ice, the soil, the roof of a building, or the canopy of a forest. There are three main satellite data sources to retrieve the LST: Landsat images, the Moderate Resolution Imaging Spectroradiometer (MODIS) and ASTER. Their records differ in spatial and temporal resolution (Brown et al., 2006). For the purposes of this study, Landsat 8 images were found to be well suited. Landsat 8 was successfully launched on the 11th of February 2013, carrying two main instruments: the Operational Land Imager (OLI) and the Thermal Infrared Sensor (TIRS) (Salih et al., 2018). The OLI instrument collects data in nine spectral bands in the visible, near-infrared, and shortwave infrared spectral regions, while the TIRS instrument acquires data in two thermal infrared spectral bands in the LWIR, centered at 10.9 μm and 12 μm respectively (Wulder et al., 2016).
Many algorithms have been developed to retrieve the LST from Landsat 8 images, for example, the single-channel algorithm (e.g. Jimenez et al., 2015), the split-window algorithm (e.g. Du et al., 2015) and the temperature and emissivity separation method (e.g. Wang et al., 2015). In this study the automated mapping algorithm introduced by Avdan & Jovanovska (2016) was applied, since other approaches are much more time consuming and more prone to incorrect estimates (Avdan & Jovanovska, 2016). This method requires the fourth, fifth and tenth bands of a Landsat 8 scene. To acquire the LST maps, four Landsat 8 scenes, one per season, were downloaded from the United States Geological Survey (USGS) website for each case study city and used to calculate the average LST (Appendix 1 includes all LST maps).
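A retrieval chain of this kind (red and near-infrared bands for an NDVI-based emissivity, thermal band for brightness temperature) can be sketched for a single pixel as follows. This is a simplified illustration under stated assumptions, not the cited algorithm itself: the band 10 constants are typical published values that should in practice be read from the scene metadata, the linear emissivity model and the NDVI thresholds are common simplifications, and all function names are hypothetical.

```python
import math

# Assumed Landsat 8 band 10 constants; in practice read from the scene's MTL file.
ML = 3.342e-4            # radiance multiplicative rescaling factor
AL = 0.1                 # radiance additive rescaling factor
K1 = 774.8853            # thermal conversion constant 1
K2 = 1321.0789           # thermal conversion constant 2 (K)
WAVELENGTH = 10.895e-6   # assumed effective wavelength of band 10 (m)
RHO = 1.4388e-2          # h * c / k_B (m K)


def ndvi(red, nir):
    """Normalized difference vegetation index from band 4 (red) and band 5 (NIR)."""
    return (nir - red) / (nir + red)


def vegetation_fraction(ndvi_value, ndvi_soil=0.2, ndvi_veg=0.5):
    """Proportion of vegetation; the two thresholds are assumed values."""
    ratio = (ndvi_value - ndvi_soil) / (ndvi_veg - ndvi_soil)
    ratio = min(max(ratio, 0.0), 1.0)
    return ratio ** 2


def emissivity(pv):
    """Simplified linear emissivity model (an assumption of this sketch)."""
    return 0.004 * pv + 0.986


def brightness_temperature(dn_band10):
    """Digital number -> top-of-atmosphere radiance -> brightness temperature (K)."""
    radiance = ML * dn_band10 + AL
    return K2 / math.log(K1 / radiance + 1.0)


def land_surface_temperature(dn_band10, red, nir):
    """Emissivity-corrected surface temperature in Kelvin."""
    bt = brightness_temperature(dn_band10)
    eps = emissivity(vegetation_fraction(ndvi(red, nir)))
    return bt / (1.0 + (WAVELENGTH * bt / RHO) * math.log(eps))


# Hypothetical pixel: one thermal digital number per season, fixed reflectances.
seasonal_dn = (22000, 25000, 27000, 24000)
seasonal_lst = [land_surface_temperature(dn, red=0.1, nir=0.3) for dn in seasonal_dn]
mean_lst = sum(seasonal_lst) / len(seasonal_lst)
print(f"mean LST over four scenes: {mean_lst - 273.15:.1f} degC")
```

Averaging the four seasonal values per pixel mirrors the seasonal averaging described above; a raster implementation would apply the same arithmetic to whole band arrays.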

Wilcoxon rank sum test
In order to find any significant difference between the LST values among the local climate zones in each city, Wilcoxon rank sum tests (Wilcoxon, 1945) including the Holm adjustment of p-values in multiple testing (Holm, 1979) were performed.
This is a non-parametric statistical test used to compare two independent samples and to assess whether the ranks of their values differ significantly. Under the null hypothesis, the two samples are expected to be similar; therefore, in the present study, p-values greater than the chosen significance level α indicated that LST values did not differ significantly between the two LCZs compared.
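The pairwise testing procedure described above can be sketched in a few lines. This is a minimal illustration that uses the normal approximation of the rank sum statistic with a tie correction (statistical packages may use exact distributions or continuity corrections instead); the LST samples and zone labels are hypothetical.

```python
import math
from itertools import combinations


def rank_sum_p_value(x, y):
    """Two-sided Wilcoxon rank sum p-value (normal approximation, tie-corrected)."""
    pooled = sorted((value, group) for group, sample in enumerate((x, y)) for value in sample)
    n1, n2, n = len(x), len(y), len(x) + len(y)
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2.0          # average of ranks i+1 .. j
        ties = j - i
        tie_term += ties ** 3 - ties
        rank_sum_x += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n1 * (n + 1) / 2.0
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return 1.0
    z = (rank_sum_x - mean) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2.0))


def holm_adjust(p_values):
    """Holm step-down adjustment of p-values for multiple testing."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, k in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * p_values[k]))
        adjusted[k] = running_max
    return adjusted


# Hypothetical LST samples (degC) for three LCZs of one city.
lst_by_lcz = {
    "2": [36.1, 35.4, 37.0, 36.6, 35.9, 36.3, 37.2, 36.8],
    "6": [33.0, 32.4, 33.8, 32.9, 33.5, 32.7, 33.1, 33.9],
    "G": [24.2, 23.8, 25.1, 24.6, 24.0, 23.5, 24.9, 24.4],
}
pairs = list(combinations(lst_by_lcz, 2))
raw = [rank_sum_p_value(lst_by_lcz[a], lst_by_lcz[b]) for a, b in pairs]
adjusted = holm_adjust(raw)
for (a, b), p in zip(pairs, adjusted):
    print(f"LCZ {a} vs LCZ {b}: adjusted p = {p:.4f}")
```

Each adjusted p-value is then compared with the significance level to decide whether the two zones differ in LST.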

Hierarchical clustering
Cluster analysis was applied to determine any systematic pattern in the LST-LCZ relation over the case study cities. Clustering is a process of grouping entities into a number of groups such that objects in the same group are more similar to each other than to those in other groups (Rokach & Maimon, 2005). In this study, two sets of clustering results were obtained through hierarchical cluster analysis. First, all case study cities were clustered based on the characteristics of their LST-LCZ relations (the mean LST of each LCZ), generating one clustering result. Then they were clustered again based on each combination of their climatic and geographical factors, resulting in 255 clusterings, one per non-empty combination of the 8 factors (2^8 - 1 = 255).
The agglomerative approach used here (Murtagh & Contreras, 2011) started, in every clustering, with each city as its own cluster. Then the two most similar clusters were repeatedly merged into a new one. The algorithm terminated when only a single cluster was left.
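The merging procedure described above can be sketched as follows. The distance measure (Euclidean) and the linkage rule (average linkage) are assumptions of this sketch, since the text does not specify them, and the city vectors are hypothetical mean LST values for three zones.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def average_linkage(cluster_a, cluster_b, points):
    """Mean pairwise distance between the members of two clusters."""
    distances = [euclidean(points[i], points[j]) for i in cluster_a for j in cluster_b]
    return sum(distances) / len(distances)


def agglomerate(points, n_clusters=1):
    """Merge the two closest clusters until n_clusters remain.

    Returns the remaining clusters (lists of point indices) and the merge history.
    """
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters, history


# Hypothetical cities described by the mean LST (degC) of three LCZs.
city_names = ["City A", "City B", "City C", "City D"]
city_vectors = [
    [30.0, 28.0, 20.0],
    [31.0, 29.0, 21.0],
    [45.0, 44.0, 30.0],
    [46.0, 42.0, 31.0],
]

two_clusters, _ = agglomerate(city_vectors, n_clusters=2)
_, full_history = agglomerate(city_vectors, n_clusters=1)
for members in two_clusters:
    print([city_names[i] for i in members])
```

Running the procedure down to a single cluster yields the full merge history, which is the information a dendrogram displays; cutting it earlier gives the chosen number of clusters.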
Some of the considered cities lacked certain LCZs. Without further treatment, these LCZs would have to be excluded from the clustering process for all cities, even for those in which they were present. Therefore, to avoid this loss of information, a suitable approach had to be found. In order to include all 17 LCZs in the clustering process, each lacking zone was filled with the mean LST of the local climate super-type (built-up or natural) to which it belongs.
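The gap-filling step can be sketched as below. The sketch assumes that the fill value is the unweighted mean of the zone means available in the same city and super-type; the text does not state whether the mean is taken this way or over pixels, so this is an illustrative choice, and the city values are hypothetical.

```python
BUILT_UP = [str(k) for k in range(1, 11)]        # LCZ 1 .. 10
NATURAL = ["A", "B", "C", "D", "E", "F", "G"]     # LCZ A .. G


def fill_missing_zones(mean_lst):
    """Return mean LST for all 17 LCZs of one city.

    mean_lst maps an LCZ label to the mean LST of that zone, for the zones
    present in the city. Absent zones receive the mean of the present zones
    of the same super-type (None if the whole super-type is absent).
    """
    filled = {}
    for group in (BUILT_UP, NATURAL):
        present = [mean_lst[zone] for zone in group if zone in mean_lst]
        group_mean = sum(present) / len(present) if present else None
        for zone in group:
            filled[zone] = mean_lst.get(zone, group_mean)
    return filled


# Hypothetical city in which only six zones occur.
city = {"2": 36.0, "6": 33.0, "8": 39.0, "A": 27.0, "D": 31.0, "G": 24.0}
filled = fill_missing_zones(city)
print(filled["1"], filled["B"])
```

After this step every city is described by a vector of equal length, so all 17 zones can enter the distance computation.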

Rand Index
To determine the most effective geographical and climatic factors, the clustering based on the characteristics of the LST-LCZ relations was compared to those based on each of the 255 combinations of selected potential driving factors. The Rand index, RI (Rand, 1971), along with the adjusted Rand index, ARI (Hubert & Arabie, 1985), was applied to quantify the agreement between every two clusterings. It should be mentioned that the ARI is corrected for chance agreement between clusterings. For both indices, the maximum value of 1 indicates perfect agreement between the two partitions being compared. Thus, those clusterings of driving factor combinations exhibiting maximum RIs/ARIs with the LST-LCZ clustering represent the most significant influencing factors on LST-LCZ relationships.
Figure 2 presents boxplots of LST among local climate zones for all case study cities. Each boxplot shows the distribution of the data based on a five-number summary: minimum (the lowest whisker), first quartile (the lower boundary of the box), median (the line within the box), third quartile (the upper boundary of the box), and maximum (the highest whisker). It can illustrate not only the outliers, but also whether the data are symmetrical, how tightly they are grouped, and if and how they are skewed. As can be seen, most of the samples did not exhibit any skewness, which could be interpreted as a hint of a normal distribution of the LSTs within each LCZ. However, some zones had skewed LST sets, such as the water zone (G) in Berlin, Khartoum, Kuala Lumpur, Lisbon, Los Angeles, Montevideo and Tehran, pointing to an asymmetry in the LST distribution. The comparatively wide boxes in some cases, such as the bare soil or sand zone (F) in Athens, the large low-rise zone (8) in Khartoum, the water zone (G) in Phoenix and Sfax and the open low-rise zone (6) in Washington DC, indicated a distinct variability of the LST in the corresponding zones.
On the other hand, the comparatively narrow boxes indicated a higher level of agreement among the LST measures. Overall, the built-up and natural types showed two independent sets of patterns in each city.

LST-LCZ relation
According to Figure 2, the LST of the built-up types fluctuated in a hotter range than that of the natural types. According to the concept of LCZs, the same pattern should be expected in terms of air temperature as well. However, some natural types, such as the rock and paved (E) and soil (F) zones, violated this pattern in the case of LST. Among the built-up types, lightweight low-rise (7) and large low-rise (8) showed remarkably hotter LST. The water zone (G), if existing in a city, had the coolest LST among both built-up and natural types.

The significance of LST differences among LCZ types
The Wilcoxon rank sum test was applied to determine the statistical significance of differences in LSTs among LCZs. The resulting p-values of this test are shown in Figure 3 for each case study city. For cells in the triangle plots marked with a crossed circle, no significant difference between the LSTs of the respective LCZs could be deduced.
Although some pairwise comparisons showed p-values exceeding the significance level (here 0.05), in most cases (96% of all pairs) the null hypothesis was rejected and a significant difference was confirmed. In most cases the p-values were lower than 0.001, indicating a highly significant difference between LSTs related to different LCZs. The cities that showed the largest number of non-significant differences were Athens and Sfax. In Athens, for example, the low plants zone (D) and the bare soil or sand zone (F) exhibited no significant differences in LST to all other zones. On the other hand, Berlin, Khartoum, Moscow and Phoenix featured significant differences for all pairs.

Determination of the most effective climatic and geographical driving factors
After recognizing the existence of meaningful LST-LCZ relations in all case study cities, a further step would be the detection of any potential climatic or geographical variables which affected these relations.
For that, clustering all considered cities on the basis of their LST-LCZ relations makes it possible to identify any shared climatic and geographical variables responsible for the similarity within each cluster. Here, hierarchical clustering was applied to recognize similar cities based on their LST-LCZ relations. The mean LSTs of each local climate zone were used as the criteria for clustering; however, zones lacking in some cities would reduce the number of usable criteria. Figure 4 shows the results of the hierarchical clustering: the dendrogram, cluster locations on the map, the screeplot and the centroids of each cluster. Based on the screeplot, the dendrogram and the overall number of case study cities, the appropriate number of clusters was taken to be 5. To find the effective factors, 8 climatic and geographical ones were investigated in this study: 1) Latitude, 2) Longitude, 3) Altitude, 4) Mean annual temperature, 5) Minimum temperature, 6) Maximum temperature, 7) Total annual precipitation and 8) Population density. Figure 5 shows the mean value of these factors in each cluster.
There were 255 different combinations of the mentioned variables (sets of 1 to 8 elements). To determine which one contained the most effective factors for the LST-LCZ relation, all case study cities were clustered based on each combination. Each resulting clustering was compared to the original clustering of the LST-LCZ relations by means of the RI and the ARI. Figure 6 illustrates the respective indices of the ten most relevant factor combinations. As can be seen, the one consisting of latitude, mean and maximum temperature yielded the clustering most similar to the LST-LCZ relation-based clustering. In this case, the RI and ARI were 0.73 and 0.303 respectively. The tenth highest combination included latitude, altitude and mean annual temperature, with RI and ARI of 0.72 and 0.261 respectively.
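The enumeration of factor combinations and the two agreement indices can be sketched as follows. The factor names are shortened labels and the two example labelings are hypothetical; the index formulas follow the standard pair-counting definitions of the Rand index and the adjusted Rand index.

```python
from itertools import combinations
from math import comb

FACTORS = [
    "latitude", "longitude", "altitude", "mean_temp",
    "min_temp", "max_temp", "precipitation", "population_density",
]

# All non-empty subsets of the 8 factors: 2**8 - 1 = 255 combinations.
factor_sets = [
    subset
    for size in range(1, len(FACTORS) + 1)
    for subset in combinations(FACTORS, size)
]


def rand_index(labels_a, labels_b):
    """Share of object pairs on which the two clusterings agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


def adjusted_rand_index(labels_a, labels_b):
    """Rand index corrected for chance agreement."""
    n = len(labels_a)
    contingency, row_sums, col_sums = {}, {}, {}
    for a, b in zip(labels_a, labels_b):
        contingency[(a, b)] = contingency.get((a, b), 0) + 1
        row_sums[a] = row_sums.get(a, 0) + 1
        col_sums[b] = col_sums.get(b, 0) + 1
    index = sum(comb(count, 2) for count in contingency.values())
    row_term = sum(comb(count, 2) for count in row_sums.values())
    col_term = sum(comb(count, 2) for count in col_sums.values())
    expected = row_term * col_term / comb(n, 2)
    maximum = (row_term + col_term) / 2
    if maximum == expected:
        return 1.0
    return (index - expected) / (maximum - expected)


# Hypothetical cluster labels of six cities under two different clusterings.
lst_lcz_labels = [0, 0, 1, 1, 2, 2]
factor_labels = [0, 0, 1, 2, 2, 2]

ri = rand_index(lst_lcz_labels, factor_labels)
ari = adjusted_rand_index(lst_lcz_labels, factor_labels)
print(len(factor_sets), round(ri, 3), round(ari, 3))
```

In the toy example the RI is noticeably higher than the ARI, which is the same qualitative gap as between the reported values of 0.73 and 0.303: the RI counts the many pairs that both clusterings separate, whereas the ARI discounts agreement expected by chance.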

Conclusions
The results showed that the built-up and natural types behaved as two independent sets of patterns in each city. The built-up types' LST sets fluctuated in a hotter range than those in natural types. It is worth noting that the hottest zone in each final cluster was large low-rise (8) followed closely by bare rock and paved (E), bare soil and sand (F) and lightweight low-rise (7). The water zone (G), if existing in a city, was the coolest LST among both built-up and natural types in all case study cities.
Relation recognition in this study was performed by the Wilcoxon rank sum test, which in 96% of all LCZ pairs proved a meaningful and significant difference of LSTs. Furthermore, according to the RI and ARI, the most effective combination of climatic and geographical variables consisted of latitude, mean and maximum temperature showing the closest resemblance to the clustering on the basis of LST-LCZ relations.

Discussion
The current study aimed to prove the relation between LST and LCZ and its possibility to be generalized to all cities around the world with different climatic zones and characteristics. Besides, a further step intended to determine the effective climatic and geographical factors as drivers of varying relationships.
For that, 25 cities all around the world with various climatic conditions were selected based on the availability of appropriate satellite data and level zero data from WUDAPT. To retrieve the LST maps, the automated mapping algorithm (Avdan & Jovanovska, 2016) was applied. The satellite data used in this algorithm were downloaded from the USGS website. For each city, four images, one per season, were used as inputs to the algorithm. LCZ maps were generated following the WUDAPT workflow using its level zero datasets. After acquiring both LST and LCZ maps, they were compared with the Wilcoxon rank sum test to reveal any meaningful connection. After investigating the LST-LCZ relation, the next aim was to find the effective climatic and geographical factors. Here, all combinations of 8 variables (255 cases in total), namely 1) Latitude, 2) Longitude, 3) Altitude, 4) Mean annual temperature, 5) Minimum temperature, 6) Maximum temperature, 7) Total precipitation and 8) Population density, were considered. To this end, the result of the clustering based on the characteristics of the LST-LCZ relation was compared with the outcome of the clustering based on each of the mentioned factor combinations. The similarity between every two clusterings (the LST-LCZ relation-based clustering and each factor combination-based clustering) indicated the most effective combination of the stated variables. Given that, all pairs of clusterings were compared based on the Rand index (Rand, 1971) and the adjusted Rand index (Hubert & Arabie, 1985). The meaningful relation between LST and LCZ was confirmed in this research as well as in previous studies like Geletic et al. (2016), Gemes et al. (2016) and Cai et al. (2017), which all concluded a considerable connection between these two variables in their case study regions.
Relation recognition in this study was upheld by the Wilcoxon rank sum test results, which in 96% of all LCZ pairs showed a meaningful and significant difference of LSTs. Insignificant differences among some LCZs were also reported previously for air temperature by Stewart et al. (2014). Overall, there were 4 cities in which all pairs showed significant differences: Berlin, Khartoum, Moscow and Phoenix. Considering the two main built-up and natural types, 20% of all insignificant pairs were intra-natural and 17% were intra-built types; the rest of them (63% of all insignificant pairs) were inter-type pairs. Analyzing the mean LST of each LST-LCZ relation cluster, it could be noted that the hottest zone in each cluster was large low-rise (8), followed closely by bare rock and paved (E), bare soil and sand (F) and lightweight low-rise (7). The coolest ones were water (G) and dense trees (A). These results are likely to depend on the albedo of the surface, since the LST is closely connected with this factor. The albedo of paved ground, whether streets or the roofs of buildings, is lower depending on its colour, which causes higher LST. Although the wetness of the surface is another significant factor in LST measurements, its influence was considerably reduced by the use of the average LST over four seasons in this study.
In the case of finding the most effective climatic and geographical variables, the RI and ARI were used. According to these indices, the clustering based on the latitude, mean and maximum temperature showed the closest resemblance to the clustering on the basis of LST-LCZ relations. Although the ARI showed a weak measure (0.303), this combination was the highest among all. Therefore, additional variables, besides those used in this research, should be considered in future studies to achieve a more comprehensive insight into drivers of LST-LCZ relationships. Urbanization characteristics and albedo indices of cities according to the essence of LST and LCZs are recommended by the authors to be included in future studies.