Accuracy Assessment of Digital Surface Models Based on a Small Format Action Camera in a North-East Hungarian Sample Area

The use of small format digital action cameras has increased in the past few years in various applications, owing to their low cost, flexibility and reliability. These small cameras can be mounted on several devices, such as unmanned aerial vehicles (UAV), to create 3D models with photogrammetric techniques. Whether creating or receiving such databases, one of the most important questions is always how accurate these systems are and what accuracy can be achieved. We gathered overlapping images, created point clouds, and then generated 21 different digital surface models (DSM). The models differed in the number of images used and in the flight height. We repeated the flights three times to compare the same models with each other. In addition, we measured 129 reference points with RTK-GPS to compare their heights with the cell values extracted from each DSM. The results showed that the higher flight height gave lower errors, and that the optimal air base distance was one fourth of the flying height in both cases. The lowest median error was 0.08 meter, for the 180 meter flight, 50 meter air base distance model. Raising the number of images does not increase the overall accuracy. The connection between the amount of error and the distance from the nearest GCP is not linear in every case.


Introduction
New technological innovations can help us to examine geographical phenomena with a new approach. We can use unmanned aerial vehicles (UAV) and small format digital cameras to perform these tasks (Colomina, Molina, 2014; Stöcker, et al., 2015). These devices can fly at low altitude and carry several kinds of sensors, optical (visible or IR) (Van Leeuwen, et al., 2009), hyperspectral (Burai, et al., 2015), or LiDAR, to obtain high resolution databases of the examined area. Traditional airborne imagery is often expensive, although it covers larger areas with high quality (Karátson, et al., 2012), but in several cases smaller areas need to be examined, or a higher spatial resolution is required. Nowadays there are affordable small format digital action cameras with high optical resolution and light weight.
However, the accuracy of these databases can be highly diverse. Like other databases, they contain errors that need to be quantified in order to make decisions (Weng, 2002). The accuracy depends on several factors: the applied technology, the application method, the post processing, the properties of the area, the flying height (Rock, et al., 2011) and the resolution of the sensor (Rayburg, et al., 2009; Tahar, 2015).
Using UAV systems we can create point clouds, orthophotos, and digital surface models (DSM). These spatial databases can be the basis of several studies (Darwin, et al., 2014; Bachmann, et al., 2013; Telbisz, et al., 2011). These systems can be used for producing aerial digital maps (Ahmad, 2011). Some applications are for special urban modelling and forestry (Haala, et al., 2010). Several studies have investigated the micro-level effects of erosion (d'Oleire-Oltmanns, et al., 2012; Harwin, Lucieer, 2012). Karabork et al. (2004) compared two methods: measuring errors directly in the photogrammetry software, and measuring them after resampling the raster grid. Turner et al. (2012) also compared two methods: directly georeferencing the images, and using GCP (Ground Control Point) coordinates. Lucieera et al. (2013) used UAVs in East Antarctica to investigate moss beds. Neithammer et al. (2011) investigated glacial crevasses in France. Ruiz et al. (2013) investigated how the accuracy of DEMs changes depending on the positioning system used.
Nowadays there are two main ways to create DSMs: photogrammetry and the LiDAR technique. There are some traditional ways to create a DSM as well (Farah, et al., 2008), e.g. using field survey equipment, but it takes a lot of time and effort to obtain enough surface points. Moreover, if we want to follow the variability of the micro-topography, a spatial resolution of a few cm is essential, which means we need a large number of points. The main advantages of this technology, in contrast with traditional satellite and airborne images, are the low operating cost, the flexibility of route planning, the ease of repeating the survey, very high spatial resolution, automated processing and high accuracy (Küng, et al., 2011). UAVs have appeared in resource management owing to their flexibility and low cost image acquisition (Laliberte, Winters 2008). Zhang et al. (2011) produced precise DSMs, digital orthophotomaps, and 3D city models using the UAV photogrammetry method.
In the past decade the evolution of computers has enabled the rapid development of photogrammetric techniques. Many different software packages are available to process images, e.g. Agisoft Photoscan, MicMac (Rosu, et al., 2015), Pix4D (Draeyer, Strecha, 2014), INPHO (Trimble, 2013), or Erdas Imagine LPS (Tahar, Ahmad, 2011). Our aims were (1) to repeat a flight three times to find out whether the investigation can be reproduced; (2) to present flights at two different altitudes over the same area and under the same conditions to explore the differences; (3) to investigate the height accuracies of models with different air base distances; and (4) to explore a potential connection between errors and the distance from the nearest GCPs.

Data acquiring and processing
The selected study area was a derasion valley at the source of the Tócó creek, north of Debrecen, in Hungary (Fig. 1). The 11 hectare (0.11 km2) area is under agricultural cultivation.
Figure 1. The location of the study area.
The camera was attached to a DJI Phantom 2 quadcopter. The sensor we used was a GoPro Hero 3+ Black Edition camera (focal length: 2.77 mm, lens size: 14 mm), and the resolution of the images was 12 MP. The average distance between two adjacent images (air base distance) was 10 meters, which provided more than 90% overlap on the surface. There were two different flight heights: the lower was 70 m, the higher 180 m above the surface. We repeated the flights three times at both heights to study the repeatability of the models. Nine different ground control points (GCP) for each of the three repetitions were measured with a high-accuracy RTK-GPS (Stonex S9 RTK system; accuracy: ±2 cm) to determine the point positions for all repeated flights. Furthermore, we collected 129 reference points, also with ±2 cm accuracy.
During data processing we used the Agisoft Photoscan Pro software to create the digital surface models from the overlapping images. We created four sub-categories of models based on the air base distance: 10, 20, 50 and 100 meters, respectively.
Of the maximum possible 24 DSMs we created 21, because for 3 models (lower altitude, 100 meter air base distance) the images did not have enough overlap and connecting points to create the point cloud. From the overlapping images the software created dense point clouds. The software parameters were set to medium accuracy and quality, which means that the software reduces the original images to one fourth of their size in both directions (rows and columns). The GCPs were identified in an automated way and corrected manually in every image. After the GCP coordinates had been imported from a text file and located, the DSMs were generated.
The cell values were extracted from each DSM at the locations of the reference points, and the height values were stored in an attribute table. The reference and DSM values were exported to a text file for further statistical analysis. For the statistical evaluation we used the Past and IBM SPSS statistical software packages.
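The extraction step can be illustrated with a minimal sketch. It is not the GIS workflow used in the study: the grid layout (row 0 at the northern edge), the function names, the sample values and the sign convention (DSM height minus reference height) are all assumptions made for illustration.

```python
import math

def sample_dsm(dsm, x_min, y_max, cell_size, x, y):
    """Return the DSM cell value at map coordinate (x, y), or None outside the grid.

    dsm is a list of rows (row 0 is assumed to be the northern edge),
    (x_min, y_max) is the upper-left corner, cell_size is in metres.
    """
    col = math.floor((x - x_min) / cell_size)
    row = math.floor((y_max - y) / cell_size)
    if 0 <= row < len(dsm) and 0 <= col < len(dsm[0]):
        return dsm[row][col]
    return None

def height_errors(dsm, x_min, y_max, cell_size, reference_points):
    """Signed errors (assumed convention: DSM height minus RTK-GPS height)."""
    errors = []
    for x, y, z_ref in reference_points:
        z_dsm = sample_dsm(dsm, x_min, y_max, cell_size, x, y)
        if z_dsm is not None:  # skip points that fall outside the DSM
            errors.append(z_dsm - z_ref)
    return errors

# Hypothetical 3x3 DSM with 0.4 m cells and two made-up reference points
dsm = [[120.0, 120.5, 121.0],
       [119.5, 120.0, 120.5],
       [119.0, 119.5, 120.0]]
points = [(0.2, 1.0, 120.1), (0.9, 0.1, 119.8)]
errors = height_errors(dsm, x_min=0.0, y_max=1.2, cell_size=0.4,
                       reference_points=points)
```

The absolute errors used later follow by taking abs() of each value.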
Slope angle maps were generated from each DSM to compare the slope values with the height errors and to examine a potential correlation between them. We tested the correlations with both the relative and the absolute height errors.
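As a hedged illustration of this step, the sketch below derives a slope angle from a DSM grid with central differences and computes a correlation coefficient. The slope algorithm of the GIS software used in the study is not stated, and the type of correlation coefficient is not named either; central differences and Pearson's r are assumptions here, and the sample grid is made up.

```python
import math

def slope_degrees(dsm, row, col, cell_size):
    """Slope angle in degrees at an interior cell, from central differences."""
    dz_dx = (dsm[row][col + 1] - dsm[row][col - 1]) / (2 * cell_size)
    dz_dy = (dsm[row - 1][col] - dsm[row + 1][col]) / (2 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical DSM: a plane rising 1 m per 1 m cell towards the east
plane = [[0.0, 1.0, 2.0],
         [0.0, 1.0, 2.0],
         [0.0, 1.0, 2.0]]
centre_slope = slope_degrees(plane, 1, 1, cell_size=1.0)
```

In practice pearson_r would be called with the slope values and the height errors sampled at the same reference points.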
A distance map based on the 9 GCPs was created in the IDRISI Selva software to examine how the errors propagate depending on the distance from the nearest GCP.
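For individual reference points, the value that such a distance map provides can be sketched directly. The function below is only an illustration with hypothetical coordinates; it assumes planar (projected) coordinates in metres and is not the IDRISI procedure itself.

```python
import math

def nearest_gcp_distance(point, gcps):
    """Planar distance from a point to the closest ground control point."""
    px, py = point
    return min(math.hypot(px - gx, py - gy) for gx, gy in gcps)

# Hypothetical coordinates in metres
gcps = [(0.0, 0.0), (10.0, 0.0)]
distance = nearest_gcp_distance((3.0, 4.0), gcps)
```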

Statistical analysis of the DSMs
The normality of the error distributions of the 21 surface models was tested with the Shapiro-Wilk test. It showed that the error values were not normally distributed; therefore, instead of the mean and standard deviation, the median, the maximum error, and the lower and upper quartile values were used.
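The robust descriptive statistics named above can be sketched with the Python standard library. The Shapiro-Wilk test itself is not reproduced here. The use of absolute errors and the "inclusive" quartile method are assumptions, since the quartile definition of the statistical packages used in the study is not stated; the sample errors are made up.

```python
import statistics

def robust_summary(errors):
    """Median, quartiles, IQR and maximum of the absolute errors."""
    abs_err = [abs(e) for e in errors]
    q1, median, q3 = statistics.quantiles(abs_err, n=4, method="inclusive")
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "max": max(abs_err),
    }

# Hypothetical signed height errors in metres
summary = robust_summary([-0.1, 0.2, 0.3, -0.4, 0.5])
```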
The Kruskal-Wallis test was used in the Past 3.11 software to determine the differences among the DSM medians and the reference values. The Kruskal-Wallis nonparametric test allows the comparison of more than two independent groups (Kruskal, Wallis, 1952). It does not require normally distributed data, and it is the nonparametric counterpart of one-way ANOVA. We applied the Mann-Whitney U test as a post hoc test, because non-parametric tests are less sensitive to outliers.
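To show what the Kruskal-Wallis test computes, the sketch below evaluates the H statistic with the usual tie correction. It is a teaching example with made-up groups, not the Past implementation, and it leaves out the p-value (which would come from a chi-square distribution with k - 1 degrees of freedom for k groups).

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of groups.

    Raises ZeroDivisionError if every pooled value is identical.
    """
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(c ** 3 - c for c in counts.values())
    return h / (1 - ties / (n ** 3 - n))

# Hypothetical groups, e.g. heights from three models
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```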
A scatter plot was created to visualize the connection between the distance from the GCPs and the amount of error. The Ordinary Least Squares (OLS) method was used to draw a line in the plot that minimizes the squared errors in the Y values.
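The OLS line can be written out explicitly. The sketch below uses the closed-form solution for a single predictor; the variable names and the sample data are hypothetical.

```python
def ols_line(x, y):
    """Slope and intercept of the line minimizing the squared errors in y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: distance from the nearest GCP (x) and error (y)
slope, intercept = ols_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```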
A Bland-Altman plot was created to visualize the outlier values, which cause the non-normal distribution. The Bland-Altman plot (or difference plot) is a data visualization method for comparing two different methods (e.g. an older and a newer one) that measure the same parameter (Bland, Altman, 1986). In this plot the differences between the two methods are plotted against the mean of the two methods.
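The quantities behind such a plot can be sketched as follows. The direction of the difference (observed minus predicted) and the use of the sample standard deviation are assumptions, and the input values are made up.

```python
import statistics

def bland_altman(observed, predicted):
    """Per-point means and differences, the bias, and the 1.96 SD limits."""
    diffs = [o - p for o, p in zip(observed, predicted)]
    means = [(o + p) / 2 for o, p in zip(observed, predicted)]
    bias = statistics.mean(diffs)   # mean difference
    sd = statistics.stdev(diffs)    # sample standard deviation
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits

# Hypothetical heights in metres
means, diffs, bias, limits = bland_altman([10.0, 12.0, 14.0], [9.0, 12.0, 15.0])
```

Plotting diffs against means, with horizontal lines at bias and at the two limits, gives the difference plot.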

Statistical evaluation of the DSMs
The 21 generated DSMs were grouped by the number of the flight and the air base distance (Tab. 1). The first part of the ID abbreviation refers to the number of the flight, the second part refers to the higher (180 meter) or the lower (70 meter) flight, and the third part represents the air base distance in meters (e.g. 2_l_50: 2nd flight, 70 meter flight height, 50 meter air base distance). As the air base distance increases, the number of images used decreases. Overall, the 180 meter flight has 628 images and the 70 meter flight has 503 images. The density of the point clouds (which are the basis of the DSM in Agisoft) is independent of the number of images used. The 180 meter flights have an average point density of 6-7 points/m2, the 70 meter flights 45-50 points/m2. The spatial resolution of the final DSMs is largely determined by the flight height: the 180 meter flights have 0.37-0.4 m spatial resolution, the 70 meter flights 0.14-0.15 m. However, a better spatial resolution does not always mean better accuracy.
The best model scenario (according to the median of the model errors) was the model of the first flight with 180 m flight height and 50 meter air base distance (Tab. 1). That model had a 0.08 m median error. Clearly, the models with the smallest air base distance (which contain the most images) were not the most accurate. In all three repetitions, the models imaged from 180 m had smaller median errors, between 0.08 m and 0.25 m (which indicates higher accuracy). The 70 m models have higher median errors, and the models with 50 meter air base distance stood out even among those. Those three models have the highest median error (2.13 m) and the highest maximum error (21.76 m). The corresponding models of the three repeated flights have similar error values (180 meter height, 50 meter air base distance: 0.08 m, 0.10 m, 0.09 m median errors; 70 meter height, 10 meter air base distance: 0.43 m, 0.35 m, 0.36 m median errors).
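The ID scheme can be made concrete with a small sketch that enumerates the model scenarios and decodes an ID. The function names are hypothetical; the exclusion of the lower flight, 100 meter air base distance models follows the data processing description.

```python
def model_ids():
    """All model IDs: 3 flights x 2 heights x 4 air bases, minus 3 failed models."""
    ids = []
    for flight in (1, 2, 3):
        for level in ("h", "l"):
            for air_base in (10, 20, 50, 100):
                if level == "l" and air_base == 100:
                    continue  # not enough overlap to build a point cloud
                ids.append(f"{flight}_{level}_{air_base}")
    return ids

def parse_model_id(model_id):
    """Decode an ID such as '2_l_50' into flight, height and air base distance."""
    flight, level, air_base = model_id.split("_")
    heights = {"h": 180, "l": 70}  # flight height in metres
    return {
        "flight": int(flight),
        "height_m": heights[level],
        "air_base_m": int(air_base),
    }

ids = model_ids()
example = parse_model_id("2_l_50")
```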
A boxplot of the errors of each model was created and sorted by the median values (Fig. 2). The X axis represents the model versions, the Y axis the vertical error in meters. Here we excluded the three models with the highest median values (70 meter flight, 50 meter air base distance models), because their values would have overflowed the chart. The boxplots show the median values, the interquartile range (IQR), and the maximum and minimum errors. The medians are between 0.08 m and 0.43 m. The higher flights have lower median values, with one exception: the 3rd repetition, lower flight, 20 meter air base distance model. In traditional photogrammetry the optimal air base distance is usually one fourth of the flying height (Kraus 1998). The same trend can be seen in most of our dataset. For the 180 meter flight models the optimal air base distance would be 45 meters, so the 50 meter air base distance models are the closest to it. These models have 0.08 m - 0.10 m median errors. For the 70 meter flights the optimal air base distance would be 17.5 meters, so the 20 meter air base distance models should be the closest to optimal. This holds in two of the three repetitions (1l20, 3l20); in the second repetition the 20 meter air base distance model has a 0.39 m median against the 0.35 m median of the 2l10 model. The median error noticeably increases as the air base distance moves away from the optimal value in either direction. Sorting the models by IQR (interquartile range) shows that the two flight heights are clearly separated as well (Fig. 3). Furthermore, the models of the three repeated flights are also well separated in the case of the 180 meter flight height, 50 and 100 meter air base distance models. This separation is less clear among the 10 and 20 meter air base distance models, but the different flying heights remain apparent. Both sortings show that the higher flight, 50 meter air base distance models have the lowest median errors, followed by the 10 and 20 meter air base distance models with slightly higher errors.
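The one-fourth rule of thumb can be checked with a few lines. The sketch computes the optimal air base distance for a flight height and picks the closest of the four tested distances; the function name and the ratio parameter are illustrative.

```python
def closest_air_base(flight_height_m, candidates_m, ratio=0.25):
    """Optimal air base (ratio x flight height) and the nearest tested distance."""
    optimal = flight_height_m * ratio
    closest = min(candidates_m, key=lambda b: abs(b - optimal))
    return optimal, closest

tested = [10, 20, 50, 100]                  # air base distances in metres
high = closest_air_base(180, tested)        # 180 m flight
low = closest_air_base(70, tested)          # 70 m flight
```

For the two flight heights of this study the sketch returns 45 m (closest tested distance: 50 m) and 17.5 m (closest tested distance: 20 m).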
The three highest-median models are missing from this figure too; these models also have the highest IQR. The 70 meter flights have a wider IQR than the 180 meter models: the 2l20 model has an almost four times wider IQR than the 3h50 model.
The correlations between the slope values and the height errors showed no significant connection: the highest r value was 0.36, which indicates a weak connection.

Errors and the distance of the GCPs
Figure 4 shows the 3x9 GCP locations (represented by white crosses) and the absolute errors of the 129 reference points for the best model (1h50).
In general, the larger the distance from the nearest GCP, the larger the error, but in some cases there are large errors close to GCPs as well. Large errors, however, are more typical of the outer regions, where the distance to the closest GCP is larger than in the area among the GCPs.
The Kruskal-Wallis test revealed that in general there were no significant differences in height between the reference values and the DSM values (p>0.05), except in two cases: 1l50 and 3l50 (first and third flight, 70 m altitude, 50 m air base distance models).

Filtering check points, averaging three models
Although the accuracy of the applied GPS was about 2 cm, in a field survey there are in many cases disturbing factors (e.g. trees, bad DOP, etc.). It was necessary to exclude the RTK-GPS errors from the check points; therefore the errors under 0.1 m were filtered out. Thus, the new dataset showed only the errors that are independent of the RTK-GPS. Furthermore, the three lowest IQR models were averaged to exclude any human subjectivity. The remaining dataset contained 42 check points, which means that 67% of the original errors were between -0.1 and +0.1 meter. A scatter plot was created to show how the relative errors are distributed with increasing distance from the GCPs (Fig. 5). The X axis represents the distance from the GCPs in meters, the Y axis the mean of the errors of the best three models (1h50, 2h50, 3h50). As described above, the errors between -0.1 m and +0.1 m were filtered out to exclude possible RTK-GPS errors. A black line was calculated with the OLS (Ordinary Least Squares) method to minimize the squared errors along the Y axis. The blue points represent values outside the 1.96 SD range. Close to the GCPs the errors are relatively low, but as the distance increases the picture is less clear: there are both lower and higher errors. Clearly, the variability of the errors is unequal across the range of distances; therefore the model is heteroscedastic.
A map of the same filtered dataset was created to show how the relative errors are distributed in space (Fig. 6). The blue markers represent negative errors, the red markers positive errors. The larger the diameter of a marker, the larger the error. The same 7 outliers that are selected in Fig. 5 are marked on the map in light blue. Those outliers are in the northern part of the map, with one in the southern part, and are relatively far from the GCPs. Furthermore, the positive and negative errors are well separated from each other.
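The averaging and filtering described in this section can be sketched as follows. The order of operations (average the three models first, then filter), the treatment of errors exactly at the threshold, and all sample values are assumptions made for illustration.

```python
import statistics

def mean_model_errors(errors_by_model):
    """Average the signed errors of several models point by point."""
    return [statistics.mean(point) for point in zip(*errors_by_model)]

def filter_check_points(errors, threshold=0.1):
    """Indices of check points whose absolute error reaches the threshold (m)."""
    return [i for i, e in enumerate(errors) if abs(e) >= threshold]

# Hypothetical signed errors (m) of three models at the same three check points
model_a = [0.05, 0.30, -0.20]
model_b = [0.01, 0.20, -0.40]
model_c = [0.03, 0.40, -0.30]

mean_errors = mean_model_errors([model_a, model_b, model_c])
kept = filter_check_points(mean_errors)
share_removed = 1 - len(kept) / len(mean_errors)
```

With the numbers reported in the text, 42 of the 129 check points remain, so 87 points (about 67%) fall inside the ±0.1 m band.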

A Bland-Altman plot was created to analyse the agreement between the RTK-GPS (observed) values and the average of the 3 lowest IQR models (predicted). The difference between the observed and predicted values is plotted against the mean of the observed and predicted values (Fig. 7). The horizontal blue line represents the mean difference; the dashed red lines represent the limits of agreement at 1.96 times the standard deviation of the difference. In addition, the 95% confidence interval (CI) lines were drawn for the mean difference (green lines) and for the limits of agreement (blue lines). The same 7 outliers are marked in this figure as well (blue circles). These outliers are the cause of the non-normal distribution of the dataset. The mean difference for the average of the three models is -0.19 m. The maximum positive error is almost 0.5 meter; the lowest negative errors are around -1 meter. Half of the points are within the 95% confidence interval of the mean.

Discussion
According to Ahmad (2011), sub-meter accuracy can be achieved with a small format digital camera and a UAV (Tab. 2). Barry, Coakley (2013) used methods similar to ours in a 2 ha area and reported a high accuracy value (0.07 m), even though they used the American National Standard for Spatial Data Accuracy (USA Standard, 1996), which means that 95% of the errors are below the accuracy value. d'Oleire-Oltmanns, et al. (2012) monitored soil erosion with an aerial-based DTM using a high resolution dataset (0.05-0.08 m mean vertical error). The method of Haala, et al. (2010) was similar to that of this paper, but they also compared their results with LiDAR and concluded that further developments and tests are needed in this field. Harwin, Lucieer (2012) obtained high accuracy (0.04 m mean error) when investigating erosion effects in Australia. However, they claimed that the results are not reliable in the case of complex vegetation. Karabork, et al. (2004) claimed that the errors of the two different methods are approximately the same. Küng, et al. (2011) also had a low RMSE value, and they found that the accuracy depends strongly on the ground resolution, which is closely linked to the resolution of the sensor and the flight height. Laliberte, Winters (2008) reached a 0.3 m RMS error on a relatively flat area, but they also mentioned that further tests of image processing are needed in areas with higher relief. Lucieera, et al. (2014) found that drone-based investigations can detect snow cover changes and their impact on moss health in East Antarctica. Neithammer, et al. (2011) claimed that UAV-based digital terrain models are capable of providing accurate data, but that improvements are required to minimize georeferencing errors. Ruiz, et al. (2013) claimed that more experiments are needed on the distance of the ground sample points and the flight planning parameters.
The lowest vertical error median among the model scenarios is 0.08 meter. The errors of the three repeated model pairs showed differences, but overall similar height accuracies were achieved. The 180 meter flights have median errors between 0.08 and 0.25 meters. The medians of the 70 meter flights are between 0.13 and 2.13 meters. The errors of the different air base distance models changed as the air base distance departed from the optimal one. In the case of the 180 meter flight, the 50 meter air base distance models were the optimal ones, with median errors between 0.08 and 0.1 meters. In the case of the 70 meter flight, the 20 meter air base distance models were the optimal ones, with median errors between 0.13 and 0.39 meters. The distribution of the errors on the map (Fig. 6) showed that higher errors are likely to appear far from the GCPs; however, the scatter plot (Fig. 5) showed that low errors can occur at large distances too.

Conclusion
A larger number of images does not necessarily decrease the median errors of the models. With a high number of overlapping images it is possible to filter them to reduce the processing time without loss of accuracy. We reached a median vertical accuracy between 0.1 m and 0.2 m at 180 m altitude with a GoPro camera and a DJI Phantom 2 drone. The three repetitions of the flights showed similar tendencies; therefore the investigation can be repeated with comparable results. The lower altitude flight gave higher average errors (0.3 m - 2.2 m), especially in the cases with less dense image overlap. The higher median and maximum errors of the 70 m models were due to the lower flight altitude, from which the sensor covered a smaller area, so there were fewer tie points among the images. The optimal air base distance in digital stereo photogrammetric investigation is similar to that of the traditional method: the air base distance should be one fourth of the flying height above the surface. There is no linear relationship between the height errors and the distance from the GCPs within the 300 meter range. Further analysis is needed to examine different software settings (higher parameters lead to lower error) and, for an objective comparison, to use different photogrammetric software.

Figure 4. The size of the mean errors, with the distance of the nearest GCP (Model: 1_h_50)

Figure 5. Connection between relative errors and the distance of GCPs.

Figure 6.

Figure 7. Bland-Altman plot of the reference (observed) and the average of three models (predicted) values

Table 1. The models' descriptive statistics

Figure 2. Box plot of the model errors, sorted by median

Figure 3. Box plot of the model errors, sorted by the interquartile range

Table 2. Height errors comparison of other studies.