Sensibility Analysis of the Object Tracking Algorithms in Thermal Image

In military applications, target tracking has always been an interesting and challenging problem. Nowadays, it has also found its place in civil applications, especially in surveillance and monitoring. Until recently, thermal imagery (where the image is formed from infrared spectrum radiation) was considered only in military applications because of the price and size of the cameras. Also, thermal image quality was not as good as that of a TV camera image (where the image is formed from visual spectrum radiation). The situation has changed, and thermal cameras are now widely used in many kinds of applications. A thermal image differs from a TV camera image, as it measures the temperature difference between objects and background. Therefore, it has an advantage over television cameras, since it can be used in low light conditions and in the dark. This paper examines the options for a coarse and quick algorithm for rough target location in thermal images, where the target is a small Unmanned Aerial Vehicle (UAV). Feature descriptor methods are widely used in visual imagery, and the goal of this paper is to examine their use in thermal imagery target tracking. For that reason, three feature descriptor algorithms from three different families are tested: FREAK (Fast Retina Keypoint), SURF (Speeded Up Robust Features) and MSER (Maximally Stable Extremal Regions). The algorithms are tested under translation, rotation, blur and size change of the object of interest, as well as on noisy images. Since none of the tested methods works well in all situations, a new multi-stage algorithm is proposed. This algorithm is based on a combination of the MSER and SURF algorithms, with the goal of using the advantages of each of them in different real situations. The obtained results show that the new multi-stage algorithm has the best performance among the tested methods. All the algorithms are implemented in Matlab.


Introduction
Target tracking is a challenging problem in the field of computer vision. Indicators of the popularity of the topic are challenges and contests such as the Visual Object Tracking (VOT) challenge [1,2] and various workshops [3].
Tracking in thermal infrared imagery has been of interest for many military applications. Nowadays, with the increase in image quality and the decrease in camera size and price, new civil application areas have opened. Thermal cameras measure the radiation emitted or reflected by objects in the scene, with the great advantage of being able to see in total darkness and being robust to illumination changes. Another advantage, in the military sense, is that these are passive sensors that need no active illumination (artificial lighting or any other kind of active radiation) [4].
Our task is a performance and sensibility analysis of different algorithms on thermal images. A thermal image is quite different from a TV camera image and cannot be treated as a grayscale visual image. There are no shadows in thermal infrared, and the noise characteristics differ from those in visual tracking. There are no color patterns, only patterns arising from variations in the material or temperature of objects. The majority of algorithms based on feature description are designed for and tested on TV camera videos and images. The goal is to test different feature descriptors on thermal images, in order to see how these algorithms can be used in target tracking with thermal cameras, which is a challenging research topic.
This paper explores the options for a coarse and quick algorithm for rough target location, which can serve as the first part of a more complex target tracking algorithm. The idea is similar to that described in [5], where the SIFT algorithm was used for rough location of the target. In our case, the target is an Unmanned Aerial Vehicle (UAV).
In the process of target tracking, the first step is the description of the object to be tracked. This task is of crucial importance, since the quality of the object description directly determines tracking success. An object is described by its interest points, which can be corners, blobs or T-junctions. The interest point detector finds these points, and the neighborhood of each interest point is represented by a feature vector (feature descriptor). The goal of this paper is to compare three descriptors from three different families of feature descriptors: the Local Binary Descriptor FREAK, the Spectra Descriptor SURF and the Polygon Shape Descriptor MSER.

The first section of this paper explains the theory behind these feature descriptors and why exactly these descriptors were chosen for the comparison. The next section shows the results of testing each algorithm with respect to the impact of translation, rotation, blur, size and noise on the image. The third section explains the idea and the design of the new algorithm for feature comparison, based on the results obtained by testing SURF, FREAK and MSER. The new algorithm is a multi-stage algorithm that is based on MSER and switches to SURF in the special cases where MSER does not perform well. The obtained results show that this algorithm has the best performance among the tested methods. The contribution of this paper is therefore a performance and sensibility comparison of three different feature descriptors on thermal images, together with a proposal for a new multi-stage algorithm.

Feature Descriptor algorithms
Features represent specific structures in an image, such as corners, specific shapes or blobs. In this paper, the focus is on local feature descriptors: local patterns, shapes and spectra.
- Polygon Shape Descriptors take the shape of objects as measured by statistical metrics. These descriptors compute a set of shape features for a polygon or blob and describe the shape by using image moments. Typically, these methods are applicable to larger region sizes. This family includes the MSER method, Object Shape Metrics for Blobs and Polygons, and Shape Context [6].

SURF
From the family of Spectra Descriptors, the chosen candidate is SURF (Speeded-Up Robust Features). The main goal in SURF development was a balance between desirable performance and reasonable computation speed. The SURF detector focuses on scale and rotation invariance, because descriptors of this kind offer a good compromise between these basic requirements [7].
The SURF algorithm is based on multi-scale space theory and the Hessian matrix, of which it uses a basic approximation. SURF creates a stack whose images all have the same dimensions. Thanks to integral images, SURF filters the stack using a box filter approximation of the second order Gaussian partial derivatives. Integral images allow the computation of rectangular box filters in near constant time. Fig. 1 shows the Gaussian partial derivatives of the second order in the x and y directions [8].
Within this family of spectra descriptors, which belong to the category of computationally more intensive and memory-consuming methods with great performance, SIFT [9] is commonly used as a benchmark against other methods. SURF was chosen instead of SIFT because it is faster and has comparable performance. This descriptor is robust with respect to the following four characteristics: scale, rotation, illumination and noise [6].
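The constant-time box sums mentioned above can be illustrated with a minimal Python sketch. It is not the implementation used in this paper (which relies on Matlab); the function names `integral_image` and `box_sum` and the small test image are assumptions introduced only for illustration.

```python
def integral_image(img):
    """Return the summed-area table of a 2-D list, padded with a zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Entry (y+1, x+1) holds the sum of all pixels in img[0..y][0..x].
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of rows top..bottom-1 and columns left..right-1 using four lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


# Hypothetical 3x3 image used only to exercise the functions.
img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
total = box_sum(ii, 0, 0, 3, 3)    # whole image
corner = box_sum(ii, 1, 1, 3, 3)   # lower-right 2x2 block: 5 + 6 + 8 + 9
```

Once the table is built, every rectangular sum costs four lookups regardless of the box size, which is why enlarging the box filter does not increase the filtering cost.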

FREAK
From the family of Local Binary Descriptors, the chosen candidate is FREAK (Fast Retina Keypoint). The SURF and SIFT descriptors from the family of Spectra Descriptors have great performance, but they are based on histograms of gradients, which need to be computed for every pixel and therefore cost time. This is where binary descriptors come in handy. Binary descriptors are composed of three parts: a sampling pattern, orientation compensation and sampling pairs. The FREAK algorithm is a descriptor inspired by human retinal computation. Corners are considered as keypoints [10], [11]. The first step is to take a sampling pattern around the keypoint. The FREAK sampling pattern is shown in Fig. 2; it consists of overlapping concentric circles with more points on the inner rings [12]. The next step consists of choosing pairs of points on this sampling pattern, for example 512 pairs. Next, the intensities of the two points in each pair are compared: if the first value is greater than the second, a "1" is written, otherwise a "0". After this process, we have a 512-bit binary string that encodes local information about the keypoint. The comparison process is quite easy, because it is a comparison of two binary strings.
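The pairwise intensity test and the binary string comparison described above can be sketched as follows. This is a simplified illustration in Python, not the FREAK implementation itself: the five sample intensities, the five pairs and the function names are hypothetical, whereas the descriptor in the text uses 512 pairs taken from the retinal sampling pattern.

```python
def binary_descriptor(samples, pairs):
    """Build a bit list: 1 where the first sample of a pair is brighter than the second."""
    return [1 if samples[a] > samples[b] else 0 for a, b in pairs]


def hamming_distance(d1, d2):
    """Number of positions where two equal-length bit lists differ."""
    if len(d1) != len(d2):
        raise ValueError("descriptors must have the same length")
    return sum(b1 != b2 for b1, b2 in zip(d1, d2))


# Hypothetical intensities at five sampling points around two keypoints.
samples_ref = [120, 80, 95, 60, 200]
samples_cmp = [118, 85, 90, 70, 60]
# Hypothetical pair selection (indices into the sample lists).
pairs = [(0, 1), (1, 2), (2, 3), (4, 0), (3, 4)]

desc_ref = binary_descriptor(samples_ref, pairs)
desc_cmp = binary_descriptor(samples_cmp, pairs)
distance = hamming_distance(desc_ref, desc_cmp)
```

A small distance means the two keypoints have similar local intensity orderings; a matcher would typically accept the pair with the smallest distance below some threshold.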
FREAK was chosen because it is fast to compute, has good discrimination compared to other binary descriptors, and combines performance, accuracy and robustness. It has six robustness characteristics: brightness, contrast, rotation, scale, viewpoint and blur, as explained in [6].

MSER
From the family of Polygon Shape Descriptors, the chosen candidate is MSER (Maximally Stable Extremal Regions). The MSER method is actually an interest point detector, and its result is a set of distinguished regions detected in a grey scale image. The method starts with the pixels of zero intensity and progressively adds pixels with higher intensity levels while monitoring the regions these pixels form. An extremal region is the largest connected region at each stage. So, as the grey levels are added, the regions grow. The change of area, normalized by the area of the connected component, is used as a stability criterion. Maximally stable extremal regions are connected components that remain close to stable over a range of intensities. Each MSER corresponds to a local minimum of the rate of change of the area function over intensity. This relative area change is an affine invariant property.
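The stability criterion can be made concrete with a short Python sketch. It assumes that the area of one growing region has already been recorded at each intensity threshold, and it uses the commonly cited form q(i) = (A(i + delta) - A(i - delta)) / A(i); the area values, the step `delta` and the function names are hypothetical and are not taken from the experiments in this paper.

```python
def stability(areas, delta):
    """Relative area change q(i) = (A(i+delta) - A(i-delta)) / A(i) for interior thresholds."""
    q = {}
    for i in range(delta, len(areas) - delta):
        q[i] = (areas[i + delta] - areas[i - delta]) / areas[i]
    return q


def maximally_stable_levels(areas, delta=1):
    """Threshold indices where q(i) is a strict local minimum."""
    q = stability(areas, delta)
    levels = sorted(q)
    return [i for i in levels[1:-1] if q[i] < q[i - 1] and q[i] < q[i + 1]]


# Hypothetical area (in pixels) of one nested region as the threshold rises.
areas = [10, 40, 100, 104, 108, 112, 200, 400]
stable = maximally_stable_levels(areas, delta=1)
```

In this example the region barely grows between thresholds 2 and 5, so the relative area change reaches its minimum inside that plateau, and the region at that threshold would be reported as maximally stable.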
Some of the advantages of this method are: multi-scale features and multi-scale detection; variable-size features calculated globally across the entire region; affine transformation invariance; general invariance to shape change; and stability of detection [6]. It is a robust and fast feature detector [13][14][15].

Feature matching simulations and results
In order to compare the SURF, FREAK and MSER algorithms, we present the test results. Testing is performed on images taken in different real situations, so that the influence of rotation, blur, scale change and noise can be discussed for each of the tested algorithms. The object of interest is a small UAV in a thermal image, shown in a scene with background in Fig. 3. Frames are extracted from the video sequences, and the comparison is performed on these frames. The frame size is 576 x 768. The testing procedure is a comparison of the object in a referent frame with a comparison frame. Two frames of the referent object from two different sequences are shown in Figures 4 and 5.

Translation
The first step is the examination of translation for the object shown in Fig. 4. All three algorithms are tested using the Matlab software package. After finding the keypoints and feature descriptors for the SURF, FREAK and MSER algorithms, the results of the comparison are shown in Fig. 6 and Table 1.
As can be seen from Table 1 and Fig. 6, all three algorithms are quite capable of matching a great number of features. At first glance, it is easy to think that the results are not as good for MSER as for SURF, since only a few mutual features can be seen, ten to be precise. However, MSER extracts regions of interest, and does not find as many regions as the other algorithms find features. With this in mind, the result of the comparison is good.

Rotation
For the examination of object rotation, the object from Fig. 4 was used as the referent object, and the result is shown in Fig. 7 and Table 2. The tested frames are not successive; they intentionally have two frames between them. The algorithms manage to find the same features on both frames, but with a greater angular rotation of the object these results can be worse. In the case of the SURF algorithm, there are 12 matched features, of which 4 are false matches, leaving only 8 correct matches. We also tested the algorithms to find the limit of rotation, that is, how many frames can be skipped between two comparison frames. The results are shown in Table 3. From these results, it can be seen that the SURF and MSER algorithms perform very well, both in the number of matched features and when the difference between frames is larger. On the other hand, FREAK shows weak performance in comparison with SURF and MSER.

Blur
For the examination of the blur impact, the object in Fig. 5 was used as the referent object, and the results are shown in Fig. 8. As we can see, the algorithms are capable of matching when the blur effect is not intense. In the case of a more intense blur, which was also tested, none of the algorithms gives good results, since blur changes the feature descriptor vector significantly.

Scale change
For the examination of the scale change impact, the object in Fig. 5 was used as the referent object. In the first comparison case, the referent object is smaller than the comparison object. The results are shown in Fig. 9 and Table 5. As we can see, the SURF and MSER algorithms show good comparison results when the dimensions of the object change. The FREAK algorithm does not have any matched features, so no results are shown for this method.
In the second case, the referent object is bigger than the comparison object. The results are shown in Fig. 10 and Table 6. Again, the FREAK algorithm does not have any matched features, so no results are shown for this method. The results are good for the SURF algorithm, both when the target has decreased and when it has increased in size. On the other hand, the MSER algorithm finds fewer matched features, especially in case 2, where there is only one correct recognition and one false recognition. In this test, the SURF algorithm shows the best results and outperforms both the FREAK and MSER methods.

Noise
For the examination of noise, the object in Fig. 5 was used as the referent object. The results for "salt and pepper" noise are shown in Fig. 11 and Table 7. As we can see, for a noise variance value of 0.02, the algorithms are quite capable of matching a great number of features. Good results are also obtained for Gaussian white noise with a variance value of 0.02; these results are shown in Fig. 12 and Table 8.
In order to compare the calculation time of the algorithms, the elapsed time for each algorithm is given in Table 9. The given times are not optimized; they only show the time difference between the algorithms.
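The two noise models can be sketched in Python as follows. This is only an illustration under stated assumptions: pixel values are taken to lie in [0, 1], the value 0.02 is interpreted as a noise density for "salt and pepper" noise (as is done, for example, by Matlab's `imnoise`) and as a variance for Gaussian noise, and the function names and the flat test image are hypothetical.

```python
import random


def add_salt_and_pepper(img, density, rng):
    """Set roughly `density` of the pixels to 0.0 or 1.0 with equal probability."""
    out = [row[:] for row in img]
    for row in out:
        for x in range(len(row)):
            r = rng.random()
            if r < density / 2:
                row[x] = 0.0      # "pepper" pixel
            elif r < density:
                row[x] = 1.0      # "salt" pixel
    return out


def add_gaussian_noise(img, variance, rng):
    """Add zero-mean Gaussian noise and clip the result to [0, 1]."""
    sigma = variance ** 0.5
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]


rng = random.Random(0)                      # fixed seed for repeatability
flat = [[0.5] * 50 for _ in range(50)]      # hypothetical uniform grey image
noisy_sp = add_salt_and_pepper(flat, 0.02, rng)
noisy_gauss = add_gaussian_noise(flat, 0.02, rng)
```

Salt-and-pepper noise replaces a small fraction of pixels with extreme values, while Gaussian noise perturbs every pixel slightly, so the two affect keypoint detection and descriptor values in different ways.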

Results
In case of pure translation of a target, all algorithms have good results.
In the examination of the rotation impact, the MSER algorithm shows the best performance: feature matching on the comparison frame under target rotation was successful with 10 frames between the two tested frames, while for the SURF algorithm that number was 8, and for FREAK only 3. Also, the SURF algorithm shows some false matches, which is not the case with the other algorithms. The speed of the FREAK and MSER algorithms is very similar, while SURF is about 30% slower.
In the examination of the blur impact, the best performance is obtained with the SURF algorithm. Compared with SURF, MSER does not have as many matched features, but it is important to emphasize that MSER finds regions of interest, not just interest points, so it is normal for it to have fewer features than the other two algorithms. The results for the FREAK algorithm are not as good as those for SURF.
In the examination of the object scale impact, the SURF algorithm gives the best results and the MSER algorithm is second best. FREAK has very bad results.
In the examination of the noise impact, based on the results obtained by testing all three algorithms, the conclusion is that noise with a variance value lower than 0.02 does not affect the results of the comparison. Of course, as the variance value increases, noise has a greater impact on the feature matching results.
Based on the previous analysis, the final conclusion is that the FREAK algorithm shows the worst results on the tested thermal images. The MSER algorithm shows very good results in feature matching, and it is faster than the SURF algorithm.

Multi-stage algorithm based on a combination of the MSER and SURF algorithms
Based on the results shown earlier and the conclusions about each algorithm's performance, the proposed new algorithm is a combination of the MSER and SURF algorithms.
As shown in the test results, the MSER algorithm shows the best performance, or in some cases performance equal to that of SURF, and it is faster in calculations, so it is the basic algorithm of the multi-stage combination. As is evident from the results in the previous section, when the dimensions of objects change from frame to frame, SURF gives the best results, while MSER does not give very good results. That is why SURF feature comparison is performed when the target region changes. SURF is also performed in every case where MSER does not give good results.
The new multi-stage algorithm calculates the difference between the areas of the target region in the referent frame and in the comparison frame. If the absolute value of this difference is greater than a specified threshold, the SURF part of the algorithm is entered. The second condition for activating the SURF calculation is that MSER produces no feature matches between the frames. All algorithms are written in the Matlab software package.
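The two activation conditions can be summarized in a short Python sketch. It is a schematic of the decision logic only, not the Matlab implementation used in this paper: the function name, the way the region areas are supplied, the threshold value and the two matcher callables (`mser_match`, `surf_match`) are all hypothetical placeholders.

```python
def multi_stage_match(ref_area, cmp_area, area_threshold, mser_match, surf_match):
    """Return (matches, stage) following the two SURF activation conditions."""
    # Condition 1: the target region area changed by more than the threshold.
    if abs(ref_area - cmp_area) > area_threshold:
        return surf_match(), "SURF"
    matches = mser_match()
    # Condition 2: MSER produced no matched features between the frames.
    if not matches:
        return surf_match(), "SURF"
    return matches, "MSER"


# Hypothetical matchers returning lists of matched feature index pairs.
mser_hits = lambda: [(0, 1), (2, 2)]
mser_none = lambda: []
surf_hits = lambda: [(5, 7), (6, 9), (8, 8)]

same_size = multi_stage_match(400, 410, 50, mser_hits, surf_hits)
scaled = multi_stage_match(400, 900, 50, mser_hits, surf_hits)
no_mser = multi_stage_match(400, 410, 50, mser_none, surf_hits)
```

Passing the matchers as callables means the slower SURF stage is evaluated only when one of the two conditions holds, which reflects the motivation for using MSER as the base algorithm.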

Testing Results
For the examination of translation, the object in Fig. 4 was used as the referent one, while for the blur and noise examination, the referent object is the object in Fig. 5. In these cases, the MSER algorithm produces feature matches and there is no significant change of object dimensions in the image, so the conditions for turning on the SURF algorithm are not fulfilled and the results are the same as in the MSER algorithm testing.
For the examination of object rotation, the object from Fig. 4 was used as the referent object, and the result is shown in Fig. 13. The tested frames are not successive; they intentionally have a number of frames between them. We tested the algorithm to find the limit of rotation, that is, how many frames can be skipped between two comparison frames. In this case, 11 frames can be skipped, which is a better result than with any single algorithm. In Table 3, the best result for the number of frames between the matched frames was obtained with the MSER algorithm, namely 10 frames. When 11 frames are skipped, the MSER method does not produce any matched features, so SURF is performed. SURF gives good results, which raises the limit to 11 skipped frames. As the number of skipped frames increases, the probability of false recognition increases. In the case shown in Fig. 13, where the target is significantly rotated, there are 4 matched features: 2 false and 2 correct matches.
For the examination of the scale change impact, the object in Fig. 5 was used as the referent object, and the results are shown in Fig. 14 for the MSER algorithm only and in Fig. 15 for the combined algorithm. As we can see, the combined algorithm shows good comparison results, since the SURF algorithm was activated in this case.

Conclusion
Target tracking is an important field in modern warfare, as well as in civil applications. Thermal cameras have also become an important part of surveillance, recognition and tracking systems in many military and civil applications, because they are passive sensors that can be used in low light conditions, where TV cameras are not useful.
In this paper, a sensibility analysis of three object tracking algorithms was performed, and a multi-stage algorithm was proposed that combines the best candidates from the tested group. Feature descriptor algorithms have been used far more commonly in visual imaging than in thermal imaging. The importance of this paper therefore lies in introducing these methods to thermal vision and comparing their performance. The MSER and SURF algorithms both have very good results, but MSER is faster, which is a very important characteristic when dealing with real-time systems. MSER also has the best matching results when frame skipping is examined. It makes it possible to skip a few frames while preserving good matching results, which provides an even faster algorithm. On the other hand, MSER has a weakness, which is scale change in the image. The results shown in this paper indicate that SURF gives the best results in this particular case. That is why the SURF method is used when the scale changes. SURF also makes it possible to skip a few frames while preserving a good result, so its use in the case of scale change will not affect the general speed of the algorithm. However, skipping frames is for special cases only, in order to minimize the calculation time where this is a necessity. That is why the base of the multi-stage algorithm is MSER (faster), with SURF used only in the special cases where it is important for the matching task. With this kind of combination, better robustness can be achieved.
A comparison of different feature descriptors in thermal image target tracking is a good starting point for future research on tracking applications in the infrared imaging area. In the field of target tracking in thermal imagery, these results can easily be applied to rough target location. In future work, it is planned to incorporate the new algorithm presented in this paper as the first stage for the initialization of a tracking algorithm in thermal imagery.

Keywords: target tracking, situation assessment, thermal image, algorithm, feature descriptors.

Features are interest points, or key points, in an image. Feature descriptors calculate a feature description from the pixel region surrounding the interest point. An object is described by groups of interest points and descriptors. Based on local taxonomy, features are divided into the following families:
- Local Binary Descriptors sample point pairs in a local region and create a binary coded vector. These descriptors are efficient to compute, store and match using the Hamming distance. Algorithms of this family are: LBP, FREAK, ORB, BRISK, Census.
- Spectra Descriptors use a range of spectra values, gradients and region averages. This group of descriptors typically involves more intensive computations and algorithms, and may consume considerable memory. Algorithms of this family are: SIFT (and its variations), SURF (and its variations), CenSurE, HAAR, HOG (and its variations), Daisy, O-Daisy, and CARD.
- Basis Space Descriptors encode the feature vector into a set of basis functions, such as a Fourier series of sine and cosine magnitude and phase. This group of descriptors is very useful for gaining insight into the data. Algorithms of this family are: Fourier Descriptors, Sparse Coding Methods.

Figure 1. SURF: Gaussian partial derivatives of the second order in x and y directions [8]

Figure 3. A referent object with background

First, keypoints and feature descriptors are calculated for the frame with the referent object, and then for the comparison frame. The obtained descriptors are then compared and the results are shown for each algorithm. All the tested algorithms, as well as the functions used, are available in the Matlab software package. The first part of every feature algorithm is keypoint detection, and for this purpose the SURF, Harris, FAST and MSER detectors available in the Matlab software package are used. The next step is forming the descriptors of the detected keypoints, using the SURF, FREAK and MSER keypoint descriptors.

Figure 7. Comparison of frames: rotation

Figure 8. Comparison of frames: blur

Figure 11. Comparison of frames: noise "salt and pepper"

Figure 12. Comparison of frames: Gaussian noise

Table 1. Number of matched features, translation

Table 2. Number of matched features, rotation

Table 3. Number of skipped frames, rotation

Table 4. Number of matched features, blur

Table 5. Number of matched features, scale change, case 1

Algorithm                   SURF   MSER
Num. of matched features    8      3

Table 6. Number of matched features, scale change, case 2

Table 7. Number of matched features, noise "salt and pepper"

Table 8. Number of matched features, Gaussian noise

Table 9. Elapsed time