Performance Evaluation of Machine Learning Techniques in Lung Cancer Classification from PET/CT Images

Lung cancer detection is highly challenging because the disease remains asymptomatic until an advanced stage. Early detection helps to increase the patient’s survival. Computer Aided Diagnosis (CAD) systems have been developed using Machine Learning (ML) and Artificial Intelligence (AI) techniques to detect malignant regions in medical images. This study compares classical ML techniques for lung cancer classification from Positron Emission Tomography/Computed Tomography (PET/CT) images. Significant texture and fractal descriptors extracted from PET/CT images generated non-linear data, which were fed as inputs to the classifiers. Hyper-parameters and model parameters of the ML techniques were tuned to obtain the best performance. 10-fold cross validation was used to analyze the performance of the classifiers. The experimental study showed that the Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel of width σ = 1 outperformed the other classifiers and achieved the highest accuracy of 98.10%.


INTRODUCTION
PET/CT is a leading imaging technique in lung cancer diagnosis, staging and treatment planning, as it provides complementary morphological and anatomical information [1]. A major drawback of PET/CT is the occurrence of False Positives (FPs) due to inflammation or infection caused by other lung diseases [2]. In addition, the CT scans in PET/CT are performed at minimal energy settings to reduce radiation effects, which results in poor image quality and affects interpretation and diagnostic accuracy [3].
The large volume of images burdens medical experts during interpretation, which may result in error-prone diagnosis. CAD systems are therefore essential to assist radiologists in lung cancer diagnosis with increased speed and accuracy and fewer diagnostic errors [4]. CAD plays a major complementary role in medical diagnosis [5]. CAD systems have been developed using image processing techniques and AI to interpret lung images from various imaging modalities. The current state-of-the-art techniques in lung cancer CAD systems, the challenges faced in their implementation steps, and the strengths and drawbacks of the existing approaches have been reviewed [6].
Texture Analysis (TA) plays a key role in discriminating between normal and abnormal regions in medical images. Texture features have been found beneficial in image classification [7]. A critical review of the latest developments in TA for quantifying heterogeneity in PET/CT images, together with recommendations for the use of TA in clinical research, is provided in [8].
First and second order statistical texture features for detecting lung cancer from CT images were analyzed in [9]. However, texture features might be insufficient to identify and classify smaller lymph nodes. Medical images are characterized by irregular, complex tissue structures that cannot be quantified by traditional Euclidean geometry. Hence, fractal geometry has been widely used to analyze medical images and has been found significant in assessing the aggressiveness of lung cancer [10].
Classification algorithms play an important role in CAD systems in detecting malignant regions in medical images. Supervised ML techniques such as Artificial Neural Networks (ANN) and SVM have been recognized as robust models for classifying multi-dimensional, complex data. An ANN based lung cancer detection system that used first order texture features extracted from CT images has been established [11]. Significant texture features yielded a better accuracy of 98.4% using ANN compared to 93.2% using SVM [12]. Classification using a Hopfield neural network achieved 98% accuracy in detecting lung cancer [13]. Recent studies have shown that SVM is a popular ML tool for the classification of medical images. Several researchers have developed computerized systems for lung cancer classification that use an SVM classifier to improve the diagnostic accuracy and have tested them on texture features of PET/CT images [14-18]. On this basis, this study compares several ML methods for lung cancer classification using texture and fractal features from lung PET/CT images. A few of the normal and lung cancer PET/CT images at various stages used in this study are shown in Figure 1a and Figure 1b.

Machine learning techniques
Remarkable advancements in image acquisition devices and the growing number of images per study have produced a huge amount of data. Interpreting this volume of data is liable to human error and to inter-observer variation. ML techniques have made remarkable progress in recent times and play a vital role in medical image analysis, assisting medical experts by enabling speedy, automatic diagnosis with improved accuracy. A supervised learning algorithm builds a model from training data and makes predictions for new test data that are not in the training set. These algorithms require a set of parameters to be tuned to optimal values so that the model fits the data well and performs as well as possible. Various classical ML techniques were analyzed for a comparative performance evaluation in lung cancer classification.
Decision Tree (DT) is a popular ML algorithm and graphical representation that makes decisions based on certain conditions. Optimal decisions can be arrived at by traversing forward and backward calculation paths. Three models were analyzed: a simple tree with a maximum of 4 splits, a medium tree with a maximum of 20 splits and a complex tree with a maximum of 100 splits. The medium tree with 20 splits and Gini's diversity index as the split criterion helped in making a coarse distinction between normal and cancerous regions of lung PET/CT images.
Discriminant Analysis (DA) assigns objects to different classes based on the constructed discriminant equations. It predicts the class of a new set of inputs by maximizing the distance between the class means and minimizing the spread within each class. Linear DA (LDA) is the preferred classification algorithm for linearly separable data with more than two classes. The Quadratic DA (QDA) classifier separates two or more classes by a quadratic surface.
K-Nearest Neighbors (KNN) is the most widely used nonparametric technique in classification. It is a simple algorithm in which the entire training dataset and all available cases are stored. A new case is assigned to a class by a vote of its 'K' nearest neighbors, as measured by a similarity function. The Euclidean distance metric was used to establish the nearest neighbors. The performance and behavior of the algorithm depend on the value of 'K', and its selection depends on the data. This study tried various odd values of 'K' ranging from 1 to 13 (square root of the training samples/2) and analyzed the performance.
The Naive Bayes (NB) classifier is the most popular supervised learning method based on Bayes' theorem of probability. It predicts each class under the assumption that all the features are independent of each other. The NB algorithm is useful for a moderate to large training dataset with several features.
Neural networks are a promising area and have been successfully applied in the medical field for cancer detection and for identifying pathological conditions. ANNs contain input, hidden and output layers, and each neuron in the hidden layer is fully connected to every neuron in the input and output layers. Initially, the ANN is trained with a known set of inputs and outputs using back propagation or gradient descent learning algorithms. Back propagation learning algorithms are the most widely used; the error is calculated and propagated back to the previous layers to adjust the weights and biases so that the mean square error (MSE) is minimized. In constructing the ANN model, hyperparameters such as the learning rate, momentum, number of hidden layers and number of hidden neurons were tuned, and optimal values were fixed based on the experimental results for MSE and accuracy. A learning rate of 0.3 and a momentum of 0.9 produced better performance with low MSE for most of the training functions. A performance goal of 0.0001 and 1000 epochs were set. The network was trained with 10, 15, 20, 25, 30 and 35 hidden neurons and its performance was observed [19]. Because 20 hidden neurons produced the minimum MSE for many training functions, 20 hidden neurons were used.
SVMs have now become a popular tool in medical diagnosis [15,16]. SVM is a supervised ML algorithm that learns the classes from a labelled training dataset. New data are classified into the classes by finding a separating hyperplane using the maximum margin rule.
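The maximum margin rule mentioned above can be illustrated with a minimal sketch. It assumes a linear decision function f(x) = w·x + b scaled so that the support vectors satisfy |f(x)| = 1, in which case the margin width is 2/||w||. The weight vector, bias and sample points are hypothetical values chosen for illustration and are not taken from the study.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Distance between the two margin planes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical hyperplane x1 + x2 - 1 = 0 (illustrative values only).
w, b = (1.0, 1.0), -1.0

for point in [(1.0, 1.0), (0.0, 0.0), (2.0, 0.5)]:
    print(point, decision_value(w, b, point), classify(w, b, point))
print("margin width:", margin_width(w))
```

Training an SVM amounts to choosing w and b so that this margin is as wide as possible while the training samples remain correctly classified; the sketch only evaluates a given hyperplane and does not perform that optimization.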
The hyperparameters of SVM, such as the type of kernel and the kernel width of the RBF kernel, were investigated carefully to attain improved classification accuracy. The performance of the SVM-RBF model is sensitive to the kernel width σ. A small width may cause over-fitting, as the data points are close to the hyperplane. A large width may lead to under-fitting, as the data points are very far from the hyperplane. The optimal width was selected based on the accuracy. The performance of SVM with different kernels was analyzed to identify a better classifier. Kernels are used in non-linear SVM to map the features into a high dimensional space so that linearly non-separable data become linearly separable. SVM with an RBF kernel is the most preferred classifier for complex, non-linear data, as the training problem is convex. Tuning the SVM hyperparameters helps in achieving better classification accuracy.
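The effect of the kernel width can be seen directly from the kernel value. The sketch below assumes the common Gaussian parameterization k(x, y) = exp(-||x - y||² / (2σ²)); the exact form used by the authors' software is not stated in the text, so this is an assumption, and the two points are invented for illustration.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel exp(-||x - y||^2 / (2 * sigma^2)) (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Two illustrative points whose squared Euclidean distance is 2.
x, y = (0.0, 0.0), (1.0, 1.0)

# Kernel widths discussed in the study.
for sigma in (0.1, 0.5, 1.0, 1.5, 2.0):
    print(f"sigma={sigma}: k(x, y)={rbf_kernel(x, y, sigma):.6f}")
```

Under this parameterization, a small σ drives the kernel value between distinct points towards zero, so each training point influences only its immediate neighbourhood, while a large σ makes distant points look similar and smooths the decision boundary.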

Input features for ML methods
In this study, 14 second order texture features were calculated using the gray level co-occurrence matrix (GLCM). TA on lung PET/CT images identified 3 significant texture features: auto correlation, sum average and sum variance [20]. 12 fractal features were calculated for PET/CT, including the average fractal dimension (FDavg) and lacunarity from fuzzy enhanced and modified images. The experimental study revealed that 10 significant fractal features were useful in lung cancer detection [21]. All these features were extracted from PET/CT images enhanced using Wiener filtering followed by fuzzy enhancement. This study thus identified 13 significant features, comprising 3 texture and 10 fractal features extracted from PET/CT images. A total of 1072 samples, comprising samples from normal regions, cancer boundaries and cancerous regions in a ratio of 50:25:25, were used for training, validating and testing the classifiers; normal and diseased samples were in the ratio 55:45.
The sample data generated with the texture and fractal features from lung PET/CT images were sparse, diverse, multi-dimensional, linearly non-separable and slightly imbalanced.
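As an illustration of how the three significant texture features can be obtained from a GLCM, the following sketch builds a normalised co-occurrence matrix for a single horizontal pixel offset and computes auto correlation, sum average and sum variance from it. The definitions follow commonly used Haralick-style formulas with gray levels numbered from 1; the offset, the indexing convention and the toy image are assumptions and may differ from those used in [20].

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised gray level co-occurrence matrix for one pixel offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Auto correlation, sum average and sum variance of a normalised GLCM."""
    n = len(p)
    # Gray levels are numbered 1..n here (an assumed convention).
    autocorr = sum((i + 1) * (j + 1) * p[i][j] for i in range(n) for j in range(n))
    # Distribution of the index sum i + j, which ranges over 2..2n.
    p_sum = {}
    for i in range(n):
        for j in range(n):
            k = i + j + 2
            p_sum[k] = p_sum.get(k, 0.0) + p[i][j]
    sum_avg = sum(k * v for k, v in p_sum.items())
    sum_var = sum((k - sum_avg) ** 2 * v for k, v in p_sum.items())
    return autocorr, sum_avg, sum_var


# Tiny 4x4 image with gray levels 0..3, invented for illustration only.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(image, levels=4)
print(texture_features(p))
```

In practice the matrix is usually accumulated over several offsets and directions and often made symmetric before the features are computed; the single-offset version is kept here only for brevity.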

RESULTS
This study analyzed lung PET/CT images retrospectively to detect and classify lung cancer using texture and fractal feature descriptors. The method was implemented in MATLAB R2013a and tested on lung PET/CT images of size 256 × 256. Noise in PET/CT images due to artefacts caused by metallic implants, respiratory motion and the application of contrast degrades the image quality and the diagnostic accuracy. These artefacts can lead to FP results due to increased uptake of the radioactive tracer. Therefore, efficient de-noising techniques are needed for PET/CT images in lung cancer diagnosis.
A Wiener filter was employed to remove the additive noise present in PET/CT images while preserving sharp edges and textures. An adaptive fuzzy image enhancement was designed and developed to enhance the contrast between regions of different scales, especially for poor contrast images.
The combined significant texture and fractal features were used in classification. These observations were arranged as a matrix and fed as input to the supervised classifiers. Six supervised classifiers, namely DT, DA, KNN, NB, ANN and SVM with various kernels, were tried, and their performance was analyzed to find the best classifier for lung cancer from PET/CT images. The ratio between the training and testing datasets was 70:30. Performance measures such as sensitivity, specificity and accuracy obtained for all 6 classifiers are listed in Table 2.
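The three performance measures can be computed from the counts of true and false positives and negatives. The sketch below uses the standard definitions, sensitivity = TP/(TP + FN), specificity = TN/(TN + FP) and accuracy = (TP + TN)/total; the label vectors are invented for illustration and do not reproduce the values in Table 2.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity and accuracy for a two-class problem.

    Assumes both classes occur in y_true, so neither denominator is zero.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Toy labels (1 = cancerous, 0 = normal), invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
```

Reporting sensitivity and specificity alongside accuracy is useful here because the dataset is slightly imbalanced, so accuracy alone can hide a classifier that favours the majority class.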

Table 2. Comparison of performance measures of various classifiers analyzed in this method
Figure 2 shows the sensitivity, specificity and accuracy of the supervised classifiers analyzed in this study. The performance of the LDA and QDA classifiers was unsatisfactory on the non-linearly separable data. ANN resulted in an accuracy of 92.9%. The DT, KNN and NB classifiers produced similar accuracies of around 96%. Of all 6 classifiers analyzed in this study, SVM with RBF kernel, σ = 1.0, outperformed the rest.
The RBF kernel width σ has a significant impact on the accuracy of SVM classification, particularly in the interval [0.5, 1.5]. SVM classified reasonably well the data points whose distance from the support vectors was equal to the width σ. Lower kernel widths such as 0.1 and 0.5 resulted in over-fitting; in this case, the distances between data points and support vectors were greater than σ, which produced lower accuracy. Higher kernel widths such as 1.5 and 2.0 caused under-fitting, as the distances between data points and support vectors were smaller than σ. In this SVM-RBF model, σ = 1.0 produced the maximum accuracy. An RBF kernel of width σ = 1 led to smooth, well-defined decision boundaries and generalized strongly without much under- or over-fitting. The ROC curve of the SVM-RBF (σ = 1) classifier is shown in Figure 3. The SVM classifier produced a sensitivity of 98.8% and a specificity of 97.9%. The ROC curve lies close to the gold standard of the upper left corner, which corresponds to the highest accuracy. The overall accuracy attained is 98.10%, which is superior to that of the other classifiers analyzed in this study. Increasing the number of training samples did not alter the accuracy of SVM, as SVM is relatively insensitive to sample size because it uses only the support vectors to build the separating hyperplane.

DISCUSSION
Existing CAD systems for classifying lung cancer from PET/CT images utilized texture features and the classical supervised classifiers ANN and SVM. These methods used thresholding for lung segmentation and extracted first and second order texture features. All these methods used 10-fold cross validation to evaluate the performance of the classifiers. The diagnostic accuracy of the existing CAD systems for lung cancer classification from PET/CT images was compared with the results of this study, which uses texture and fractal features from PET/CT images. The observations are listed in Table 3. The methods of Guo et al. (2015) and Zhao et al. (2015) produced comparable accuracies of 92% and 95.6%, respectively. Even though all these methods yielded good accuracy, texture features might not be suitable for classifying small lymph nodes (Gutte et al., 2007). The fuzzy enhancement designed for the current study enhanced highly textured regions with clear edges and helped to increase the diagnostic accuracy. Fractal features aided the detection of cancerous regions, and SVM is robust, with strong generalization ability for moderately sized data, when using the combined texture and fractal features of the current study. It is clear from Table 3 that the CAD system developed in this study for lung cancer detection and classification from PET/CT images, using an SVM classifier with RBF kernel, σ = 1, achieved an improved accuracy of 98.10%.

CONCLUSION
The aim of this study was to compare various ML techniques for lung cancer classification. A CAD system was developed with texture and fractal descriptors from enhanced PET/CT images. Fuzzy image enhancement offered improved contrast for highly textured regions and was found useful in lung cancer detection. Texture features based on the statistical approach and on fractal analysis were found highly useful for detecting and classifying distinct and subtle textures in dual-modality medical images. Significant features were selected from the extracted feature set based on their relevance and contribution to lung cancer detection. This method identified 3 texture features and 10 fractal features as significant for detecting and classifying lung cancer from PET/CT images.

Figure 3. ROC curve of the SVM-RBF (σ = 1) classifier (x-axis: False Positive Rate; y-axis: True Positive Rate)
The dataset obtained with these features was linearly non-separable. Several classical ML techniques, namely DT, DA, NB, KNN, ANN and SVM, were considered for performance comparison. The classification accuracy produced by KNN and DT is comparable with that of SVM. However, the specificity produced by SVM is higher than that of KNN and DT, which yields better classification accuracy. The RBF kernel used in the SVM classifier mapped the linearly non-separable features into a space in which they are linearly separable.
The SVM classifier is robust, with strong generalization ability for moderately sized data, when using the combined texture and fractal features. SVM with RBF kernel, width σ = 1, yielded an accuracy of 98.10%, leading to the conclusion that SVM is the better classifier for lung cancer diagnosis among the classifiers analyzed in this study.

Figure 2. Performance comparison of various classifiers