Surface Roughness Prediction of Machined Components Using Gray Level Co-occurrence Matrix and Bagging Tree

Surface roughness of a machined component is crucial in identifying its functional capability when the manufactured specimen has metal-to-metal contact during operation, since most wear and tear of parts occurs due to friction between the surfaces of moving parts. It is quite difficult to manually check the surface roughness of each component being manufactured on a manufacturing line. This paper presents a methodology to predict surface roughness using image processing, computer vision, and machine learning. Two machine learning algorithms, Bagging Tree and Stochastic Gradient Boosting, are compared and evaluated based on statistical parameters. It is observed that Stochastic Gradient Boosting predicts surface roughness in an efficient way both for training and tenfold cross-validation. The methodology can be employed for online inspection and qualitative assessment of machined components.


INTRODUCTION
Surface roughness is an important parameter that quantifies the quality and precision of a manufactured workpiece [1]. It is measured by calculating the deviations, in the direction of the normal vector, of the actual workpiece surface from an ideal surface. Roughness measurement is crucial since the irregularities present may lead to the formation of cracks and pits due to the interaction of the workpiece surface during operation. It is observed that a rough surface wears more quickly and has a higher coefficient of friction than a smooth surface. Tool geometry, workpiece material, and machining parameters are the important factors that affect surface roughness in manufacturing processes. To achieve a better surface finish, an optimal combination of machining parameters such as speed, feed, and depth of cut needs to be selected. Design of experiment (DOE) is a statistical method that begins with the formulation of an orthogonal array, which can be either full or fractional factorial, to ensure that all variations in the levels of the parameters are considered in an optimized way. The Taguchi orthogonal array is a method used in DOE in which the factors can be evaluated independently of each other despite the fractionality of the design. Various methods have been proposed for the evaluation of roughness, ranging from qualitative assessments based on visual and haptic perception to quantitative instrumental measurement methods.
Many researchers have proposed different approaches to predict surface roughness based on machining theory [2][3][4][5][6][7]. Surface measurement is mainly divided into two categories: (1) contact measurement and (2) non-contact measurement [8][9][10]. Contact-type measurements are used owing to their compact design, high measuring accuracy, and ability to give consistent output for the surface being examined [11]. However, the study in [12] highlighted the advantages of measurement accuracy. Considerable research is ongoing on non-contact assessment of surface roughness parameters using machine vision systems and artificial intelligence technology, including methods such as laser speckle, light scattering, and optical interference [13][14][15][16][17]. Many authors have used regression modeling to develop surface roughness prediction models for workpieces machined through various manufacturing processes. Significant parameters and optimum levels were determined, which can be utilized for achieving a high-quality surface [18]. In regression modeling, a mathematical model is developed between a target value and the input parameters. There is generally a single target value, and the model expresses the relationship between the independent and dependent variables. Due to the complex geometry and high precision requirements of a workpiece, the surface roughness prediction model is nonlinear in nature. Furthermore, nonlinear statistical models are not adaptive, i.e., they cannot learn from historical data and map input to output accordingly. Machine learning regression modeling has emerged as an effective tool to adaptively map the input to the target value while learning continuously from historical data. Machine learning methods such as Artificial Neural Network (ANN), Support Vector Machine (SVM), Naïve Bayes, etc. have been utilized to estimate the roughness of machined surfaces: the extracted features are fed as input to the algorithm, which is then trained and tested on the feature set.
Shandilya et al. [19] predicted the average cutting speed in wire-cut electric discharge machining of SiCp/6061 aluminum matrix composite by comparing two different models, Artificial Neural Network (ANN) and Response Surface Model (RSM). The experimental study demonstrates that prediction is more accurate using the ANN model than using RSM. Al-Zubaidi et al. [20] compared a statistical model with an ANN model to predict the roughness of Ti-6Al-4V ELI titanium alloy machined surfaces and concluded that ANN modeling yields a better coefficient of correlation and a lower mean square error. Bandapalli et al. [21] discussed surface roughness measurement of grade 5 titanium alloy in high-speed micro end milling using various methods such as Multiple Regression Analysis (MRA), Group Method of Data Handling (GMDH), and ANN. Li et al. [22] utilized temperature and vibration data collected from multiple sensors to predict the surface roughness of 3D printed components. Better accuracy of surface roughness prediction was achieved by training prediction models using ensemble learning algorithms.
A non-contact instrument generally uses light in place of the stylus used in contact-type measurement. Non-contact optical surface roughness measurement methods include ultrasonic methods, Scanning Tunnelling Microscopy (STM), and techniques based on machine vision [23]. Machine vision is particularly significant in manufacturing due to its non-contact nature, flexibility, efficient characterization of surfaces, and minimal human intervention. High-resolution cameras are used to capture images of machined surfaces, which are then processed using image processing techniques to extract useful information. Machine vision systems are used in the control of surface quality, measurement of tool wear, texture characterization, roughness measurement, determination of wear severity, etc. [24]. Over the last few decades, analysis of the surface texture of machined components has been an active field of research, as it plays a significant role in manufacturing and in computer vision based on image processing. Jeyapoovan and Murugan [25] applied Euclidean and Hamming distances to milled specimen surface images captured with a CCD camera. Classification of images and prediction of surface roughness were done by establishing a database of captured signals from reference images with known surface roughness values. A statistical method for roughness prediction based on the color distribution matrix was proposed by Liu et al. [26]. Results obtained through experiments on color images suggest that a method based on overlap degree can achieve higher accuracy than other methods. Texture information extracted from grayscale images is considered suitable for evaluating the roughness of a specimen. The Gray Level Co-occurrence Matrix (GLCM) is a spatial-domain-based texture feature extraction algorithm in which geometric features are extracted from the histogram and gray-level variance [27].
Machining processes such as turning leave a strong pattern on the machined surface; in this case, the extracted geometric features carry better information. However, geometric features extracted from specimens processed through grinding, milling, and lapping possess randomness. Therefore, GLCM alone is not suitable for feature extraction, and the images need to be pre-processed through denoising filters to minimize noise and randomness.
The present study proposes a methodology to predict surface roughness based on GLCM, the Wiener filter, and machine learning models. An experimental study was conducted on milled surfaces generated with variations in machining parameters such as feed, speed, and depth of cut. A DOE-based orthogonal array was considered for carrying out the experiment, and the roughness values of the milled specimens were measured with a profilometer. A Charge-Coupled Device (CCD) camera was used to capture images, which were then pre-processed through a Wiener filter, and GLCM was used to extract the geometrical features. For surface roughness prediction, two machine learning regression models, Bagging and Stochastic Gradient Boosting, were applied so as to adapt to the non-linear nature of the dataset. The methodology used to predict surface roughness is shown in Figure 1.

SURFACE ROUGHNESS PARAMETERS
In the present study, amplitude parameters are used, as they measure the vertical deviations of the machined surface, which is a crucial part of roughness measurement. The Arithmetic Average Height (Ra) and the Ten-Point Height (Rz) reflect the most important characteristics of surface topography and are widely used by manufacturing industries. These two parameters set the benchmark for validating the surface roughness of any given machined component.

Arithmetic Average Height (Ra)
The arithmetic average height (Ra) is used globally to check the quality of machined surfaces. Ra is mathematically defined as the arithmetical mean deviation of the assessed profile over a single sampling length. This parameter satisfactorily captures the general trend of the deviations present on the surface. The mathematical formula of the arithmetic average height (Ra) is as follows:
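In its standard textbook form, the arithmetical mean deviation can be written as below. The symbols are chosen here for illustration and are assumptions: y(x) is the profile height measured from the mean line, l is the sampling length, and y_i are n discrete height samples.

```latex
R_a = \frac{1}{l} \int_{0}^{l} \lvert y(x) \rvert \, dx
\qquad \text{or, for sampled heights,} \qquad
R_a = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i \rvert
```

Because the absolute deviations are averaged, Ra does not distinguish peaks from valleys and is only weakly affected by an isolated scratch or spike.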

Ten-Point Height (Rz)
In comparison to Ra, Rz is more sensitive to variations in the surface topography. According to the ISO standard, Rz is mathematically defined as the difference between the average height of the five highest peaks and the average of the five lowest valleys along the sampling length of the machined surface. In the general formula for the n-point height, Vi is the depth of valley i with reference to the centerline, Pi is the height of peak i with reference to the centerline, and n is the number of peaks and valleys.
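A commonly quoted form of the n-point height is sketched below. It assumes that P_i and V_i are signed ordinates relative to the centerline, so that valleys take negative values; this sign convention is an assumption, not something the text states.

```latex
R_z = \frac{1}{n} \left( \sum_{i=1}^{n} P_i \; - \; \sum_{i=1}^{n} V_i \right),
\qquad n = 5 \ \text{for the ten-point height}
```

If V_i is instead recorded as a positive depth below the centerline, the two sums are added rather than subtracted; in both conventions the result is the mean peak-to-valley separation.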

Design of Experiment and Orthogonal Array
The experiment was conducted on a CNC milling machine. The specimens were prepared from a cylindrical bar of AISI 1040 steel, 204 mm in length and 30 mm in diameter, which was normalized at 900 °C and hardened to 35 HRC. A Design of Experiment (DOE) study was conducted using an orthogonal array, with operating parameters such as cutting speed, feed, and depth of cut set at different levels. DOE is a statistical method of conducting experiments which ensures that all the machining parameters and their interactions are systematically investigated, avoiding the redundancy and bias that may otherwise occur. The Latin Square Test Matrix is used as the design array, as the number of levels in the present study is limited to five per parameter. With the Latin Square Test Matrix, a total of twenty-five experimental runs were performed. As shown in Figure 2, surface roughness was measured with a Handysurf 35A/B surface roughness tester after each operation. Table 1 shows the measuring conditions of the surface profilometer, Table 2 shows the levels of the operating parameters, and Table 3 shows the DOE table for the experiment.
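To illustrate how a Latin square yields twenty-five runs for three parameters at five levels each, the following minimal sketch builds a cyclic 5 x 5 Latin square. The cyclic construction and the level indices 0 to 4 are assumptions made for illustration; the actual level values and run order are those of Tables 2 and 3.

```python
from itertools import product

LEVELS = 5  # five levels per parameter, as stated in the study


def latin_square_design(levels=LEVELS):
    """Return runs as (speed_level, feed_level, depth_level) index tuples.

    Speed and feed index the rows and columns of the square; the depth-of-cut
    level is the cell entry of a cyclic Latin square (hypothetical choice).
    """
    runs = []
    for speed, feed in product(range(levels), repeat=2):
        depth = (speed + feed) % levels  # each level occurs once per row and column
        runs.append((speed, feed, depth))
    return runs


runs = latin_square_design()
print(len(runs))  # 25
for run in runs[:5]:
    print(run)
```

Every pair of levels of any two parameters appears exactly once, which is what allows 25 runs to stand in for the 125 runs of a full factorial design.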

Machine Vision and Preprocessing
As shown in Figure 3, the machine vision system consists of a high-resolution charge-coupled device (CCD) camera with a megapixel lens, connected to a high-end industrial central processing unit. The specimen was placed perpendicular to the worktable axis, and the angle between the normal light source and the table was kept at 45°, while the optical axis of the camera was kept perpendicular to the workpiece surface.

Figure 4: Sample Images Captured by CCD Camera
While capturing images of the machined surface, the images can be degraded by noise arising in the vicinity of the image-grabbing setup for various reasons, such as improper lighting or the presence of reflective surfaces in the facility. The Wiener filter is an adaptive and computationally simple algorithm and produces better results than linear noise filtering algorithms. Its adaptive nature makes it more selective, preserving edges and other high-frequency parts of the image, and it gives the best results when the noise is constant-power ("white") additive noise. The main characteristics of the Wiener filter are: (a) the signal and noise are assumed to be stationary linear random processes with known spectral characteristics; (b) the filter is required to be physically realizable (causal), although dropping this requirement results in a non-causal solution; (c) the performance criterion is the minimum mean square error. Figure 4 shows sample images captured with the CCD camera, while the filtered images are shown in Figure 5.
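The denoising step can be sketched with the adaptive Wiener filter available in SciPy (`scipy.signal.wiener`). The synthetic striped image, the noise level, and the 5 x 5 window are assumptions made for illustration only; the filter settings used in the study are not stated.

```python
import numpy as np
from scipy.signal import wiener

rng = np.random.default_rng(0)

# Synthetic stand-in for a machined-surface image: periodic lay marks (assumed data).
cols = np.arange(64)
clean = np.tile(0.5 * np.sin(2.0 * np.pi * cols / 32.0), (64, 1))

# Additive white Gaussian noise, the case the Wiener filter handles best.
noisy = clean + rng.normal(0.0, 0.1, size=clean.shape)

# 5x5 local window; when `noise` is not given, the noise power is
# estimated as the average of the local variances of the input.
filtered = wiener(noisy, mysize=5)

mse_noisy = float(np.mean((noisy - clean) ** 2))
mse_filtered = float(np.mean((filtered - clean) ** 2))
print(f"MSE before filtering: {mse_noisy:.4f}, after: {mse_filtered:.4f}")
```

A larger window smooths more strongly but blurs fine texture, so the window size would need to be tuned against the texture scale of the actual images.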

Feature Extraction Using GLCM
The Gray Level Co-occurrence Matrix (GLCM) algorithm is utilized for texture analysis by extracting texture features from the captured images of machined components. It is one of the most widely used second-order statistical methods for image analysis. The GLCM algorithm exploits the gray-level distribution among the pixels of a filtered image in order to extract texture features such as correlation, contrast, entropy, cluster prominence, etc. GLCM treats the relation between adjacent pairs of pixels at a given offset as a second-order texture, where the first pixel is considered the reference and the other the neighbouring pixel. For each pixel pair, GLCM builds a two-dimensional matrix made up of the joint probabilities P_{d,θ}(i, j) of gray levels i and j occurring in two pixels separated by a distance d in the direction θ. In the present study, θ was kept at zero. Various features are calculated as:
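As a minimal sketch of the idea, the following NumPy code builds a normalized, non-symmetric GLCM for θ = 0 and computes four of the features named above using their common textbook definitions. The distance d = 1, the natural logarithm in the entropy, the 4-level toy image, and the function names are assumptions made for illustration; the study's full set of 21 features is not reproduced here.

```python
import numpy as np


def glcm(image, levels, d=1):
    """Normalized GLCM for theta = 0: neighbour is d columns to the right."""
    ref = image[:, :-d].ravel()   # reference pixels
    nbr = image[:, d:].ravel()    # neighbouring pixels
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (ref, nbr), 1.0)  # accumulate co-occurrence counts
    return counts / counts.sum()        # joint probabilities P(i, j)


def glcm_features(p):
    """Contrast, correlation, entropy, and cluster prominence of a GLCM."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * p).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * p).sum())
    nonzero = p[p > 0]  # skip empty cells so log() is defined
    return {
        "contrast": float((((i - j) ** 2) * p).sum()),
        "correlation": float(((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j)),
        "entropy": float(-(nonzero * np.log(nonzero)).sum()),
        "cluster_prominence": float((((i + j - mu_i - mu_j) ** 4) * p).sum()),
    }


# Toy 4-level image (assumed data); a real 8-bit image would first be
# quantized to a manageable number of gray levels.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 2, 2, 2],
                  [2, 2, 3, 3]])
p = glcm(image, levels=4)
features = glcm_features(p)
print(features)
```

For this toy image there are 12 horizontal pixel pairs, so each entry of `p` is a count divided by 12.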

MACHINE LEARNING TECHNIQUES
The surface texture features extracted using the GLCM algorithm are assembled into a feature vector that is mapped to the surface roughness values of the different specimens. This feature vector is used as input to the machine learning models, which perform a regression analysis on the data set so that their performance can be compared. The machine learning models used in this study are (1) Stochastic Gradient Boosting and (2) the Bagging (Bootstrap Aggregation) Trees algorithm.

Stochastic Gradient Boosting
Gradient Boosting builds a predictive model as an ensemble of weak prediction models. The Stochastic Gradient Boosting algorithm is an extension of gradient boosting obtained by a minor modification: in each iteration, the base learner is fit on a subset of the training data set selected at random without replacement. The size of the subset is a fixed fraction of the training data set. Smaller values of this fraction introduce randomness into the algorithm, which helps prevent overfitting and thus acts as a regularizer. It also speeds up the algorithm, as the regression trees are fit to a smaller data set in each iteration.
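One way to experiment with this idea is scikit-learn's `GradientBoostingRegressor`, whose `subsample` parameter sets the fraction of training samples used to fit each base learner. The sketch below uses synthetic data and hyperparameter values that are assumptions for illustration, not the settings of the study.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic regression data standing in for texture features (assumed data).
X = rng.uniform(0.0, 1.0, size=(200, 4))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0.0, 0.05, size=200)

model = GradientBoostingRegressor(
    n_estimators=200,
    learning_rate=0.05,
    max_depth=3,
    subsample=0.5,   # fraction < 1.0 gives the stochastic variant
    random_state=0,
)
model.fit(X, y)

train_r2 = model.score(X, y)  # coefficient of determination on the training data
print(f"training R^2: {train_r2:.3f}")
```

Setting `subsample=1.0` recovers ordinary gradient boosting, which makes it straightforward to compare the two variants on the same data.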

Bagging Trees Algorithm
Bagging (Bootstrap Aggregation) Trees is an ensemble variant of the decision tree algorithm, which is a supervised learning technique. It combines a set of decision trees of varying depths, which helps to reduce the variance inherent in a single decision tree. Several subsets of the training data are drawn at random (bootstrap samples, drawn with replacement), and each subset is used to train an individual decision tree. This ensemble method makes the model more robust than a single decision tree model.
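The variance reduction can be illustrated with scikit-learn's `BaggingRegressor` wrapped around decision trees. The synthetic data, the noise level, and the number of trees below are assumptions chosen for illustration rather than values from the study.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)

# Noisy synthetic data (assumed), split into training and held-out parts.
X = rng.uniform(0.0, 1.0, size=(300, 3))
y = np.sin(4.0 * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0.0, 0.3, size=300)
X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

# A single fully grown tree versus 100 trees fit on bootstrap samples.
single = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
bagged = BaggingRegressor(
    DecisionTreeRegressor(), n_estimators=100, random_state=0
).fit(X_train, y_train)


def rmse(model):
    """Root mean square error on the held-out data."""
    return float(np.sqrt(np.mean((model.predict(X_test) - y_test) ** 2)))


rmse_single = rmse(single)
rmse_bagged = rmse(bagged)
print(f"single tree RMSE: {rmse_single:.3f}, bagged trees RMSE: {rmse_bagged:.3f}")
```

Averaging the predictions of many trees smooths out the noise that any single fully grown tree fits, which is why the bagged model is expected to show the lower held-out error here.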

RESULTS AND DISCUSSION
In the present study, training and tenfold cross-validation were carried out with Stochastic Gradient Boosting and Bagging Trees to predict the roughness values. During tenfold cross-validation, ten iterations were performed by dividing the dataset into ten folds of approximately equal size. The dataset used to check the performance of the models consists of 25 instances and 21 features. Performance is assessed by the coefficient of correlation and the root mean square error (RMSE), for both training and tenfold cross-validation. While predicting the Ra values, the Bagging Trees algorithm achieved a coefficient of correlation of 0.9587 for training and 0.9289 for tenfold cross-validation, with RMSE values of 0.2111 and 0.3238, respectively. As shown in Table 4, while predicting the Ra values, the Stochastic Gradient Boosting algorithm achieved a coefficient of correlation of 0.9985 for training and 0.8964 for tenfold cross-validation, with RMSE values of 0.0421 and 0.3498, respectively. As shown in Table 5, while predicting the Rz values, the Bagging Trees algorithm achieved a coefficient of correlation of 0.9221 for training and 0.8305 for tenfold cross-validation, with RMSE values of 1.4138 and 2.1474, respectively. While predicting the Rz values, the Stochastic Gradient Boosting algorithm achieved a coefficient of correlation of 0.9977 for training and 0.9206 for tenfold cross-validation, with RMSE values of 0.2515 and 1.5331, respectively. The results achieved by both algorithms are found to be satisfactory, since the errors are within permissible limits.
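The evaluation protocol, coefficient of correlation and RMSE for training and for tenfold cross-validation, can be sketched with scikit-learn as follows. The feature matrix is filled with random placeholder values of the same shape as the study's dataset (25 instances, 21 features), so the printed numbers are not comparable to the reported results; the model settings and helper name are also assumptions.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(0)

# Placeholder data with the study's shape: 25 instances x 21 features (assumed values).
X = rng.uniform(0.0, 1.0, size=(25, 21))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(0.0, 0.1, size=25)


def corr_and_rmse(y_true, y_pred):
    """Return (coefficient of correlation, root mean square error)."""
    r = float(np.corrcoef(y_true, y_pred)[0, 1])
    rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    return r, rmse


model = BaggingRegressor(n_estimators=50, random_state=0)

# Training performance: fit and predict on the full dataset.
train_r, train_rmse = corr_and_rmse(y, model.fit(X, y).predict(X))

# Tenfold cross-validation: each instance is predicted by a model
# that did not see it during fitting.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
cv_pred = cross_val_predict(model, X, y, cv=cv)
cv_r, cv_rmse = corr_and_rmse(y, cv_pred)

fold_sizes = sorted(len(test_idx) for _, test_idx in cv.split(X))
print(f"train: r={train_r:.4f}, RMSE={train_rmse:.4f}")
print(f"10-fold CV: r={cv_r:.4f}, RMSE={cv_rmse:.4f}, fold sizes={fold_sizes}")
```

With 25 instances, the ten folds contain either two or three samples each, so the cross-validated metrics are best computed on the pooled out-of-fold predictions, as done here, rather than averaged per fold.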