Credit Scoring Using Ensemble of Various Classifiers on Reduced Feature Set

Abstract: Credit scoring methods are widely used to evaluate loan applications in financial and banking institutions. A credit score identifies whether an applicant belongs to the good-risk or the bad-risk applicant group. These decisions are based on the customer's demographic data, the customer's overall business with the bank, and the applicant's loan payment history. The advantages of credit scoring models include reducing the cost of credit analysis, enabling faster credit decisions and diminishing possible risk. Many statistical and machine learning techniques, such as Logistic Regression, Support Vector Machines, Neural Networks and Decision Tree algorithms, have been used independently and as hybrid credit scoring models. This paper proposes an ensemble-based technique that combines seven individual models to increase classification accuracy. Feature selection is also used to select important attributes for classification. Cross classification was conducted using three data partitions. The German credit dataset, which has 1000 instances and 21 attributes, is used in the present study. The experiments showed that the ensemble model achieved higher accuracy than the individual models. In all three partitions, the ensemble model was able to correctly classify more than 80% of the loan customers as good creditors. Also, for the 70:30 partition, feature selection had a good impact on the accuracy of the classifiers. The results were improved for almost all individual models.

Introduction
Classification is the process of assigning objects to one of several predefined categories. It is one of the most useful techniques in data mining (DM) for building models from an input data set. The generated model can predict the class of a given instance from information previously learned from historical data. Several studies have used data mining to extract rules and predict certain behaviours in areas such as science, information technology, finance, education, biology and medicine. In the past few decades, DM techniques have been widely used in finance. Data were once limited to the financial institutions' own databases, but nowadays some data are publicly available in several countries, and financial institutions and researchers have developed many different quantitative credit scoring techniques (Sustersic et al., 2009). Credit scoring is a technique that helps lenders decide whether to grant credit to applicants on the basis of characteristics such as age, income and marital status (Chen and Huang, 2003). Credit scoring models were initially proposed by Fisher (1936) and have since been developed further by Altman (1968), Beaver (1967) and others.
Hybrid data mining approaches such as GA-ANN, PCA-ANN and LR-ANN (Sustersic, 2009), MARS-ANN (Lee and Chen, 2005), LR-ANN (Lin, 2009), and GA-SVM, CART+MARS, SVM (Chen et al., 2009), among others, have resulted in better model performance. A novel machine learning technique called ensemble learning is also being used to improve accuracy. A classifier ensemble (also referred to as a committee of learners, mixture of experts, or multiple classifier system) consists of a set of individually trained classifiers (base classifiers) whose decisions are combined in some way, typically by weighted or unweighted voting, when classifying new examples (Kittler et al., 1998; Kuncheva, 2004). It has been found that in most cases ensembles produce more accurate predictions than the base classifiers (Dietterich, 1997). Researchers have shown that aggregating individual classifiers can improve accuracy in credit scoring as well as in other classification applications. Hoffmann et al. (2002) reported that the boosted genetic fuzzy classifier performed better than both the neuro-fuzzy classifier and the C4.5 algorithm. West (2005) reported that an ensemble of NNs obtained higher accuracy than a single NN in credit scoring and bankruptcy prediction. This paper presents an ensemble-based credit scoring model for consumer loans. It is a feature-selection-based ensemble classifier in which 7 base classifiers are combined using a confidence-weighted voting method to enhance the classification accuracy of the individual models. The performance of the ensemble classifier is evaluated using the German credit dataset from the UCI Machine Learning Repository.
The first section gives an introduction to classification and credit scoring systems, followed by the ensemble approach and the feature selection approach. The second section describes the research methodology, including the dataset description, model selection and model building. In the next section, the models are tested and the results are given. The results are discussed in the discussion section, followed by the conclusion.

Ensemble Approach
There are three approaches to classifying training/test instances into one of the predefined categories: (1) individual, (2) hybrid, and (3) ensemble-based. The individual approach uses a single statistical or machine learning method for classification. The ensemble approach weighs several individual classifiers and combines them in order to obtain a classifier that outperforms every one of them. The important difference between hybrid and ensemble methods is that hybrid methods use only one classifier for sample learning and employ different techniques in the feature selection and classification stages, while ensemble learning produces several classifiers of different types or with different parameters (such as various SVM classifiers with different parameters) and trains them on different samples many times (Li and Zhong, 2012).
Ensemble learning has become the latest method of credit evaluation modelling. Paleologo et al. (2010) proposed a hybrid credit evaluation model based on K-means, SVM, decision trees and AdaBoost algorithms and classified the samples with a subagging ensemble approach. Yu et al. (2008) employed ANN classifiers with different structures and used maximizing correlation to choose the ensemble members. Nanni and Lumini (2009) used a random subspace ensemble approach. In this study, an ensemble approach based on confidence-weighted voting is applied to 7 classifiers to classify the credit dataset and enhance the classification accuracy.

Feature Selection
Feature selection is a pre-processing technique that identifies a subset of input variables by eliminating features with little or no predictive information. It can significantly improve the comprehensibility of the resulting classifier models and often yields a model that generalizes better to unseen points. Several feature selection methods, with different search techniques, are available to produce a reduced data set. The reduced data set can improve accuracy compared with the original data set without altering the relevance or meaning of the data.
The relationship between a feature selection algorithm (FSA) and the inducer chosen to evaluate the usefulness of the feature selection process can take three main forms: (1) embedded, (2) filter and (3) wrapper. In scheme (1), the inducer has its own FSA (either explicit or implicit). Traditional machine learning tools such as decision trees and artificial neural networks belong to the embedded scheme (Mitchell, 1982). In scheme (2), feature selection takes place before the induction step and acts as a filter that removes non-useful features prior to induction. In a general sense, it can be seen as a particular case of the embedded scheme in which feature selection is used as a pre-processing step. Filter schemes are independent of the induction algorithm. In scheme (3), the relationship is the other way around: the FSA uses the learning algorithm as a subroutine (John et al., 1994). It searches the space of feature subsets, using the estimated accuracy of an induction algorithm as the measure of goodness for a particular feature subset. From a set of hundreds or even thousands of predictors, feature selection screens, ranks, and selects the predictors that are most important. Predictors that contribute little to prediction can be dropped from the data set. The result is a more efficient model that uses fewer predictors, executes more quickly, and is easier to understand.
The present study uses a feature selection process based on the Chi-Square statistic to identify the important predictor variables. The Chi-Square statistic measures the lack of independence between a variable value and the class value. The Chi-Square test proposed by Karl Pearson has been used in the study. The features with the highest Chi-Square values for a particular class are expected to be the best for classifying the instances of that class. Thus, the desired number of features with the highest Chi-Square values is selected.
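The wrapper scheme described above can be illustrated with a minimal sketch of greedy forward selection. This is not the procedure used in the present study (which uses a filter based on the Chi-Square statistic); the function names, the feature names and the `toy_accuracy` stand-in for an inducer's estimated accuracy are all hypothetical and chosen only for illustration.

```python
def forward_selection(features, evaluate):
    """Greedy wrapper search: keep adding the feature that most improves
    evaluate(subset); stop when no remaining feature improves the score."""
    selected = []
    best_score = evaluate(selected)
    remaining = list(features)
    while remaining:
        # Score every candidate subset obtained by adding one more feature.
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        score, feature = max(scored)
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Hypothetical stand-in for "estimated accuracy of the induction algorithm".
# A real wrapper would train and validate a classifier on each subset.
GAINS = {"duration": 0.10, "age": 0.05, "noise": 0.0}


def toy_accuracy(subset):
    # Each feature adds its (made-up) gain; every feature costs a small penalty.
    return 0.60 + sum(GAINS[f] for f in subset) - 0.01 * len(subset)


selected, best_score = forward_selection(GAINS, toy_accuracy)
print(selected, round(best_score, 2))
```

In this toy setting the search keeps the two informative features and rejects the uninformative one, because adding it lowers the estimated score. In practice the cost of a wrapper is dominated by the repeated training runs inside `evaluate`.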
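As an illustration of this filter, the following sketch computes Pearson's Chi-Square statistic for categorical features against the class and keeps the highest-scoring ones. It is a simplified stand-in under stated assumptions, not the Clementine implementation: it ranks by the raw statistic rather than by p-value, it assumes categorical (or already binned) features, and the toy data and the `coin_flip` feature are invented for the example.

```python
from collections import Counter


def chi_square(feature_values, class_labels):
    """Pearson chi-square statistic of independence between a categorical
    feature and the class label (sum over cells of (O - E)^2 / E)."""
    n = len(feature_values)
    observed = Counter(zip(feature_values, class_labels))
    feature_totals = Counter(feature_values)
    class_totals = Counter(class_labels)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for c, c_count in class_totals.items():
            expected = f_count * c_count / n  # count expected under independence
            stat += (observed[(f, c)] - expected) ** 2 / expected
    return stat


def select_top_k(columns, class_labels, k):
    """Rank features by chi-square value and return the k highest-scoring names."""
    scores = {name: chi_square(values, class_labels)
              for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy data (hypothetical): one feature aligned with the class, one unrelated.
labels = ["good", "good", "good", "bad", "bad", "bad", "good", "bad"]
columns = {
    "checking_status": ["high", "high", "high", "low", "low", "low", "high", "low"],
    "coin_flip": ["h", "t", "h", "t", "h", "t", "t", "h"],
}

top, scores = select_top_k(columns, labels, k=1)
print(top, scores)
```

A feature that is independent of the class scores close to zero, while a feature whose values line up with the class scores high, which is why the highest-scoring features are retained.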

Methodology of Research
This is an empirical study. Cross classification was conducted on 3 partitions, and the feature selection process was then applied based on the Chi-square measure. The unimportant features were dropped, and the performance of the classifiers on the reduced set was compared against their performance using all features. The study was conducted in two steps: first taking all features as input, and second taking only the reduced set. The Clementine tool was used to build and compare a number of different models for classifying the loan applicants into good and bad credit categories.

Objectives of Study:
The objective of this study is to propose an ensemble classifier using feature selection along with cross classification for credit evaluation with the purpose of enhancing the classification accuracy of the individual models.
Research question: Can a combined effort of feature selection, cross classification and ensemble approach improve the classification accuracy of credit scoring models?

Data Set Used:
The German credit scoring dataset of 1000 instances was taken from the UCI Machine Learning Repository. This dataset consists of 700 instances of creditworthy applicants and 300 instances of customers who should not have been granted the credit. In addition, it presents twenty (20)

Cross Classification/ Data Partitioning:
The dataset was partitioned into training and test data in ratios of 60:40, 70:30, and 50:50. The models were trained separately on 60%, 70% and 50% of the data and then tested on the remaining 40%, 30% and 50%, respectively. The dataset was pre-processed to identify missing values and outliers.
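The partitioning step can be sketched as follows. The source does not state how instances were assigned to each partition, so a seeded random shuffle is assumed here; the `partition` function and the use of integer indices in place of real records are illustrative only.

```python
import random


def partition(instances, train_fraction, seed=0):
    """Shuffle a copy of the instances and split it into (train, test)."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Indices 0..999 stand in for the 1000 instances of the credit dataset.
instances = list(range(1000))
splits = {}
for train_fraction in (0.6, 0.7, 0.5):
    train, test = partition(instances, train_fraction)
    splits[train_fraction] = (train, test)
    print(train_fraction, len(train), len(test))
```

Because the class distribution is imbalanced (700 good versus 300 bad instances), a stratified split that preserves this ratio in both parts may be preferable; the sketch above does not do this.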

Model Selection
Initially, 10 classifiers were trained on the credit dataset: Neural Networks (NN), C5.1, C&R Tree (CART), QUEST, CHAID, Logistic Regression (LR), Decision List, Bayes Net, Discriminant Analysis and SVM. A set of candidate models was generated and ranked by overall accuracy. The seven classifiers showing good training performance (NN, C5.1, C&R Tree, QUEST, CHAID, LR and SVM; Fig. 1) were chosen as the base classifiers for the ensemble-based experiments.

Model Building
First, the dataset was partitioned into training and test instances. Then feature selection was applied and the important variables were selected. Fields were selected for the model based on the p-value (importance) of each predictor under the Chi-square measure. Out of 20 predictors, 14 were ranked as important by the feature selection algorithm. These were: checking_status, duration, credit_history, purpose, credit_amount, savings_status, employment, instalment_commitment, personal_status, property_magnitude, age, other_payment_plans, existing_credits, and residence_since.
All the models were trained on the training instances of the data set with and without feature selection. A single model was then generated for each of the 7 selected classifiers. The seven base models were combined using the confidence-weighted voting method, which determines how a single aggregated score is produced for each record. With simple voting, if two out of three models predict 'yes', then 'yes' wins by a vote of 2 to 1. With confidence-weighted voting, the votes are weighted by the confidence value of each prediction. If one model predicts 'no' with a higher confidence than the two 'yes' predictions combined, then 'no' wins.
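The difference between the two voting schemes can be sketched with the three-model example above. The confidence values are hypothetical scores in [0, 1] chosen only to reproduce the case described, and the function names are illustrative; this is not the Clementine implementation.

```python
from collections import defaultdict


def simple_vote(predictions):
    """Each model casts one vote; the label with the most votes wins."""
    counts = defaultdict(int)
    for label, _confidence in predictions:
        counts[label] += 1
    return max(counts, key=counts.get)


def confidence_weighted_vote(predictions):
    """Each model's vote is weighted by its confidence; the label with the
    largest summed confidence wins."""
    totals = defaultdict(float)
    for label, confidence in predictions:
        totals[label] += confidence
    return max(totals, key=totals.get)


# (predicted label, confidence) from three hypothetical base models.
predictions = [("yes", 0.30), ("yes", 0.35), ("no", 0.90)]

print(simple_vote(predictions))               # 'yes' wins 2 to 1
print(confidence_weighted_vote(predictions))  # 'no' wins: 0.90 > 0.30 + 0.35
```

The same aggregation extends directly to seven base classifiers: each contributes one (label, confidence) pair per record, and the label with the largest summed confidence becomes the ensemble prediction.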

Results
The trained models were first tested on the training and test sets before applying feature selection. The results of each individual model and of the ensemble model without feature selection, using cross classification on three different data partitions, are shown in Table 1. In all cases, the performance on the test dataset is lower than on the training dataset, as all the models had already been trained on the training set. For the test set, classification was performed on instances whose class labels were known but not presented to the model. The ensemble model performed best, followed by the LR model, in all partitions. The performance of SVM was also comparable to that of LR and better than NN, CART, C5.
(2008)) employed GA as the feature selector to help the ensemble classifier improve the overall sample classification accuracy while also identifying the most important features in the dataset of interest. The results suggest that this GA-Ensemble method outperformed the other algorithms in the comparison, and proved to be a useful method for classification and feature selection problems.
customers as good creditors correctly. No enhancement in results was observed with feature selection on the 60:40 and 50:50 partitions; instead, classification accuracy decreased for the individual models as well as the ensemble model. On the 70:30 partition, however, feature selection had a good impact: the results improved for almost all individual models and for the ensemble model.

Conclusions
This paper proposed an ensemble-based technique that combines selected base models to increase classification accuracy. Feature selection was also used to select important attributes for classification. Cross classification was conducted using three data partitions. From the results of the experiments, it is concluded that the ensemble-based classifier is more accurate than every single base classifier on all partitions. It is also concluded that, individually, LR and SVM performed better than all other base classifiers. The Chi-square-based feature selection method had a good impact only on the 70:30 partition, so other feature selection methods, including wrapper-based methods, can be tested to obtain better results. This methodology can also be tested on other real credit datasets using different combinations of the base classifiers.