Application of Data Mining in Direct Marketing in Banking Sector

The key to a successful business operation lies in good communication with clients, and companies are therefore paying more attention to Customer Relationship Management (CRM). One CRM strategy is to analyze and understand consumers' behaviour and characteristics, and to obtain the necessary answers through a direct marketing campaign. The aim of this study is to identify the factors that indicate which customers are ready to place a long-term deposit with the bank. The results distinguish the group of clients who are satisfied with the bank's operations and are ready to participate in marketing campaigns. Of the methods compared in the study, classification proved to be more reliable than the others. The analysis obtains its results using the data mining algorithm known as decision trees. A disadvantage of this method is that it depends on data supplied by the client, which may be inaccurate.


Introduction
Companies are using new technologies, such as data mining and data warehousing, to gain competitive advantage on the market. "Data mining" is defined as a sophisticated data search capability that uses statistical algorithms to discover patterns and correlations in data (Newton H., 2002). It discovers customers' needs in enormous amounts of data, with the objective of retaining potential customers and maximising customer value. Data mining plays a crucial role in CRM, allowing companies to define customer segments. One of six types of data mining models is typically used for solving business problems and achieving results in campaigns: classification, regression, time series, clustering, association analysis, or sequence discovery. This paper interprets two of these models, classification and clustering. Classification and regression are used to make predictions, while clustering can be used for forecasting or description.
The application of data mining in Customer Relationship Management (CRM) is a trend in the banking sector, but things changed when the crisis started. The financial crisis initially manifested itself in the financial, i.e. banking, sector and later spilled over into the entire global economy. The crisis was additionally marked by the withdrawal of citizens' foreign currency savings, as well as by significant inflationary risks from its outbreak. In recent years, banks have made radical changes in order to recover. In the short term, they managed to regain the lost trust and to show a high level of capability to mobilize deposit potential. In the financial market, the primary goal is to preserve market position and gain the loyalty of present clients. Various marketing strategies can help obtain the useful and necessary information for maintaining communication between the bank and the client. Companies today face high costs and competitive pressures from the region.
Satisfaction of existing clients is of crucial importance, so great attention is paid to CRM. This field focuses on the identification, creation and maintenance of permanent relationships with clients. The best clients deserve the best treatment. If a company treats its best clients like any other client, they will in return treat the company in the same way. Customer Relationship Management is a leading strategy in making marketing decisions (Reinartz and Kumar, 2002; Torkzadeh et al., 2006). Knowledge of consumer behavior is very important for Client Relationship Management (Coussement et al., 2009). Organizations realize that the existing clients in their database are their most valuable asset (Athanassopoulos, 2010). The quest for potential clients takes time and money. If we compare the investment in mass and direct marketing campaigns, it is assumed to be 12 times more expensive to obtain a new client than to retain an existing one (Torkzadeh et al., 2006). Moreover, loyal clients will generate higher revenues and margins than new clients (Reichheld and Sasser, 1990). Gustafsson et al. (2005) studied telecommunications services in order to test client satisfaction and the decisions taken to retain clients. The results showed the need for CRM, i.e. managers needed to determine client satisfaction precisely in order to reduce the loss of clients.
In the financial sector, a growing number of financial institutions offer favourable interest rates aimed at attracting clients. However, this does not necessarily mean that the conditions a bank offers its clients influence their decisions to invest their deposits. Economic pressure and market competition have resulted in a situation in which financial institutions increasingly invest in marketing campaigns. There are two basic approaches through which banks promote their services: mass campaigns, targeting the general public, and direct campaigns, targeting a specific set of contacts (Ling and Li 1998). Direct marketing campaigns aim to sell services by contacting a limited number of clients because of budgetary constraints. Mass marketing uses mass media such as TV, radio, magazines and newspapers to broadcast information to current and potential clients. Direct marketing is an interactive marketing system that uses a variety of channels to target potential clients, without the use of mass media. The success of a direct marketing campaign depends on the offer, the elements of communication, timing and the choice of clients. These four factors motivate research in the field of marketing. Rao and Steckel (1995) proposed a conceptual framework that included clients. The main client activity is deciding whether to participate in a marketing campaign. These activities are recorded in the database and influence the decisions made in the field of marketing. The most popular form of communication is direct mail, but other forms are also used, such as telemarketing, fax distribution or coupons. In direct marketing, clients are usually asked to answer specific questions. The answers are measurable and the clients' activities are usually stored in the database. Using the answers of the clients interviewed in the campaign, we can assess the expected number of answers or the overall response rate, and use this information in making important management decisions. By observing the results obtained, the decision maker should adjust the strategy and plan the following marketing activities accordingly.
The use of direct marketing has increased lately. In their research, Barwise and Farley (2005) concluded that direct marketing expenditures increased in some European countries in the period 2001-2004. According to the report published by the Direct Marketing Association (Johnson and Frankel, 2005), total expenditures for direct marketing advertising in the U.S. in 2005 amounted to about $161.3 billion. Direct marketing activities in 2005 accounted for 10.3% of GDP in the U.S. Considering the profit from direct marketing, it is obvious that advertising profitability is growing rapidly. Direct marketing is a field that has developed rapidly in recent years and has attracted great interest among researchers, which is also the reason for our research.
However, for the research to be effective, the key questions are how to choose a particular group of clients for the campaign and which techniques to use for selecting the targets. The selection of target clients is the core activity of direct marketing. The aim may be to choose an individual or a certain group (Bult, 1993; Bult et al., 1997). Different quantitative methods exist, including those that address the assessment of clients' answers from a system perspective. Many clients may consider the contact a nuisance and may interpret the calls as intrusive and annoying. Therefore, companies most often choose clients who are current users or who have participated in current marketing campaigns.
Numerous models are used in marketing campaigns, such as the model for assessing clients' answers. The accuracy rate and the ROC (receiver operating characteristic) curve are used to assess the performance of a classifier. However, the most commonly used indicator for assessing the forecast is the lift ratio. The lift ratio can be used efficiently as a tool by which marketing managers decide on the number of contacts to be used for training (from the original set) and check whether there is a better alternative model for some answers (Cortez et al., 2010).
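The lift ratio mentioned above can be illustrated with a minimal sketch. It assumes the common definition of lift (the response rate among the top-ranked fraction of clients divided by the overall response rate); the function name and the toy scores and labels are invented for illustration and are not taken from the study.

```python
def lift_at_fraction(scores, labels, fraction):
    """Lift of the top `fraction` of clients ranked by predicted score.

    scores: predicted probability of a positive answer for each client
    labels: 1 if the client actually answered YES, otherwise 0
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n = len(scores)
    # Rank clients from the highest to the lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, round(n * fraction))
    # Response rate among the contacted (top-ranked) clients.
    top_rate = sum(label for _, label in ranked[:top_n]) / top_n
    # Response rate if clients were contacted at random.
    base_rate = sum(labels) / n
    return top_rate / base_rate


# Toy data (invented): 10 clients, 3 of whom said YES.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

lift_top20 = lift_at_fraction(scores, labels, 0.2)
lift_all = lift_at_fraction(scores, labels, 1.0)
print(round(lift_top20, 2), lift_all)
```

A lift above 1 for a given fraction suggests that contacting only the top-ranked clients should yield proportionally more positive answers than contacting the same number of clients at random; contacting everyone always gives a lift of 1.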
Given the previous research, Table 1 shows the methods used in the field of direct marketing over the previous several years.

This research is based on data mining in order to discover regularities and support management decisions in a big data stream.
1. The first part of the paper presents a short overview of the decision tree, neural networks and the support vector machine.
2. The following sections present the DPP (data pre-processing) analysis in order to identify its importance for the accuracy of the projection.
3. The influence of various DPP techniques on the performance of the decision tree, neural networks and support vector machines is considered.

Choachang Chui (2002). A cased based client classification approach for direct marketing, Expert System with applications, vol. 22, 163-168.
The aim of the research is to identify a model for predicting clients' shopping behaviour. The paper describes a genetic algorithm (GA), based on the approach of the increased adjustment process. The following sections present the comparison of the two models, where GA-CBR shows better performance than the regression model.
Several papers in the field of banking marketing will be the subject of further research (Ling and Li, 1998; Hu 2005; Li et al., 2010).
The goal of data mining is to select the clients most likely to respond to the campaign and to predict the likelihood of clients' responses. In this research we apply the data mining approach to a marketing campaign in the banking sector. The aim of this paper is to identify the main factors that influence clients to subscribe to long-term deposits in the bank.
The paper is organized as follows. It begins with a brief review of the background on the concepts of direct marketing and data mining, followed by the methodological approach and the available data. The third part of the paper covers the analysis of the decision tree and the comparison between two methods, classification and clustering. Finally, the paper provides an interpretation of the results and the conclusion.

Methodology
This paper addresses research in the field of direct marketing. The technique used to obtain the results is the decision tree model, which applies data mining to the clients' responses. By developing the decision tree model, the crucial input variables are identified, as well as the clients' future decisions. An advantage of the decision tree is that the credit analyst independently sets up the decision rules and classification rules. Classification is the model most frequently used for predicting customers' future behaviour in marketing campaigns. By comparing classification and clustering, we interpret and identify the strengths and weaknesses of the target results.
For the purpose of this research, the database of the Bank of Portugal was used with the written permission of the authors Paul Cortez, Sergio Moro and Raul M. S. Laureano. The information was collected from the marketing campaigns conducted in the period May 2008 - November 2010. Clients were contacted by telephone and offered an attractive interest rate for a long-term form of savings. The database consists of a total of 45,211 instances, with 16 input variables that present the data on clients and a binary output variable (two classes). If the client confirms over the phone that he/she will deposit funds with the bank that contacted him/her, the class is marked YES; otherwise the value NO is assigned. Figure 1 provides a graphical representation of the bank's database. Figure 1 clearly shows that clients are segmented by 16 attributes: age; job description; marital status; education; whether the client has an unduly serviced loan with the bank that carried out the campaign; the amount of the client's annual income, expressed in EURO; whether the client owns property; whether the client has a loan in his/her name; the way the client was contacted; the day of the month of the last contact during the marketing campaign; the month in which the client was contacted; the duration of the last contact, expressed in seconds; how many times the client was contacted during the campaign; how many days have passed since the client was contacted in the previous campaign; how many times the client was contacted in the previous campaign; and the outcome of the previous marketing campaign. It can be seen in the table that the input variables can be grouped into information relating to the individual and information relating to the marketing campaign. In this paper we have analyzed the group of clients selected from the database, since those clients were previously included in marketing campaigns and have already established cooperation with the bank that conducted the campaign.
By using two methods we aim at finding the best model that achieves high performance predictions.
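The structure of the data described above can be sketched as follows. The source describes the 16 attributes only in prose, so every column label below is a hypothetical name chosen for illustration, as are the helper functions; the sketch only shows the grouping into client and campaign information and the encoding of the YES/NO class.

```python
# Hypothetical column labels: the source gives no identifiers for the
# 16 attributes, so every name below is an assumption.
CLIENT_ATTRIBUTES = [
    "age", "job", "marital_status", "education",
    "has_default", "annual_income_eur", "owns_property", "has_loan",
]
CAMPAIGN_ATTRIBUTES = [
    "contact_type", "last_contact_day", "last_contact_month",
    "last_contact_duration_s", "contacts_this_campaign",
    "days_since_previous_contact", "contacts_previous_campaign",
    "previous_outcome",
]
INPUT_ATTRIBUTES = CLIENT_ATTRIBUTES + CAMPAIGN_ATTRIBUTES


def encode_target(answer):
    """Map the YES/NO class label to 1/0 for use in a classifier."""
    normalized = answer.strip().upper()
    if normalized not in ("YES", "NO"):
        raise ValueError(f"unexpected class label: {answer!r}")
    return 1 if normalized == "YES" else 0


def split_record(record):
    """Split one client record (a dict) into the two attribute groups."""
    client = {k: record[k] for k in CLIENT_ATTRIBUTES if k in record}
    campaign = {k: record[k] for k in CAMPAIGN_ATTRIBUTES if k in record}
    return client, campaign


# Toy record (invented values) with one attribute from each group.
example = {"age": 40, "previous_outcome": "unknown"}
client_part, campaign_part = split_record(example)
print(len(INPUT_ATTRIBUTES), client_part, campaign_part)
```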

Interpretation of the Results and Summary for Further Research
In this section, we explain the results obtained from the classification and clustering models.

Classification
Classification has revealed significant issues in each area of direct marketing research, thus leaving room for further research. Classification is a frequently used technique for decision-making in the marketing field (Kaefera F. et al., 2005). Specifically, our study examines the decisions of clients in the banking sector, in order to obtain feedback from managers. If the client decides to place a time deposit, we mark the class YES, otherwise NO.
By building a classification model, we can see in the figure below that the first node is the attribute "duration", which defines the duration of the last contact, expressed in seconds. If the conversation lasted more than 410.5 seconds, 7,543 clients are taken into account. Out of these 7,543 clients, 3,192 confirmed they would place the deposit with the bank, while 4,351 said otherwise. If the conversation lasted less than 410.5 seconds, 37,668 clients are taken into account. The second level of the decision tree refers to the attribute "outcome", representing the outcome of the previous marketing campaign. In this case, most answers fall under the "unknown" outcome. Out of a total of 37,668 clients whose contact lasted less than 410.5 seconds, the outcome of the previous campaign cannot be confirmed for 30,787 clients, while 1,175 clients supported the previous campaign. Figure 1 provides a hierarchical presentation of the decision tree.

Figure 1. Decision tree. Source: Author
Further branching of the decision tree considers the attribute referring to the previous marketing campaign, i.e. the month in which the client was last contacted.
Figure 1 shows that the greatest attention is paid to the marketing campaign, and the least to the client's personal information. Based on the outcome of the previous campaign, we identify the group of clients ready for cooperation, with the aim of making them consider our offer. Clients also make things easier for themselves by obtaining information over the telephone, thus saving time. Therefore, if the client expressed interest in the conversation with the agent, the probability that he/she will deposit funds with the bank is higher.
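The counts reported for the first split on "duration" can be checked with a short worked calculation. The sketch below uses only the figures quoted in the text; the Gini impurity is included as one common way of measuring how mixed a node is, and it is an assumption here, since the source does not state which splitting criterion the tree used. The variable and function names are likewise illustrative.

```python
# Counts reported in the text for the first split (threshold 410.5 s).
long_calls = {"yes": 3192, "no": 4351}          # duration > 410.5 s
n_long = long_calls["yes"] + long_calls["no"]   # clients in this branch
n_short = 37668                                 # duration < 410.5 s


def gini(counts):
    """Gini impurity of a node: 0 for a pure node, 0.5 for a 50/50 node.

    Assumption: used for illustration only; the source does not name
    the criterion applied by its decision tree.
    """
    counts = list(counts)
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


yes_rate_long = long_calls["yes"] / n_long
gini_long = gini(long_calls.values())

print(n_long, n_long + n_short)
print(f"share of YES among long calls: {yes_rate_long:.1%}")
print(f"Gini impurity of that node: {gini_long:.3f}")
```

Roughly 42% of the clients with long calls agreed to the deposit. The class counts for the short-call branch are not reported in the text, so the corresponding share for that branch cannot be computed from the figures given.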
Table 3 presents the accuracy of our predictions based on the relevant indicators. The cross-validation method is used to assess the validity of the model: the algorithm is tested on one part of the data and trained on the others, and this procedure is repeated for all parts. The first column of Table 3 shows the classification accuracy. In our research 88.5% of cases are correctly classified, i.e. cases in which the predicted class matches the actual class. The aim is to make this value as large as possible, and in our case it is not yet a fully satisfactory level. The second column refers to sensitivity and shows the percentage of correctly predicted positive cases relative to the actual number of positive cases, which in our example stands at 93.6%. Specificity, the counterpart of sensitivity, indicates the percentage of correctly predicted negative cases relative to the actual number of negative cases, and stands at 50.12%. The area under the ROC (receiver operating characteristic) curve is used for ROC analysis (the curve is drawn in a coordinate system in which the ordinate represents sensitivity and the abscissa shows 1 - specificity, i.e. the false positive rate). The aim is to make this value as high as possible, since this means that the ROC curve is shifted further toward the upper left corner, representing greater sensitivity at a given false positive rate. In our case, it stands at 70.5%. The Brier score represents the mean squared deviation between the predicted probability and the actual outcome. It is preferable to keep this value as low as possible. In this example, it stands at 20.4%.
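The indicators discussed above can be computed as in the following minimal sketch, which uses their standard definitions. The function names, the 0.5 decision threshold and the toy data are assumptions made for illustration; they do not reproduce the values in Table 3.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 1 (YES) and 0 (NO)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


def auc(y_true, probs):
    """Area under the ROC curve, computed as the probability that a random
    positive case is scored above a random negative case (ties count half)."""
    pos = [p for t, p in zip(y_true, probs) if t == 1]
    neg = [p for t, p in zip(y_true, probs) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented): actual classes and predicted probabilities of YES.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in probs]  # assumed 0.5 threshold

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
area = auc(y_true, probs)
brier = brier_score(y_true, probs)
print(accuracy, sensitivity, specificity, area, brier)
```

In a cross-validation setting, these quantities would be computed on each held-out part and then averaged.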

Clustering
Market segmentation is used to target smaller markets and is useful to decision makers. Although clustering algorithms are applied to solve this problem, their application is not valid until irrelevant variables are removed. Irrelevant variables distort the clustering structure and the results are then useless (Hsiang-Hsi L. and Chorng-Shyong O., 2008).
Cluster analysis is a technique used to divide a set of objects into k groups so that each group is homogeneous, with certain characteristics based on similarities or differences.
The purpose of clustering is to perform market segmentation (Athanassopoulos, 2010). In our research, attention is focused on client segmentation in the banking system. Once clients are validly assigned to clusters, marketing managers can implement specific strategies for each cluster.
However, the groups are not pre-defined, and the grouping is performed based on the similarities found (Sarčević M. et al., 2010). This algorithm works best when the input data are mainly numeric. Although there are many algorithms for data clustering, we can never be certain which one is best for a given database. With this approach, we identify several segments that managers can use to make decisions in the coming marketing campaigns.
This study uses the clustering method for grouping clients according to the input variables. The optimal number of clusters is 12. Figure 4 shows the segmentation of several instances from the database. Client segmentation is widely used in the business environment. Market segmentation involves grouping clients into several segments in order to facilitate marketing decisions. The attributes contained in the database describe both the clients and the marketing campaign, and both are equally important for deciding whether the client will deposit his/her funds with the bank conducting the research.
In future work, we intend to collect more information for the database, in order to test the accuracy of the predictive models used for making marketing decisions. We also plan to apply the data mining model in a real environment, with stronger interaction with marketing managers and with their feedback. While earlier objections pointed out that the technique cannot be used independently, the application of data mining and the possibility of using a large set of data increase the value and accuracy of this technique. The problem with this technique is inaccurate data submitted by the client.
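Dividing objects into k homogeneous groups can be illustrated with a minimal k-means sketch. This is an assumption made for illustration: the source does not name the clustering algorithm it used, and k-means is only one option that suits mainly numeric inputs. The function names, the simplified initialization (the first k points serve as starting centroids) and the toy points are all hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two numeric points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100):
    """Group numeric points into k clusters.

    Simplification (assumption): the first k points are used as the
    initial centroids, which keeps the example deterministic.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change, so the result is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return labels, centroids


# Toy data (invented): two numeric features per client, already scaled.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0),
          (8.0, 8.0), (1.0, 0.5), (9.0, 10.0)]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

With real client data, categorical attributes would first need to be encoded numerically and the features scaled, and the number of clusters (12 in this study) would be passed as k.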

Conclusion
The most important goal for each bank is to have satisfied customers, because the search for new clients takes time and money. Out of all the methods applicable to this type of research, we chose the two most reliable. The classification method provided the best results. There are two main attributes in building future marketing campaigns: the outcome of the previous campaign and the duration of the last contact. The results obtained in this study indicate the customer characteristics to which bank managers should pay more attention.
In this paper we described the implementation of data mining (DM) in direct marketing and concluded that DM is important in managers' decision-making processes.
Direct marketing, as a discipline, has evolved into an integrated and systematic field, and research trends provide a general framework for researchers in this special area of marketing. Despite significant development over the last two decades, many open questions remain to be answered by future research. The results lead us to believe that direct marketing has achieved a certain degree of scientific status, as evidenced by the large number of empirical studies in this area.
Research on direct marketing campaigns based on the application of data mining will increase significantly in the future. One of the reasons is that the use of data mining models is important for accomplishing company goals, such as gaining competitive advantage on the market.

Table 1. Previous research in the field of direct marketing
In the competitive market, data mining faces growing challenges in order to achieve operational, tactical and strategic advantages. There are several