Data quality in customer relationship management (CRM) – literature review

The aim of this paper is to examine the challenges that organizations face when they start to deal with the quality of customer data more seriously in order to manage their customer relationships better. The literature review identified several problems with the quality of customer data, as well as suggestions for their solution. The author found that challenges regarding the quality of data used in customer relationship management are reflected in: decentralized data storage, inconsistencies in input and storage, inadequate integration of different data sources, different data defects, and the tendency of data quality to deteriorate over time. In addition, problems have been identified in the high costs of maintaining data quality, as well as new challenges in the form of big data and open data. Possible improvements have been suggested by different authors through a number of tools and frameworks.


Introduction
A key source of a company's competitive advantage lies in its ability to respond dynamically to changes (Adamik et al., 2018). As a result of changes in the digital environment, companies are increasingly facing large amounts of data stored in diverse and often inconsistent databases. Data quality matters in customer relationship management chiefly because accurate information is needed to manage campaigns and determine customer value (Zahay, Peltier & Krishen, 2011). A 2015 Experian report identified that nearly 80% of organizations lack sophisticated access to data quality. Gartner's report predicted that "by 2017, 33% of Fortune 100 organizations will experience an information crisis, due to their inability to evaluate, manage and trust their information". This is followed by evidence suggesting that many organizations are unaware of their data quality problems and either ignore them or don't prioritize them (O'Brien, 2015).
The subject of this paper is the set of challenges that organizations face when they become more serious about data quality (DQ) in the context of customer relationship management (CRM). The research question addressed in the paper is: What are the problems organizations face when dealing with the quality of their customer data and how can they be solved? The author has tried to present an overview of the most common problems in the field of data quality that organizations face, with an outline of possible solutions.
After explaining the concepts of CRM and DQ and their interrelation in Chapters 1 and 2, the author will refer to data quality issues in CRM and their potential solutions in response to the Research Question in Chapter 3. Subsequently, concluding considerations will be presented.

Customer relationship management (CRM)
CRM is a strategic approach to systematically target, monitor, communicate and transform relevant customer data into information that underlies strategic decision making and action (Missi, Alshawi & Fitzgerald, 2005). The goal of CRM is to improve operational efficiency, achieve an acceptable level of customer connectivity (Reid & Catterall, 2014), manage customer relationships within the company and increase customer satisfaction (Tu & Yang, 2013), and, as a consequence of all this, increase revenue (Negahban, Kim & Kim, 2016). CRM enables organizations to identify and attract new customers, as well as increase retention of profitable customers through developing, strengthening and managing relationships with them, considering sustainable company growth (Sharma, Goyal & Mittal, 2010). In today's competitive business environment, key problems relate to the quality of organizational data and their integration, and it is necessary to capture customer information in real time (Missi et al., 2005). Unfortunately, the reality is that CRM classification models are outdated, unbalanced, and noisy (Natchiar & Baulkani, 2014), and stored customer data is often located in separate departments and not linked throughout the company's CRM (Missi et al., 2005). The problem with data quality becomes apparent only when an organization wants to correct anomalies in one data source or when it wants to integrate data coming from different sources into one new data source. Due to the tendency of organizations to avoid or ignore the importance of data quality and of the integration process, we often witness the failure of CRM projects (Missi et al., 2005). Companies mistakenly believe that they need large amounts of customer data, when in fact it is much more important to have quality data (Zahay et al., 2011).
STRATEGIC MANAGEMENT, Vol. 25 (2020), No. 2, pp. 040-047
In order for CRM to be successful, it is necessary to integrate three key components: business processes, the human factor and technology (Negahban et al., 2016). Business processes need to be streamlined, because complex processes sometimes cause data complexity (Foss, Henderson, Murray & Stone, 2002); employees need to be motivated by senior management and organizational culture to pay more attention to the quality of the data collected (Reid & Catterall, 2014; Peltier, Zahay & Lehmann, 2013); and all this should be supported by technology that optimizes customer interaction. As a large percentage of customer interactions will occur on the Internet rather than with employees, technology must adapt to a changing and unpredictable market (Chen & Popovich, 2003), but even the most sophisticated IT or business systems will not succeed if they rely on data of insufficient quality and if they are not structured for the purpose for which they are applied (Sharma et al., 2010). In some organizations, CRM is a simple technology solution that enhances customer targeting efforts through the use of a separate database and sales automation tools, while other organizations see it as a tool specifically designed for 1-on-1 customer communication, which is the responsibility of sales, call centers and marketing departments (Chen & Popovich, 2003). Missi et al. (2005) cite the basic types of data that organizations collect about customers: demographics (gender, age, marital status, education level, home ownership, etc.), which are very stable and not very expensive, but can hardly be obtained on an individual basis with a high level of accuracy; behavioral data (types of purchases, payments, customer service activities, etc.), which are the easiest to predict, but are the most difficult and expensive to obtain from external sources; and psychographic data (opinions, lifestyle, personal values, etc.), which can lead to improvement and be used to determine a customer's life stage, but whose weakness is that they indicate behavior that may be highly, partially, or marginally related to the right behavior. In addition to these types of data, Zahay et al. (2011) also emphasize the contact information of the users, which forms the basis for marketing efforts, as well as personalization, i.e. the ability to tailor marketing communications to the individual customer. Personalized communication strategies can be developed by combining demographic information with psychographic profiles to achieve interactive communication with users (Zahay et al., 2011), in order to create, elaborate and reinforce meaning in customers' relationship with the company (Ferreira et al., 2019).
It can be concluded that the quality of customer data is very important and that information about customers should be carefully collected, as it is one of the main factors, if not the main factor, affecting the performance of CRM systems. Having the right information at the right time is essential to a successful CRM strategy (Sharma et al., 2010). Peltier et al. (2013) provided a definition of quality data: "Customer data are of high quality when the information collected across multiple transactions, touchpoints, and channels accurately reflects the behavior and sentiments of customers, both collectively and individually". High-quality, well-integrated customer data is the foundation of successful CRM projects. If the data quality problem is not resolved on time, low data quality can affect operating costs, customer satisfaction, effective decision making, and CRM workers' confidence. The trouble is that even when problems are noticed at an early stage, they are still difficult to address. That is why it is important to create a comprehensive data quality management strategy at the beginning of CRM implementation (Reid & Catterall, 2014) and to understand data quality management (DQM) as a continuous process (Even, Shankaranarayanan & Berger, 2010). Customer information is usually heterogeneous data collected from different sources, mostly informal, unlimited and in different formats (numeric and categorical) (Tu & Yang, 2013). It can contain a large amount of redundant and irrelevant information that affects the performance of CRM (Natchiar & Baulkani, 2014). The most common sources of "dirty" data are: legacy systems that contain poorly documented and outdated data, the distribution of data across databases in different departments with a lack of data coding standards, typing errors, poor data entry, missing data, etc. (Reid & Catterall, 2014).

Data quality (DQ)
Data quality is both a technical and an organizational problem; it also requires understanding the types of information required and how this information is used to make sound marketing decisions (Peltier et al., 2013). Most authors have tried to improve the quality of data using mathematical and programming solutions (Sharma et al., 2010). Technological developments have allowed new data mining (DM) approaches to be applied to the analysis of customer data in order to find the best CRM strategies (Natchiar & Baulkani, 2014). DM represents a large group of algorithms and methods that are used to analyse large data volumes (Dusmanescu et al., 2016) in order to extract comprehensible, hidden and useful information from data, to find unexpected connections between them (even predictive information that experts may miss because it is beyond their expectations) and to predict trends and behavior based on them. The process consists of observing specific examples in order to define general conceptual definitions (Vukelić, Stanojević & Anđelić, 2015). For DM tools to assist in CRM, appropriate data quality is crucial (Sharma et al., 2010), but it is not possible to establish generally acceptable procedures for DM classification, as it is difficult to find a single methodology that will address all DM problems that CRM data yields: heterogeneity, dimensionality, serious anomalies in the data, unbalanced classification, data encryption, etc. (Tu & Yang, 2013). DM tools can provide answers to business questions that are time consuming and complicated to solve; it is only necessary to research which tools would be appropriate in which situation and apply them accordingly to improve data quality (Sharma et al., 2010).
The failure of CRM systems has been attributed to the inability to facilitate and improve the organization-level transfer of customer information (Peltier et al., 2013). A company needs to know the state of its data in order to know what needs to be improved and what benefits the improvement will bring, and because of the difficulty of managing large amounts of data, companies will sometimes leave these problems to firms specializing in this (Foss et al., 2002). Higher quality targeting typically increases the value of a data set, but can involve considerable costs. The elements that the data describe can change over time, such as a customer's address, profession, marital status, etc., which means that data quality may deteriorate over time. Maintaining data at a high level of quality involves significant costs associated with efforts to detect and correct defects, set up management, redesign processes and invest in quality monitoring tools (Even et al., 2010). Eppler & Helfert (2004) split the costs into those caused by low data quality (verification, reentry, compensation, low reputation, wrong decisions, sunk costs) and those incurred to improve data quality (training, monitoring, development and usage standards, analysis, reporting, plan repair and implementation) (O'Brien, 2015).
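To make the idea of "knowing the state of the data" more concrete, the following is a minimal Python sketch of a data profile that reports field completeness and the share of records that have not been verified recently. The record layout, the field names (`email`, `phone`, `last_verified`) and the two-year staleness threshold are assumptions chosen for illustration; they do not come from the reviewed papers.

```python
from datetime import date

# Hypothetical customer records; field names are illustrative only.
customers = [
    {"id": 1, "email": "ana@example.com", "phone": "+381601234567", "last_verified": date(2019, 11, 2)},
    {"id": 2, "email": None, "phone": "+381641112223", "last_verified": date(2015, 3, 14)},
    {"id": 3, "email": "marko@example.com", "phone": None, "last_verified": date(2018, 6, 30)},
    {"id": 4, "email": "iva@example.com", "phone": "+381659998887", "last_verified": date(2012, 1, 9)},
]

def completeness(records, field):
    """Share of records in which `field` is present and non-empty."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

def stale_share(records, today, max_age_days):
    """Share of records whose last verification is older than `max_age_days`."""
    stale = sum(1 for r in records if (today - r["last_verified"]).days > max_age_days)
    return stale / len(records)

# Assumed threshold: a record counts as stale after roughly two years.
report = {
    "email_completeness": completeness(customers, "email"),
    "phone_completeness": completeness(customers, "phone"),
    "stale_share": stale_share(customers, today=date(2020, 1, 1), max_age_days=730),
}
print(report)
```

A profile of this kind only measures the current state; deciding which records are worth the cost of correction remains the economic question discussed by Even et al. (2010).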

Data quality management problems and solutions
By analyzing the extracted literature, an answer to the Research Question was formed, which presents the problems of different quality dimensions in CRM and potential solutions to some of the mentioned problems.
Solutions:
[...] expected data quality in CRM systems, which depends on the specific data element. Some data must be perfect, such as unique keys, internal security information, and anything audited. However, some other data may be estimates or even missing, making it easier to maintain databases and reduce their costs. Even et al. (2010a) have proposed a model that allows different levels of quality to be set for different records, so that optimal quality varies depending on the records.
Reid & Catterall (2014) proposed simplification of the database architecture in order to make data quality support easier and more cost effective.
Foss et al. (2002) state that it is sometimes better to focus on process simplification than on case integration.
Practitioners of CRM classification require a standardized framework with simplified DM processes that could produce satisfactory results for CRM data in general, with all the DM challenges mentioned previously (Tu & Yang, 2013).
Ahmed et al. (2016) have shown that Java programming and SQL can be used to define appropriate constraints and obtain 100% accurate data. Data from different users can be standardized for more accurate information and its processing. Using SQL servers will help address key data quality tasks, such as profiling, cleaning and refining, as well as auditing. This will create approaches that will reduce system integration costs, develop benefits, and mitigate data risks.
The ability to update data throughout the system would be preferable in order to avoid problems when the country code is changed or when data is integrated from different sources. Consistency issues should be addressed at an early stage of integration by defining data standards and data rules within the organization (Jaya et al., 2007).
Given the common legacy of poor data quality from a previous system being ported to a new system, the solution may be to invest in a data cleaner that will reform the data before it is transferred to a new database, so that the new database stores clean data only (Reid & Catterall, 2014).
Using the right tools has a direct impact on the performance of the adopted CRM (Alshawi et al., 2011). As different types of errors can exist in the same data set, more than one error detection tool often needs to be implemented (Abedjan et al., 2016). Missi et al. (2005) cite a variety of tools that can be used to achieve data quality and integration: tools that provide insight into one relational access to data; tools that transform non-relational into relational data; tools that develop, test, and perform transformations in databases and automatically generate code that makes it easy to manage even the most complex transformations of all types of data and applications; tools for converting data among hundreds of formats and applications; tools for consolidation, verification, standardization and real-time data profiling; and a tool that records, models, and maintains metadata from various sources and stores numerous models and versions.
Data quality can be improved by, e.g., automating data collection and by continuous inspection, correction and cleaning of data (Even et al., 2010a).
Data completeness is best addressed by improving the process of data entry by including data verification processes. In the case of mobile CRM, this can be automated or, in the case of traditional CRM, should involve an expert who will verify data entry, which will help improve accuracy in the organization (Jaya et al., 2007). Even et al. (2010a) propose that older data should be ignored, and that companies should invest in the quality of the newer data. It is only necessary to determine what percentage of the data should be captured.
To ensure that eCRM uses the latest user data, organizations can apply rules covering the use of different date formats from the same source, so that the record with the most recent date is selected and saved (Ahmed et al., 2016). The mobile CRM system is explicitly designed to work with data from a central CRM system, which provides sufficient data accuracy. [...] characters in mobile number / user name). A security identification scheme can be used during the registration process to verify the validity of the user's identity and his / her phone number. Application usage can also be monitored to find out which users are actively using the application and what messages they are interested in (Hable & Aglassinger, 2013).
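Two of the rule-based ideas above, rejecting values that violate a syntax constraint and keeping the record with the most recent date, can be illustrated with a short Python sketch. This is not the Java and SQL implementation of Ahmed et al. (2016); the phone pattern, the record layout and the function names `normalize_phone` and `latest_record` are hypothetical and serve only as an illustration.

```python
import re
from datetime import date

# Assumed syntax rule: optional "+", then 8 to 15 digits (illustrative, not from the reviewed papers).
PHONE_PATTERN = re.compile(r"^\+?\d{8,15}$")

def normalize_phone(raw):
    """Strip common separators; return the cleaned number, or None if it violates the syntax rule."""
    cleaned = re.sub(r"[\s\-()/.]", "", raw)
    return cleaned if PHONE_PATTERN.fullmatch(cleaned) else None

def latest_record(records):
    """Among records describing the same customer, keep the one with the most recent update date."""
    return max(records, key=lambda r: r["updated"])

# Hypothetical records for one customer arriving from two sources.
candidates = [
    {"source": "web_form", "phone": "+381 60 123-4567", "updated": date(2019, 5, 1)},
    {"source": "call_center", "phone": "060/765-4321", "updated": date(2020, 2, 10)},
]
chosen = latest_record(candidates)
print(chosen["source"], normalize_phone(chosen["phone"]))  # → call_center 0607654321
print(normalize_phone("12-AB-34"))  # → None
```

Storing dates as `date` objects rather than as free-form strings sidesteps the problem of differing date formats; in practice each source format would first have to be parsed into such a common representation.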

Problems:
Data is often stored in separate departments and is not linked across the entire company's CRM (Missi et al., 2005).
Lack of agreement on a standard set of dimensions that contribute to high data quality (Jaya, Sidi, Ishak & Affendey, 2007).
Problems of logical consistency of data entry: in many organizations there is no common language of logically compatible data that would affect CRM (Alshawi, Missi & Irani, 2011). There are also inconsistencies in how information is stored in different units, which occur because in CRM almost everyone in the organization is in touch with the application. This results in a greater likelihood that data quality will be poor, given the large number of people who interact with the data (Reid & Catterall, 2014).
The status of existing customer databases created in a previous period, when not much thought was given to the quality of the data being collected (Alshawi et al., 2011).
Inadequate integration of different data sources, so each product has unique identifiers in the database and storage. In order to see the relevant data of the user's order, it is necessary to access the files in the database (Ahmed, Amroush & Ben Maati, 2016).
Different data defects: [...] (Even et al., 2010), (Even, Shankaranarayanan & Berger, 2010a), duplicate records (Reid & Catterall, 2014), noise records and unbalanced datasets (Natchiar & Baulkani, 2014), insignificant values (e.g. an attribute that preserves the value of a bank employee in charge of a customer) (Hable & Aglassinger, 2013). Defective data, in addition to misleading organizations about their customers, can compromise the performance of DM tools if they are not filtered out, because customer information is heterogeneous and on different scales, with many irrelevant features (Tu & Yang, 2013).
A unique value violation, whereby the same user can be stored in the database multiple times with a different user number (Hable & Aglassinger, 2013). It leads to poor pairing of individual customer records and thus to the inability of the company to determine how many customers it actually has, because it has stored the same customer several times in the database (Reid & Catterall, 2014). Alshawi et al. (2011) and Reid & Catterall (2014) mention problems with the unique customer identifier, one example being the lack of a postal code in European countries.
Syntax violation. For example, it should be ensured that phone numbers are of a certain format and that they have a limited number of non-numeric characters, so that records with incorrect values are not stored (Hable & Aglassinger, 2013).
Outdated values in user profiles (Even et al., 2010a). Values that were correct may not be true anymore. For example, the user may have changed the phone number (Hable & Aglassinger, 2013).
Data quality deteriorates over time. The elements that the information describes can change over time, such as a customer's address, profession, marital status, etc. (Even et al., 2010). If Amazon has 60 million active users per year, it raises the question of whether it is economically logical to maintain all records at a high level or whether this should be limited to a specific subset (Even et al., 2010a).
High costs of maintaining a new database (Reid & Catterall, 2014). Maintaining data at a high level of quality involves significant costs associated with efforts to detect and correct defects, set up management, redesign processes and invest in quality monitoring tools (Even et al., 2010).
New challenges in the form of big data and open data [...] (Jaya et al., 2007). Armeanu, Andrei, Lache & Panait (2017) stated that although "it is not possible to determine a priori whether this huge amount of data should be entirely used in the decision making process", one cannot ignore complex relationships that are based on the correlations between variables and output.
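The unique value violation listed above, where the same customer is stored several times under different customer numbers, can be illustrated with a minimal Python sketch that groups records by a normalized matching key. The choice of key (lower-cased name plus the digits of the phone number), the field names and the function names are assumptions made for this example; real record matching typically needs fuzzier comparison than an exact key.

```python
from collections import defaultdict

def match_key(record):
    """Build a simple matching key from a normalized name and the digits of the phone number."""
    name = " ".join(record["name"].lower().split())
    phone_digits = "".join(ch for ch in record["phone"] if ch.isdigit())
    return (name, phone_digits)

def find_duplicates(records):
    """Group customer numbers that share a matching key; return only groups with more than one entry."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["customer_no"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical records: the first two describe the same person under different customer numbers.
customers = [
    {"customer_no": 101, "name": "Ana  Petrovic", "phone": "060 123 4567"},
    {"customer_no": 205, "name": "ana petrovic", "phone": "060-123-4567"},
    {"customer_no": 309, "name": "Marko Ilic", "phone": "064 555 0000"},
]
duplicates = find_duplicates(customers)
print(duplicates)  # → [[101, 205]]
```

Counting distinct matching keys instead of distinct customer numbers gives a rough estimate of how many customers the company actually has, under the assumption that the chosen key identifies a person.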
For a better view, the identified problems and suggestions for their solution are presented in Table 1.
[...] (2007) state that it is necessary to have a framework for standardizing data, and also that it is necessary to include the process of automatic or expert verification of data.
4. The unsatisfactory condition of existing customer databases: the solution is to simplify the database architecture and invest in a data cleanup tool that will reform the data before it is transferred to a new database (Reid & Catterall, 2014).
[...] Hable & Aglassinger (2013), Ahmed et al. (2016) and Missi et al. (2005) state that it is necessary to use tools that will allow adequate integration of data of different formats.
CRM systems make it easy to build long-term customer relationships by creating centralized databases and enabling sales force automation. This minimizes duplication of data, retains customer knowledge, institutionalizes links between users, helps manage numerous products / services, and increases revenue while allowing firms to cross-sell. Mobile CRM enables employees and managers to access real-time data and make better decisions (Negahban et al., 2016). In addition to the aforementioned proposals for solving certain data quality problems, it is essential that employees are supported by senior management and motivated to manage the data well. Human resource management, methods and processes, software and guiding principles should be combined to ensure efficient data management and the ability to transmit data (Foss et al., 2002).

Conclusion
CRM has become one of the focal points for many industries such as banking, retail, telecommunications, insurance, etc. (Natchiar & Baulkani, 2014). As CRM relies on data in its operation, it is very important that the quality of the data is appropriate so that the organization can have coordinated CRM responses to today's business needs (Alshawi et al., 2011). Increasing volumes of structured data, better tools and big economic drivers are pushing organizations to aggregate and use data from different sources (Bidlack & Wellman, 2010). New challenges in managing data quality are emerging from new technologies, namely big data and open data, with their ability to collect large amounts of data from different sources and store them as different types of data: structured, unstructured and semistructured (Jaya et al., 2007).
Data defects can prevent managers and analysts from having a real picture of customers and their purchasing preferences. As a result, marketing efforts may not produce the expected results and may lead to poor decisions (Even et al., 2010) and to potential financial risks, i.e. financial losses that affect the company's prosperity (Valaskova, Kliestik & Kovacova, 2018). Data defects can also leave a company unable to determine how many customers it actually has (Reid & Catterall, 2014). When looking at the quality of data used in CRM, we can conclude that bigger is not always better, because increasing the number of records that are monitored, maintaining more attributes and achieving perfect quality may have technical and functional merit, but it is not profitable (Even et al., 2010).
The analysis of the extracted works pointed to certain problems regarding the quality of customer data, as well as suggestions for their solution. We can conclude that the challenges regarding the quality of data used in CRM are reflected in: decentralized storage of data, inconsistency of data input and storage, inadequate integration of different data sources, various data defects, and the tendency of data quality to get worse over time. In addition, problems were identified in the high costs of maintaining data quality, as well as new challenges in the form of big data and open data. Possible solutions have been suggested through a variety of tools and frameworks.
In order for organizations to take full advantage of the customer data they possess, so as to adequately analyze and evaluate customers' desires, preferences and behaviors and thereby gain a competitive advantage on the basis of valuable, hard-to-imitate data, it is imperative to align economic benefits with the optimum level of data quality (Even et al., 2010).