Data quality problems in business
Poor data quality: problems and solutions
Data quality challenges
Data quality is a real problem that is often poorly addressed
- Quality data is vital for a company's profitability.
- However, in large companies, poor data quality is a recurring, deep-seated problem that is poorly understood and underestimated.
- For top managers, it goes without saying that teams will produce quality data (and if they do not, the answer is simply to put pressure on them).
- Data workers are aware and clear-sighted about data quality issues.
- They are already doing their best: improving the quality of data requires changes that they are powerless to trigger at their level.
- Continuous improvement and quality assurance methods are ineffective in data creation processes.
- Indicators and dashboards are necessary, but they have only a marginal effect, and only on a few atypical employees.
- Knowing the criteria for data quality is useful but theoretical.
- The use of the term "Data Quality" reveals a nascent maturity on the real subject, which is Enterprise Data Management.
Some quotes attributed to Albert Einstein, applicable to data quality management
- No problem can be solved from the same level of consciousness that created it.
- Insanity is doing the same thing over and over again and expecting different results.
- A clever person solves a problem. A wise person avoids it.
Quality data is critical to business
Example of "product" data
Formerly, in traditional sales and distribution, data errors had no direct impact, because company staff corrected them during pre-sales and post-sales interactions with customers. Now, with e-commerce and computerized tenders, interactions between customers and sellers are rare, because such contacts are too tedious and too slow for customers (and too expensive for companies). So nowadays, only good data will sell the products. Conversely, when customers are confronted with bad data, they fail to find the product or give up buying it. With e-commerce and computerized tenders, bad data leaves no second chance and produces no negative feedback, just missed, invisible and unquantifiable turnover. In this ultra-competitive environment, companies producing good data will gain a growing and lasting advantage over their less dynamic competitors.
Example of "customer" data
With e-commerce, customers can easily and immediately put several merchants in competition. Each purchasing decision is isolated; there is no longer any real loyalty to a brand or a reseller. Reviews left on the Internet can be devastating and impossible to control. Every mistake made by the company, before or after the sale, comes at a cost. To stay in the game, businesses need to understand what is going on with their customers. To analyze quickly and effectively what happened, customer data must be well structured and collected fully and accurately (in accordance with both the letter and the spirit of GDPR). Only under these conditions can companies develop scoring and profiling to optimize discounts, special offers, and the management of returns and disputes.
What are the assessment criteria for data quality?
There are countless opinions and versions on the dimensions of data quality. Here is a summary based on experience.
Conformity to reality
- The data is a digital avatar of a reality (example: a customer, a product).
- This avatar must be as faithful as possible to the reality.
- The terminology, syntax and semantics used must ensure an identical understanding of the data by all categories of users.
- Keeping the data up to date is also fundamental.
- The relevance of the data format and the precision of the values are also critical.
Compliance with data and file exchange standards
- In engineering as in e-commerce, there are several competing data exchange standards.
- Each company must therefore comply with the choices of its partners and provide data to the expected standards.
- Each standard being different, there are two alternatives: manually maintaining each standard, or semi-automatically transforming original data into each standard through mapping.
- Whichever option is chosen, each instance must remain faithful both to the standard (rigid by definition) and to the object it represents (which cannot be altered to fit), and this always creates problems.
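The second alternative, transformation through mapping, can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the record fields, the two target standards ("standard_a" and "standard_b") and the function `to_standard` are all hypothetical.

```python
# Hypothetical internal record; field names are illustrative only.
PRODUCT = {"sku": "AB-100", "name": "Circuit breaker", "weight_kg": 0.25}

# One mapping per target standard: target field -> (source field, converter).
# Both standards are invented for this sketch.
MAPPINGS = {
    "standard_a": {
        "ArticleNumber": ("sku", str),
        "Description": ("name", str),
        "WeightGrams": ("weight_kg", lambda kg: round(kg * 1000)),
    },
    "standard_b": {
        "id": ("sku", str),
        "label": ("name", str.upper),
        "weight": ("weight_kg", float),
    },
}


def to_standard(record, standard):
    """Transform one internal record into the layout of a target standard."""
    mapping = MAPPINGS[standard]
    missing = [src for src, _ in mapping.values() if src not in record]
    if missing:
        # Surface gaps instead of exporting silently incomplete data.
        raise ValueError(f"missing source fields for {standard}: {missing}")
    return {
        target: convert(record[src])
        for target, (src, convert) in mapping.items()
    }


exports = {name: to_standard(PRODUCT, name) for name in MAPPINGS}
```

The original record is maintained once, and each standard only adds a mapping table. The tension described above remains: where a standard cannot express a characteristic of the real object, no mapping can fix it.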
Data consistency between systems
- The same data can legitimately be maintained in different computer systems with different purposes.
- These different instances of the same data must, however, be consistent with each other, if not identical.
- Under this condition, duplication is not a problem.
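A consistency check between two systems can be sketched as below. The system names (`erp`, `crm`), the customer keys and the fields are assumptions made up for the illustration; real systems would first need a reliable shared identifier.

```python
def find_inconsistencies(system_a, system_b, fields):
    """Compare records that share a key in both systems.

    Returns a list of (key, field, value_in_a, value_in_b) tuples,
    one for each field whose values differ.
    """
    issues = []
    for key in sorted(system_a.keys() & system_b.keys()):
        for field in fields:
            value_a = system_a[key].get(field)
            value_b = system_b[key].get(field)
            if value_a != value_b:
                issues.append((key, field, value_a, value_b))
    return issues


# Hypothetical extracts of the same customers held in two systems.
erp = {
    "C001": {"name": "Acme Ltd", "country": "FR"},
    "C002": {"name": "Globex", "country": "DE"},
}
crm = {
    "C001": {"name": "Acme Ltd", "country": "FR"},
    "C002": {"name": "Globex", "country": "AT"},
}

issues = find_inconsistencies(erp, crm, ["name", "country"])
```

Here `issues` contains a single entry, for the country of customer C002. The check deliberately ignores records present in only one system, since the same data may legitimately live in one system and not the other.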
Completeness of the data
- This is often the first dimension analyzed, because it appears to be both the easiest to measure and the most critical.
- Obviously, all data should be maintained wherever the field exists.
- However, there are always special cases where a data item does not make sense and cannot be maintained.
- This is the limit of all quantitative dashboards, the validity of which is often questioned based on particular cases.
- In general, software does not easily allow attributes to be assigned at a finer level (for example, per product type).
- Offloading this painstaking work onto data quality reports is risky (because of the complexity) and opens the door to endless exceptions.
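The gap between a naive fill rate and one that takes special cases into account can be illustrated with a small sketch. The product types, the attributes and the `APPLICABLE` rule table are hypothetical; they stand in for the exceptions that, as noted above, quickly become hard to maintain.

```python
# Hypothetical rule table: which attributes make sense for which product type.
APPLICABLE = {
    "cable": {"name", "length_m", "voltage"},
    "software_license": {"name"},  # length and voltage make no sense here
}


def completeness(records):
    """Return (naive ratio, applicability-aware ratio) of filled fields."""
    all_fields = set().union(*APPLICABLE.values())
    naive_filled = naive_total = aware_filled = aware_total = 0
    for rec in records:
        for field in all_fields:
            filled = rec.get(field) not in (None, "")
            # Naive view: every field counts for every record.
            naive_total += 1
            naive_filled += filled
            # Aware view: only fields that apply to this product type count.
            if field in APPLICABLE[rec["type"]]:
                aware_total += 1
                aware_filled += filled
    return naive_filled / naive_total, aware_filled / aware_total


records = [
    {"type": "cable", "name": "Cable X", "length_m": 10, "voltage": None},
    {"type": "software_license", "name": "License Y"},
]

naive, aware = completeness(records)
```

With this sample, the naive ratio is 0.5 while the applicability-aware ratio is 0.75: the license is penalized in the naive dashboard for fields it can never have. Every new special case, however, means one more rule to maintain.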
Technical consistency of data
- Analyzing data requires comparing it to something comparable (other values or rules).
- Comparing successive versions of a data item (in the same system and the same field) is often very informative.
- When data is inherited from another system, monitoring this lineage (origin) is highly recommended.
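Comparing two successive versions of the same record can be as simple as the sketch below. The record layout, including the `source` field used here to carry the lineage, is an assumption for the illustration.

```python
def diff_versions(old, new):
    """Return {field: (old_value, new_value)} for every field that changed."""
    return {
        field: (old.get(field), new.get(field))
        for field in sorted(old.keys() | new.keys())
        if old.get(field) != new.get(field)
    }


# Two hypothetical versions of the same product record.
v1 = {"price": 12.5, "unit": "EUR", "source": "ERP"}
v2 = {"price": 125.0, "unit": "EUR", "source": "ERP"}

changes = diff_versions(v1, v2)
```

In this sample only `price` changed, and a tenfold jump is the kind of difference worth a human look (a possible misplaced decimal point). Keeping the `source` field on each version makes it possible to tell whether a change originated in the upstream system.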
Subjective perception and trust
- As the saying goes: an ounce of good reputation is worth a thousand pounds of gold.
- If perfect data gets a bad reputation, it won't be used.
- It is therefore very important to measure how satisfied the consumers of your data are, especially when their perception does not match your objective criteria.
What are the typical data quality issues?
Data quality issues common to manufacturers, wholesalers and retailers
- Data volume effect: [Number of products] x [Number of characteristics] x [Languages] x [Data standards / Media]
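To give a sense of scale, the multiplication above can be worked through with purely hypothetical figures (none of these numbers come from a real company):

```python
from math import prod

# Hypothetical figures, chosen only to illustrate the multiplication.
volume_factors = {
    "products": 10_000,
    "characteristics": 50,
    "languages": 8,
    "standards_or_media": 4,
}

# Upper bound: assumes every characteristic exists in every language
# and in every standard, which is rarely entirely true in practice.
data_points = prod(volume_factors.values())
```

With these assumptions, `data_points` is 16,000,000 values to create and maintain. Adding a single language raises the total by 2,000,000.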
Manufacturers: specific data quality issues
- Workload and complexity to produce data: multiplicity and diversity of data formats to be provided to customers and partners
- Cumbersome coordination of data creation and maintenance throughout the "Products" process: Specifications, Development, Industrialization, Supply chain, Marketing and sales, Technical support.
- Relation to time: very long cycles, limited reactivity due to industrial constraints.
Distributors: typical data quality problems
Their situation is the opposite of the manufacturers'!
- Great diversity of data supplied by the different manufacturers, versus a desire to harmonize the presentation of products across all brands.
- Focus on product search (standardization of filters and designations), versus the interest of customers (and manufacturers) in discriminating characteristics.
- Huge volumes of data, high sensitivity to costs, high rate of data rarely consumed by customers.
- Limited expertise to assess the quality and relevance of data, inability to remedy it (mass of data).
- High reactivity, urgency (perceived as instability by manufacturers).
Big or complex businesses have more data problems!
- Companies are organized into departments, which operate in vertical silos.
- Each silo is focused on its internal objectives, and manages its own procedures and IT systems.
- However, some processes are transversal to several silos (departments).
- It is the interfaces between silos that cause problems, each department having its own perspective and constraints.
Of course, a company does not necessarily need perfect data everywhere ...
- As long as internal processes deliver roughly what is expected, the business can live with imperfection.
- On the other hand, for processes that impact external partners or customers, the company must shake itself up.
- It is not a good idea to wait until the problem becomes acute: when the time comes, the company will not have enough time to adapt.
What are the best practices for quality data?
Taking into account the acceleration of time caused by digitization
Companies are steeped in the culture of the Project, where efforts have a beginning and an end date, with defined objectives and resources. To stay in the digitalization race, adaptation must become constant, activity must be permanent. The "Trial-Error" loops are a great danger for the company, because they are now too slow, too random and insufficient. In order not to waste time and resources, and to accelerate, we must move from Continuous Improvement to Transformation.
The term Education may sound harsh, but this is indeed in-depth work that leads to a change in behavior (not just internal communication).
Establishment of true collective governance
Coordination bodies are fairly easy to set up. But it takes a lot of time and trust for such a body to impose the greater good against the specific interests of the departments.
Diagnosis, objectives and aligned means
- Most of the time, the problem is not even clearly stated, but everyone already has their own idea of what to do (very often, buying software).
- Projects are launched even before the actors agree on the diagnosis and on the possible solutions.
- The sequence of reactive projects contributes more to the problem than to the solution.
- Often, managers are encouraged to do so by the company: running one disposable project after another provides a certain visibility and an image of dynamism.
- Everyone agrees on the need for Transformation, but few tackle it.
How long will your business be able to succeed without transforming your data management?