Without improvements in the quality and completeness of data captured at source, changing the processes and systems will have little impact. Good data is the lifeblood of any business and requires effective management like all other assets – Excerpt from the PwC report: Put data first.
Businesses today are intent on adopting a data-driven culture. Management expects accurate, reliable information, delivered faster and more efficiently, to support informed decision-making. And with good reason – a focus on data can transform the business. Following the path from data to information, information to insight and insight to action can help increase revenues and decrease costs (and risks). Recently, I have been struggling with the first step – the source of data. No matter how advanced the business intelligence tool is or how well the analysis is done and presented, unless the quality of the source data is good, it is a case of rubbish in, rubbish out. Unless the data used and presented is seen as trustworthy, the whole purpose of the exercise is defeated. Instead of analyzing, gathering insights or identifying actions, a lot of time is spent arguing over the accuracy of data, explaining perceived discrepancies or building complicated workarounds so that reporting is not impacted while data quality issues are sorted out.
Based on my experience of what works and what does not, and on insights gained from the extensive reading I have been doing on this topic, here are five strategies for getting a handle on source data quality and making data quality improvement an ongoing, productive exercise:
Strategy #1: Create a cross-functional data governance team – This is an important first step to set the right structure, authority and accountability for the data improvement initiative. The intent is not to get into a “death by meetings” scenario but to break the data “silos” and bring the right people together (across the end-user, creator, administrator and analyst groups) to make informed decisions about the who, what and how. The team will define the processes, business rules, roles and responsibilities involved in the creation, management and consumption of data across the organization. It will also serve as a forum for assigning priorities and as an escalation point for data issues. This ensures that data is cared for across the organization and that balanced decisions are taken.
Strategy #2: Identify the broad-level root causes – There are many reasons why the source data could be wrong: data extracted from different source systems with conflicting information, errors made during manual entry, an unclear understanding of what data needs to go where, or uncertainty over which business rule or logic applies to a particular set of data. The more you dig, the more possible sources of error you will find, and in enterprise scenarios the steps you take to fix root causes today may come back tomorrow as a multi-headed monster with a whole new set of root causes. Hence the suggestion to identify the broad-level root causes. The way to do this is not to attack the whole set of data in one go – apply the 80/20 rule and select a few segments of data to dig into. This will improve your chances of isolating the causes and resolving the major issues faster.
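To make the 80/20 idea concrete, here is a minimal sketch that tallies logged data issues by root-cause category and keeps only the few categories that together cover roughly 80% of the issues. The issue log, the category names and the `top_root_causes` helper are all hypothetical and purely for illustration; a real log would come from your own issue tracker or data quality tool.

```python
from collections import Counter


def top_root_causes(issue_causes, threshold=0.8):
    """Return the most frequent causes that together cover `threshold` of all issues."""
    counts = Counter(issue_causes)
    total = sum(counts.values())
    selected = []
    covered = 0
    # Walk the causes from most to least frequent until enough issues are covered.
    for cause, count in counts.most_common():
        if total == 0 or covered / total >= threshold:
            break
        selected.append((cause, count))
        covered += count
    return selected


# Hypothetical issue log: one entry per data issue, labelled with its suspected cause.
issue_log = (
    ["manual entry error"] * 50
    + ["conflicting source systems"] * 35
    + ["unclear business rule"] * 10
    + ["other"] * 5
)

for cause, count in top_root_causes(issue_log):
    print(f"{cause}: {count} issues")
```

With this made-up log, two of the four categories account for 85 of the 100 issues, so those two are where the digging would start.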
Strategy #3: Sustainability – Systems and People – Cleaning up the source data cannot be a one-time exercise. We are constantly adding new data or changing existing data, so it is important to ask whether the solution to the problem is sustainable in the long run. A temporary flurry of activity and a few tweaks in the systems will do just that – fix the issue temporarily. We know that the less manual intervention there is in handling data, the better the chances of reducing errors. This is where automation comes into play – the aim should be to have automation address the identified data discrepancy areas. The higher the percentage of automation, the more sustainable and efficient the initiative will be in the long run. In the short term, training and constant communication to build awareness among the creators of data will help reduce errors. Training sessions on tools and best practices should be an integral part of the data improvement strategy.
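As an illustration of what automating checks at the point of data creation could look like, here is a small sketch that runs a record through a set of named validation rules and reports the ones it fails. The field names (`customer_id`, `email`, `order_amount`) and the rules themselves are assumptions made up for this example; the real rules would be the business rules agreed by the governance team.

```python
import re

# Deliberately loose pattern: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical rules: each maps a readable name to a check that returns True on success.
RULES = {
    "customer_id is present": lambda r: bool(r.get("customer_id")),
    "email looks valid": lambda r: bool(EMAIL_PATTERN.match(r.get("email") or "")),
    "order_amount is non-negative": lambda r: (
        isinstance(r.get("order_amount"), (int, float)) and r["order_amount"] >= 0
    ),
}


def validate(record):
    """Return the names of all rules that the record violates."""
    return [name for name, check in RULES.items() if not check(record)]


good = {"customer_id": "C001", "email": "jane@example.com", "order_amount": 120.5}
bad = {"customer_id": "", "email": "not-an-email", "order_amount": -5}

print(validate(good))  # no violations
print(validate(bad))   # all three rules fail
```

Running checks like these automatically when data is entered or loaded means errors are caught at source rather than discovered later in a report.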
Strategy #4: Forget Perfection – It is not going to happen – Remember that perfect data by itself is not the end objective. The main goal is the insight that the data is used to generate. Don’t drop that ball by holding back analysis until every imperfection in the data is sorted out – ship the data once you have reasonable confidence that it is accurate within a certain range. We have to balance the effort and time spent on improving the accuracy of data against the outcome needed. If your data is off by, say, 5-10%, it is “good enough” to start using for analysis and the next set of actions. Data quality improvement has to be treated as an iterative work in progress. As Jim Harris says here – “A smaller data quality emphasis SOMETIMES enables bigger data-driven insights, which means that SOMETIMES using a bigger amount of lower-quality data is better than using a smaller amount of higher-quality data.”
Strategy #5: Measurements and Metrics – Last but not least, my favourite topic – metrics. Data experts have identified certain standard dimensions that impact data quality – Relevance, Accuracy, Timeliness and Punctuality, Accessibility and Clarity, Comparability and Coherence (some definitions of the dimensions here). Why measure? How else can we show the progress of our efforts, and how else do we build the business case that justifies investment in the data quality improvement initiative? Through a few simple, “right” metrics. And what should be the main factor in choosing the metrics? Usefulness and relevance to the end-user – if we can’t link a metric directly to its impact on business performance, then it is not useful or relevant. Anish Raivadera, a data quality expert, has written an extremely useful eight-part series on such metrics, based on the above data dimensions, here.
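As a rough illustration of how two of these dimensions could be turned into simple, trackable numbers, the sketch below computes an accuracy rate (share of records that agree with a trusted reference) and a timeliness rate (share of records updated within an agreed number of days). The record layout, the reference data and both function names are hypothetical, and these are only one possible way to define such metrics.

```python
from datetime import date


def accuracy_rate(records, reference, field):
    """Share of records whose `field` matches the trusted reference value."""
    # Only records that exist in the reference can be checked.
    checked = [r for r in records if r["id"] in reference]
    if not checked:
        return None
    matches = sum(1 for r in checked if r[field] == reference[r["id"]])
    return matches / len(checked)


def timeliness_rate(records, as_of, max_age_days):
    """Share of records updated within `max_age_days` of the `as_of` date."""
    if not records:
        return None
    fresh = sum(1 for r in records if (as_of - r["updated"]).days <= max_age_days)
    return fresh / len(records)


# Hypothetical customer records and a trusted reference of postcodes by id.
records = [
    {"id": 1, "postcode": "AB1 2CD", "updated": date(2024, 3, 1)},
    {"id": 2, "postcode": "EF3 4GH", "updated": date(2024, 1, 15)},
    {"id": 3, "postcode": "XX9 9XX", "updated": date(2024, 3, 20)},
    {"id": 4, "postcode": "IJ5 6KL", "updated": date(2023, 11, 2)},
]
reference = {1: "AB1 2CD", 2: "EF3 4GH", 3: "MN7 8OP", 4: "IJ5 6KL"}

print(accuracy_rate(records, reference, "postcode"))
print(timeliness_rate(records, as_of=date(2024, 3, 31), max_age_days=30))
```

Tracked month over month, numbers like these give the governance team a way to show progress, provided each one is tied to a business outcome the end-user cares about.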
Data is everybody’s business. Whether we create, share or consume data, we should all be concerned about the quality of data in the organization. Unless awareness of the importance of data quality, and of the role that each function (and not just IT) plays in ensuring it, is ingrained in the organization’s culture, we cannot tap the full power and potential of the available data.
Coincidentally, today I chanced upon the shareholder letters written by Jeff Bezos of Amazon, and I cannot conclude this post without an excerpt that I felt was particularly relevant. His 2005 letter discusses business decisions and their dependence (or not) on data:
“Many of the important decisions we make at Amazon.com can be made with data. There is a right answer or a wrong answer, a better answer or a worse answer, and math tells us which is which. These are our favorite kinds of decisions….As you would expect, however, not all of our important decisions can be made in this enviable, math-based way. Sometimes we have little or no historical data to guide us and proactive experimentation is impossible, impractical, or tantamount to a decision to proceed. Though data, analysis, and math play a role, the prime ingredient in these decisions is judgment….. Math-based decisions command wide agreement, whereas judgment-based decisions are rightly debated and often controversial, at least until put into practice and demonstrated. Any institution unwilling to endure controversy must limit itself to decisions of the first type. In our view, doing so would not only limit controversy —it would also significantly limit innovation and long-term value creation.”
So, what do you think? What other strategies would you recommend for improving the quality of data? Who is responsible for source data in your organization? I would love to hear back and learn from you.
Picture courtesy: http://www.flickr.com/photos/ocdqblog/5065103584/