Data quality considerations in project implementations
- Roni Steuer
- Sep 11, 2024
Updated: Sep 24, 2024
When organisations define project scopes and objectives, they typically follow a methodology to outline the expected process, identify project goals, and specify the business benefits the project aims to deliver. While the terminology may vary depending on the methodology (such as waterfall, agile, or scrum), the core elements remain the same. However, we have observed that many organisations fail to properly assess their current ‘as-is’ situation, particularly concerning data quality. This oversight can stem from various reasons, not least that the way the data will be used is itself part of the project deliverable. Nevertheless, getting it wrong can negatively impact outcomes, extend timelines, weaken the business case and, in the worst cases, result in a failure to deliver the critical value that the business needs.
So, how can we mitigate this particular risk?
While projects vary, the following key steps can help you assess your readiness and take the necessary actions to ensure a thorough understanding of what it will take to deliver the project—and the risks of failing to do so. This discussion assumes that your organisation has a data strategy, defined ownership, and a clear data structure.
1. What constitutes a ‘data quality issue’?
There are five common indicators of data quality issues, along with a few additional considerations specific to project implementation:
1.1 Inconsistent Format (Design vulnerability): For example, dates formatted differently across regions. Other issues may include inconsistent length requirements (e.g. cost centres, GL accounts, invoice numbers) or variations in languages/characters.
1.2 Duplicate Data (Design vulnerability): Duplicate data can result in multiple versions of the same information, with the risk that one or more versions will be incorrect.
1.3 Incomplete or Missing Data (User/Control): Essential data may be inconsistently available due to user input errors or failures in the ETL process.
1.4 Inaccurate Data (User/Control): This is harder to detect because it concerns the accuracy of the values entered, even when they meet other requirements.
1.5 Outdated Data (User/Control): Data that was accurate when entered but no longer reflects the current situation (e.g. outdated contact details).
In addition to these common issues, consider these project-specific factors:
1.6 Granularity: Does the data have the right level of detail to support the project’s goals?
1.7 Hierarchy: Does the data have the necessary hierarchical structure for efficient usage?
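To make the first three indicators concrete, here is a minimal Python sketch that counts inconsistent date formats (1.1), duplicate keys (1.2) and missing values (1.3) in a handful of records. The field names, the sample rows and the choice of ISO dates as the expected format are all assumptions made for illustration; they are not taken from any particular system.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"invoice_no": "INV-001", "invoice_date": "2024-09-11", "cost_centre": "1000"},
    {"invoice_no": "INV-002", "invoice_date": "11/09/2024", "cost_centre": "1000"},
    {"invoice_no": "INV-002", "invoice_date": "11/09/2024", "cost_centre": "1000"},
    {"invoice_no": "INV-003", "invoice_date": "2024-09-12", "cost_centre": ""},
]

EXPECTED_DATE_FORMAT = "%Y-%m-%d"  # assumed target format


def has_expected_date_format(value: str) -> bool:
    """Return True if the value parses in the expected date format."""
    try:
        datetime.strptime(value, EXPECTED_DATE_FORMAT)
        return True
    except ValueError:
        return False


def profile(rows: list[dict]) -> dict:
    """Count format, duplicate and missing-value issues in the rows."""
    # 1.1: dates that do not match the expected format
    bad_dates = sum(
        1 for r in rows if not has_expected_date_format(r["invoice_date"])
    )
    # 1.2: extra occurrences of a key that should be unique
    key_counts = Counter(r["invoice_no"] for r in rows)
    duplicates = sum(n - 1 for n in key_counts.values() if n > 1)
    # 1.3: empty or absent field values
    missing = sum(1 for r in rows for v in r.values() if v in ("", None))
    return {"bad_dates": bad_dates, "duplicates": duplicates, "missing": missing}


report = profile(records)
print(report)
```

Inaccurate and outdated data (1.4 and 1.5) generally cannot be detected by structural checks like these; they usually need comparison against a trusted reference or review by the data owner.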
2. Define Your Data Quality Requirements
Before commencing the project, you will likely need some of your data to meet specific quality standards. However, it may not be practical for a single project to drive a complete overhaul of the organisation’s data landscape. Set clear boundaries for what is required and define minimum and maximum data quality thresholds.
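One possible way to record such boundaries is sketched below in Python. It reads the minimum as the level below which the deliverable is at risk, and the maximum as the level beyond which further cleansing falls outside the project's scope. That reading, the class name `QualityThreshold`, the metric names and the numbers are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityThreshold:
    """Bounds for one metric, expressed as a ratio between 0 and 1."""

    metric: str
    minimum: float  # below this, the project deliverable is at risk
    maximum: float  # at or above this, further cleansing is out of scope

    def classify(self, score: float) -> str:
        if score < self.minimum:
            return "below minimum"
        if score >= self.maximum:
            return "meets maximum"
        return "acceptable"


# Illustrative values only; real thresholds come from the project's requirements.
thresholds = [
    QualityThreshold("completeness", minimum=0.95, maximum=0.99),
    QualityThreshold("uniqueness", minimum=0.98, maximum=1.00),
]

# Hypothetical measured scores for the same metrics.
measured = {"completeness": 0.97, "uniqueness": 0.96}

results = {t.metric: t.classify(measured[t.metric]) for t in thresholds}
print(results)
```

Writing the bounds down in this form makes it easier to agree them with stakeholders up front and to reuse them when the data is re-evaluated later.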
3. Assess Your Data Sources
Evaluate the credibility, relevance, and timeliness of your data sources. Identify which data sources are necessary for the project and determine strictly which parts of the data are required.
4. Assess Your Data Quality
At this stage, evaluate your data for the issues mentioned above. Depending on the data type, select appropriate tests to determine its current quality. Work closely with the data owner to ensure completeness and accuracy.
5. Create a Risk Assessment and Mitigation Plan
Develop a risk assessment and mitigation plan for the identified data issues based on your findings. This plan should address where to implement improvements (e.g. at the source, during ETL, in the data warehouse, or the staging area). While the preference would always be to fix data at the source, this may not always be feasible; you may not have access to certain areas, or the associated mitigation may be too costly.
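As an illustration, a simple risk register can be sketched in a few lines of Python. The likelihood-times-impact scoring used here is a common convention rather than something this article prescribes, and the issues, scores and fix locations are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataRisk:
    issue: str
    likelihood: int  # 1 (rare) to 5 (almost certain)
    impact: int  # 1 (negligible) to 5 (critical)
    fix_location: str  # e.g. "source", "ETL", "staging", "warehouse"

    @property
    def score(self) -> int:
        """Simple likelihood x impact score (an assumed convention)."""
        return self.likelihood * self.impact


# Hypothetical entries for illustration only.
risks = [
    DataRisk("Mixed date formats across regions", 4, 3, "ETL"),
    DataRisk("Duplicate supplier records", 3, 5, "source"),
    DataRisk("Outdated contact details", 2, 2, "source"),
]

# Highest score first, so mitigation effort goes to the biggest risks.
prioritised = sorted(risks, key=lambda r: r.score, reverse=True)
for r in prioritised:
    print(f"{r.score:>2}  {r.issue} (fix at: {r.fix_location})")
```

Recording the intended fix location next to each risk keeps the "fix at source or downstream" decision visible when the list is reviewed with stakeholders.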
6. Re-evaluate the Project Plan with Stakeholders
With this new information, revisit your stakeholders to present an updated view of the project goals, business benefits, dependencies, risks, and prioritised mitigations.
7. Implement Data Quality Improvements
Data quality improvements can be addressed before the project’s main activities begin or integrated into the workflow. Actions may include data cleansing, enrichment, integration, or rule-based filtering depending on the issues identified. The specific steps will depend on the data type, where the fix is applied, and available resources.
8. Evaluate Data Quality Improvements
Re-evaluate the data quality against the criteria you set and the actions you took. Have your expectations been met? It is common for data cleansing to require multiple iterations, as each improvement may reveal new issues that were previously hidden.
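A before-and-after comparison can be sketched as follows in Python. The function name `compare_runs`, the issue names and the counts are hypothetical; the point is only to show how issues that surface after a cleansing pass can be flagged alongside those that improved.

```python
def compare_runs(before: dict[str, int], after: dict[str, int]) -> dict[str, str]:
    """Label each issue count as resolved, improved, unchanged, worse or new."""
    outcome = {}
    for issue in sorted(set(before) | set(after)):
        old = before.get(issue)  # None means the issue was not seen before
        new = after.get(issue, 0)
        if old is None:
            outcome[issue] = "new"
        elif new == 0:
            outcome[issue] = "resolved"
        elif new < old:
            outcome[issue] = "improved"
        elif new == old:
            outcome[issue] = "unchanged"
        else:
            outcome[issue] = "worse"
    return outcome


# Hypothetical issue counts from two assessment runs.
baseline = {"bad_dates": 120, "duplicates": 45, "missing": 300}
after_cleansing = {
    "bad_dates": 0,
    "duplicates": 12,
    "missing": 300,
    "invalid_cost_centre": 8,  # surfaced only once dates were fixed
}

summary = compare_runs(baseline, after_cleansing)
for issue, status in summary.items():
    print(f"{issue}: {status}")
```

Anything labelled "new" or "unchanged" is a candidate for the next cleansing iteration or for the residual issues documented in the final step.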
9. Document and Implement Controls
While completed within the project, the final step should also inform the organisation’s long-term data strategy and management. Propose relevant controls to address the issues identified and any residual issues that were either consciously excluded or deemed not critical enough to endanger the project’s deliverables.
Conclusion
Identifying data quality issues early can significantly improve a project's chance of success by setting realistic expectations regarding risks, timelines, and the potential value of achieving the project goals.
As always, feel free to contact Steuer Consulting today if you’d like to discuss the specific challenges your organisation is facing.