"Data Quality (DQ)"
Data Quality is the practice of correcting, standardizing, and verifying data. It is one of the five building blocks of data management, whose goal is to provide the infrastructure for transforming raw data into accurate and reliable information. The other four building blocks are data profiling, data integration, data augmentation, and data monitoring.
Most data quality offerings perform one or more of the following functions:
- Data Profiling
  - Initially assessing the data to understand its quality challenges.
- Data Standardization
  - A business rules engine that ensures data conforms to quality rules.
- Geocoding
  - For name and address data: corrects data to US and worldwide postal standards.
- Matching or Linking
  - A way to compare data so that similar but slightly different records can be aligned. Matching may use “fuzzy logic” to find duplicates in the data. It often recognizes that ‘Bob’ and ‘Robert’ may be the same individual. It may also handle ‘householding’, that is, finding links between related people at the same address, such as a husband and wife. Finally, it can often build a ‘best of breed’ record, taking the best components from multiple data sources to build a single super-record.
- Monitoring
  - Keeping track of data quality over time and reporting variations in the quality of data.
- Batch and Real-Time
  - Once the data has been initially cleansed in batch, companies often want to build the same processes into enterprise applications so that the data stays clean in real time.
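
As a rough illustration of the Matching or Linking function described above, the following Python sketch compares records with a simple string-similarity score, maps nicknames to a canonical first name, flags possible households, and merges duplicates into one surviving record. Everything specific here is an assumption made for the example: the nickname table, the field names (`first`, `last`, `address`, `phone`), the 0.75 threshold, and the "longest value wins" merge rule. Commercial matching engines use much richer dictionaries and rules.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; real products ship far larger dictionaries.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william"}


def canonical_first_name(name):
    """Lowercase a first name and map known nicknames to a canonical form."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def similarity(a, b):
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def same_household(r1, r2, threshold=0.75):
    """Simplified householding: similar last name at a similar address."""
    return (
        similarity(r1["last"], r2["last"]) >= threshold
        and similarity(r1["address"], r2["address"]) >= threshold
    )


def is_same_person(r1, r2, threshold=0.75):
    """Likely duplicates: same household and same canonical first name."""
    same_first = canonical_first_name(r1["first"]) == canonical_first_name(r2["first"])
    return same_first and same_household(r1, r2, threshold)


def merge_records(records):
    """Build one surviving record, keeping the longest value seen per field.

    "Longest value wins" is only a stand-in for a real survivorship rule.
    """
    merged = {}
    for rec in records:
        for field, value in rec.items():
            value = (value or "").strip()
            if field not in merged or len(value) > len(merged[field]):
                merged[field] = value
    return merged


# Illustrative records (made up for this sketch).
a = {"first": "Bob", "last": "Smith", "address": "12 Main St", "phone": ""}
b = {"first": "Robert", "last": "Smyth", "address": "12 Main St.", "phone": "555-0100"}
c = {"first": "Mary", "last": "Smith", "address": "12 Main St", "phone": ""}

if is_same_person(a, b):
    print(merge_records([a, b]))
print(same_household(a, c))
```

In this sketch, records `a` and `b` are treated as one person because ‘Bob’ maps to ‘Robert’ and the last names and addresses are close enough, while `a` and `c` are linked only at the household level. The threshold would need tuning against real data, since a value that is too low merges distinct people and one that is too high misses genuine duplicates.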