Background: DAMA and DCAM are two industry standards that describe data management. Many terms are used to describe data management, including Information Management; however, data is different from information. Data management is also not database management, which is a common misconception. In brief, data management involves two main parties: data producers (those creating data, for example by inputting customer details into a system) and data consumers (for example, those running reports for management decision making).
Data Management is a function within an organisation that ensures the data consumers receive the correct data from the correct system. This is done by creating and maintaining a catalog of where the key data is stored, so that there is a common understanding of where to go to retrieve a golden source of the data (and not a copy that may have been modified). This catalog is an example of metadata (as you read more about data management you'll see frequent references to metadata). The Data Management function also ensures that Data Stewards are appointed to make sure data is recorded in a consistent way so it can be retrieved. For example, a customer name can be recorded as Mr Firstname, Lastname or Firstname Middlename Lastname or Initial Lastname. In order to manage large volumes of data effectively and ensure there is a single view of a customer, the data must be produced/recorded in a consistent way. Data profiling is an exercise that an organisation can undertake to determine whether the data is recorded consistently in the golden source repository.
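As a rough illustration of a data profiling exercise, the sketch below counts how many customer names follow each recording format. The pattern labels, the regular expressions and the sample names are assumptions made up for this example; a real profiling exercise would use the formats defined in the organisation's own data dictionary.

```python
import re
from collections import Counter

# Hypothetical name formats, invented for illustration only.
# Order matters: the first matching pattern wins.
PATTERNS = [
    ("title_first_last", re.compile(r"^(Mr|Mrs|Ms|Dr)\.? [A-Z][a-z]+,? [A-Z][a-z]+$")),
    ("initial_last", re.compile(r"^[A-Z]\.? [A-Z][a-z]+$")),
    ("first_middle_last", re.compile(r"^[A-Z][a-z]+ [A-Z][a-z]+ [A-Z][a-z]+$")),
    ("first_last", re.compile(r"^[A-Z][a-z]+ [A-Z][a-z]+$")),
]


def classify_name(name):
    """Return the label of the first format the name matches."""
    cleaned = name.strip()
    for label, pattern in PATTERNS:
        if pattern.match(cleaned):
            return label
    return "unrecognised"


def profile_names(names):
    """Count how many names fall into each recording format."""
    return Counter(classify_name(n) for n in names)


# Made-up sample standing in for a column from the golden source.
sample = ["Mr John Smith", "Jane Mary Doe", "J Smith", "Ann Lee", "Mr Raj, Patel", "smith john"]
profile = profile_names(sample)

for label, count in profile.most_common():
    print(f"{label}: {count}")
```

If one format dominates, the remaining formats are candidates for clean-up by the Data Stewards; a wide spread suggests that no recording standard is being followed.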
Lastly, data quality is a key activity of Data Management. This includes capturing data errors and anomalies and fixing them at the source, plus identifying any recurring errors. It also includes defining a threshold against which data quality errors are measured, for example the number of empty fields. When measuring data quality, the following should be considered:
Precision: Is it acceptable to round up or down, or are several decimal places required?
Accuracy: Is the data a true representation of real-life events?
Completeness: Is all the required data recorded and available?
Currency: Is the data up to date? It may have been accurate at the time it was recorded, but has it been refreshed since?
Timeliness: Can the data be retrieved in the time frame required by the data consumers? For example, for a daily report, can the data be compiled and formatted within a day?
Uniqueness: This refers to the number of duplicates that exist within the data. If there are many, a data deduplication (de-duping) exercise may be required.
Referential Integrity: This refers to ensuring each entry has a unique identifier (a primary key) and that any reference from one entry to another (a foreign key) points to an entry that actually exists.
Consistency: The records may be accurate, complete and unique, but do they relate to and agree with each other?
Validity: This ensures the data is valid for the reporting or use for which it is required. For example, the data might be accurate, complete and consistent, yet not relevant for the intended use (e.g. electricity bill data included in a grocery list: both may be accurate, but the bill data is not relevant/valid for that use).
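Some of the dimensions above lend themselves to simple automated measurement. The sketch below shows one possible way to score completeness (empty fields), uniqueness (duplicates) and referential integrity (references that point to no existing record), and to compare the completeness score with a threshold. The records, field names and the 0.9 threshold are all assumptions chosen for illustration.

```python
# Hypothetical records; field names and values are illustrative only.
customers = [
    {"customer_id": 1, "name": "Ann Lee", "email": "ann@example.com"},
    {"customer_id": 2, "name": "J Smith", "email": ""},
    {"customer_id": 3, "name": "Ann Lee", "email": "ann@example.com"},
    {"customer_id": 4, "name": "", "email": None},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},
    {"order_id": 12, "customer_id": 9},  # refers to a customer that does not exist
]


def completeness(records, fields):
    """Fraction of the given fields that are populated across all records."""
    total = len(records) * len(fields)
    filled = sum(1 for r in records for f in fields if r.get(f) not in (None, ""))
    return filled / total if total else 1.0


def duplicate_count(records, fields):
    """Number of records that repeat an earlier record on the given fields."""
    seen = set()
    duplicates = 0
    for r in records:
        key = tuple(r.get(f) for f in fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates


def orphaned_keys(children, parents, key):
    """Child key values that have no matching parent record."""
    parent_keys = {p[key] for p in parents}
    return [c[key] for c in children if c[key] not in parent_keys]


COMPLETENESS_THRESHOLD = 0.9  # assumed tolerance, not an industry figure

score = completeness(customers, ["name", "email"])
dupes = duplicate_count(customers, ["name", "email"])
orphans = orphaned_keys(orders, customers, "customer_id")
passes = score >= COMPLETENESS_THRESHOLD

print(score, dupes, orphans, passes)  # 0.625 1 [9] False
```

Tracking figures like these over time is one way to produce the data quality KPIs that show whether quality is improving or degrading. Dimensions such as accuracy and validity usually need comparison with the real world or with the intended use, so they are harder to automate in this way.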
Data Management Audit Tests:
Data Taxonomy - Check whether there is a data dictionary or data taxonomy that describes the key data for the organisation. For example, 'Customer' should be an entry in the dictionary and should describe which data attributes make up a customer, such as name and address. There should be a clear definition of how the name is stored (first name and last name, or first name, middle name and last name).
Data Quality - Check whether there is a tolerance threshold for data quality errors and whether it is measured. There should be clear reporting on data incidents that occur and a process to treat them, including fixing the data at source. Data quality KPIs are essential to track improvements or degradation in data quality.
Data Governance - Confirm there is an authority to approve the data taxonomy and make decisions on the treatment of data quality issues.
Data Sourcing - Determine whether all reports used by business data consumers source their data from the golden source (authoritative source) and whether there is an agreed golden source in place for certain types of data entity, for example customer data.
Data Lineage - Check whether there is a clear understanding of how data flows through the systems and where the manual data transfers are. Determine whether a piece of data in a report can be traced back from data consumers to data producers.
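The data lineage test can be pictured as walking a map of data flows backwards from a report to the systems where the data was first produced. The sketch below assumes a hand-maintained lineage map; the system names and the "automated"/"manual" transfer labels are invented for illustration and do not come from any particular tool.

```python
# Hypothetical lineage map: each system lists the upstream feeds it reads from,
# together with how the data is transferred. All names are invented.
LINEAGE = {
    "management_report": [("data_warehouse", "automated")],
    "data_warehouse": [("crm_system", "automated"), ("finance_spreadsheet", "manual")],
    "crm_system": [],
    "finance_spreadsheet": [],
}


def trace_to_producers(system, lineage):
    """Walk upstream from `system` and return (producers, manual_hops).

    A system with no recorded upstream feed is treated as a data producer.
    Each manual hop is recorded as a (source, destination) pair.
    """
    producers = set()
    manual_hops = []
    visited = set()
    stack = [system]
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        upstream = lineage.get(current, [])
        if not upstream:
            producers.add(current)
        for source, transfer in upstream:
            if transfer == "manual":
                manual_hops.append((source, current))
            stack.append(source)
    return producers, manual_hops


producers, manual_hops = trace_to_producers("management_report", LINEAGE)
print(sorted(producers))  # ['crm_system', 'finance_spreadsheet']
print(manual_hops)        # [('finance_spreadsheet', 'data_warehouse')]
```

In an audit, the manual hops are the natural places to look for uncontrolled changes, and any report that cannot be traced back to a producer points to a gap in the lineage documentation.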