Effective data remediation – a driver for risk-optimized information governance

The intelligent use of data is becoming an increasingly important competitive factor for companies. This applies whether individual process steps are digitized, data is linked across digitized business processes and analyzed further (e.g., via process mining), key figures are evaluated across entire value chains (e.g., via business intelligence), or completely data-driven business models are established.

As a result, organizations must manage ever-growing amounts of data and ensure that this data is both of high quality and compliant with regulations. Because this digital landscape is becoming increasingly complex, organizations need a complete overview and a detailed understanding of how their data is created and managed. This covers not only the retention of data but also its erasure (e.g., as required under the GDPR).

Effective data remediation processes are therefore vital. They enable the structured identification and classification of data and the separation of data that can be erased. Information governance, in turn, aims to ensure data security and privacy, to make the management of information more efficient and cost-effective, and to meet legal requirements. It achieves this through consistently designed cross-functional policies, processes, procedures, roles, and controls. Data remediation processes should be at the heart of information governance.

According to a 2020 Veritas Databerg Research analysis, companies on average know neither the content nor the value of 52% of their data. At the same time, the global data volume has been growing exponentially: an IDC and Statista forecast (2021) expects an increase of 24% for 2023 alone. The use of cloud storage has done little to encourage data minimization, since virtually unlimited storage space is available. In addition, new digital tools require additional data to be retained. Unless this growth is governed, "dark data" (digital information that organizations collect and store without a defined value) may keep piling up.

For efficient eDiscovery, information governance is essential

Large volumes of digital data are increasingly being evaluated in litigation and in internal or external investigations. Electronic discovery (eDiscovery) is the process used for this purpose. It covers electronically stored information (ESI), such as emails, documents, photos, and other digital evidence. The Electronic Discovery Reference Model (EDRM) provides a reference framework for the process, from information governance and identification through preservation, collection, processing, review, and analysis to production and presentation.

Effective information governance is the key to successful eDiscovery and a way to reduce its costs. The eDiscovery process can only succeed if it is known where potentially relevant data is stored. Data maps help to locate this data: they describe the data stored in each IT system, e.g., its volume, form and format, purpose, and storage location. In addition, it must be documented whether, and to what extent, regular data erasure can be ensured.
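To make the idea of a data map more concrete, the following minimal sketch models one as a list of structured records and queries it for the systems that hold data from the business units in scope of a case. The field names, the `business_units` attribute used for scoping, and the helper functions `locate_candidates` and `lacking_erasure` are hypothetical illustrations, not a standard schema; real data maps are usually maintained in dedicated governance tools and carry many more attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataMapEntry:
    """One row of a hypothetical data map describing a single IT system."""
    system: str                     # name of the IT system or repository
    storage_location: str           # where the data physically or logically resides
    volume_gb: float                # approximate data volume
    data_format: str                # form and format of the data
    purpose: str                    # business purpose of the data
    business_units: frozenset[str]  # hypothetical scoping attribute: units whose data the system holds
    regular_erasure: bool           # whether regular erasure is implemented


def locate_candidates(data_map: list[DataMapEntry],
                      units_in_scope: set[str]) -> list[DataMapEntry]:
    """Return the systems that hold data from at least one business unit in scope."""
    return [e for e in data_map if e.business_units & units_in_scope]


def lacking_erasure(data_map: list[DataMapEntry]) -> list[str]:
    """List systems without regular erasure, i.e., likely sources of dark data."""
    return [e.system for e in data_map if not e.regular_erasure]


if __name__ == "__main__":
    data_map = [
        DataMapEntry("ERP", "EU data center", 1200.0, "relational database",
                     "procurement and invoicing",
                     frozenset({"Procurement", "Finance"}), True),
        DataMapEntry("Group drive G:", "on-premises file server", 8500.0,
                     "office documents", "team collaboration",
                     frozenset({"Marketing"}), False),
    ]
    hits = locate_candidates(data_map, {"Procurement"})
    print([e.system for e in hits])   # ['ERP']
    print(lacking_erasure(data_map))  # ['Group drive G:']
```

The same structure answers both questions raised above: where potentially relevant data is stored, and which systems lack regular erasure.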

A lack of information governance leads to substantial costs and unnecessary risks

Insufficient information governance causes risks for companies. Storing dark data
• leads to higher costs due to the amount of storage space required
• consumes additional energy, which causes CO2 emissions. In 2020, Veritas' Chief Sustainability Officer stated that storing unnecessary data produces more CO2 emissions than each of 80 individual countries.
• creates data privacy issues where personal data is involved. Data protection violations can lead to substantial fines, which have been increasing recently.

Another risk arises if information governance processes do not sufficiently help to identify potentially relevant data for new eDiscovery investigations.

Insufficient information governance and data remediation processes lead to over-preservation

If litigation is foreseeable, potentially relevant data must be protected against modification and erasure (eDiscovery phase: preservation); otherwise, the company may be accused of destroying evidence. If content and processes are poorly managed or not documented at all, preservation can turn into a rush, leading to hasty decisions that may be detrimental to the company. A key component of the preservation phase is the so-called litigation hold: a legal notice to suspend the erasure or alteration of data within the scope of the pending or anticipated case.
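The interplay between routine erasure and a litigation hold can be sketched as a simple check that every automated deletion job would have to pass. In this minimal sketch, a hold is assumed to cover a record when both its custodian and its system are named in the hold scope; the classes `LegalHold` and `Record` and the functions `under_hold` and `may_erase` are hypothetical, and real holds are often scoped by date ranges, search terms, or matter-specific criteria as well.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LegalHold:
    """Hypothetical hold scope: custodians and systems named in the hold notice."""
    matter: str
    custodians: frozenset[str]
    systems: frozenset[str]


@dataclass(frozen=True)
class Record:
    record_id: str
    custodian: str
    system: str
    retention_expires: date  # date from which the retention policy allows erasure


def under_hold(record: Record, holds: list[LegalHold]) -> bool:
    """True if any active hold covers both the record's custodian and its system."""
    return any(record.custodian in h.custodians and record.system in h.systems
               for h in holds)


def may_erase(record: Record, holds: list[LegalHold], today: date) -> bool:
    """Allow routine erasure only once retention has expired and no hold applies."""
    return record.retention_expires <= today and not under_hold(record, holds)


if __name__ == "__main__":
    hold = LegalHold("Matter 2023-01", frozenset({"j.doe"}), frozenset({"email"}))
    old_mail = Record("msg-1", "j.doe", "email", date(2022, 12, 31))
    print(may_erase(old_mail, [hold], date(2023, 6, 1)))  # False: erasure suspended by the hold
    print(may_erase(old_mail, [], date(2023, 6, 1)))      # True: retention has expired
```

The narrower the hold scope can be defined, the more data remains subject to regular erasure; this is exactly where precise information governance pays off.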

Care must be taken to identify potential key personnel and to verify the information they provide about which data is relevant to an investigation. In practice, stakeholders often find it difficult to narrow down the data that relates to the subject of an investigation, particularly non-personal data, e.g., on group drives or in IT applications. This sometimes results in several terabytes of data being classified as possibly relevant.

In such cases, the scope of data preservation often substantially exceeds what is required, because the data cannot be scoped and/or located precisely within the available time. The company accepts the costs and possible risks of storing the data in order to avoid violating its preservation obligations. These costs include not only storage itself but also the effort required to keep the data available and interpretable in the future.

Typical challenge: Over-preservation because of inadequate information governance

In a typical client situation, a company is unexpectedly confronted with allegations of fraud, and the authorities demand case-relevant data at short notice as part of their investigations.

The threat of legal proceedings (both criminal and civil) typically results in an extensive ad hoc obligation to preserve potentially relevant data from alteration or destruction. However, data retention and preservation processes are often neither sufficiently formalized nor designed for the scope of the impending actions. In addition, existing information governance processes may not allow potentially case-relevant data to be adequately located and scoped.

Consequently, companies often place substantial parts of their IT landscape on legal hold to ensure compliance with their preservation obligations. This can result in the retention of massive volumes of data, including back-up data, clones of current platforms and databases, disaster recovery back-ups, and comprehensive data from network drives. This often extends to data and systems outside the scope of the pending litigation.

In addition, it may be difficult to ensure that the data remains available and interpretable for use in legal proceedings, because it can be hard to extract from specific IT systems, or the underlying technology may change over the years.

Ways out of the dilemma: Legally compliant reduction of data and optimization of information governance and preservation processes

Where data has been preserved in excess of legal obligations, one option is to carefully reduce the data that was previously put on legal hold. Such a "rightsizing" approach, which subsequently limits preserved data to the case-relevant scope in a legally defensible manner, must involve all relevant stakeholders (legal department, IT department, data protection officer, and external law firms). It must also be extensively prepared and documented, both technically and in terms of content, so that the following questions can be answered with certainty:

  • Has potentially relevant data been fully identified?
  • Is the separation of relevant/non-relevant data clear, understood and approved by all stakeholders?
  • Has the data been migrated to the final storage location?
  • Have the steps taken to ensure the completeness and integrity of the data during migration been documented?
  • Does the storage location meet the requirements for ensuring the integrity and availability of the data?
  • Has the company taken steps to ensure that the data can be interpreted in the future?
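One common way to document completeness and integrity during a migration is to compare cryptographic hash manifests of the source and the target. The minimal sketch below, using only the Python standard library, assumes the data is available as files in two directory trees; the function names are hypothetical, and database or mailbox migrations would need system-specific export and verification steps instead.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*")) if p.is_file()
    }


def verify_migration(source: Path, target: Path) -> dict[str, list[str]]:
    """Compare manifests; all-empty lists indicate a complete, unaltered copy."""
    src, tgt = build_manifest(source), build_manifest(target)
    return {
        "missing": sorted(src.keys() - tgt.keys()),     # completeness check
        "unexpected": sorted(tgt.keys() - src.keys()),  # files not in the source
        "altered": sorted(k for k in src.keys() & tgt.keys()
                          if src[k] != tgt[k]),         # integrity check
    }


if __name__ == "__main__":
    import shutil
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        src_dir = Path(tmp, "source")
        src_dir.mkdir()
        (src_dir / "contract.txt").write_text("signed copy")
        dst_dir = Path(tmp, "target")
        shutil.copytree(src_dir, dst_dir)
        print(verify_migration(src_dir, dst_dir))
        # {'missing': [], 'unexpected': [], 'altered': []}
```

Storing both manifests and the comparison result alongside the migrated data gives stakeholders a reproducible record for the documentation questions listed above.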

Workstreams should be prioritized using a risk-based approach. In this way, the entire body of data on hold can be reduced to the potentially relevant data. Non-relevant data must be erased in accordance with the company's existing data retention policies; in some cases, this can even amount to petabytes of data.
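A risk-based prioritization can be illustrated with a simple scoring scheme. The sketch below assumes that the stakeholder-approved relevance assessment has already marked each held dataset as in or out of case scope, and it ranks the out-of-scope datasets as reduction candidates. The weights, the `HeldDataset` class, and the `risk_score` and `prioritize` functions are purely hypothetical; a real scheme would be agreed with legal, IT, and the data protection officer, and the resulting list only proposes candidates, while the actual erasure still follows the retention policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeldDataset:
    name: str
    volume_tb: float
    contains_personal_data: bool
    in_case_scope: bool  # result of the stakeholder-approved relevance assessment


# Hypothetical weights, for illustration only: each terabyte adds storage cost,
# and personal data held without a case reason adds a fixed privacy-risk surcharge.
VOLUME_WEIGHT = 1.0
PERSONAL_DATA_WEIGHT = 10.0


def risk_score(ds: HeldDataset) -> float:
    """Score the cost and risk of continuing to hold an out-of-scope dataset."""
    score = VOLUME_WEIGHT * ds.volume_tb
    if ds.contains_personal_data:
        score += PERSONAL_DATA_WEIGHT
    return score


def prioritize(datasets: list[HeldDataset]) -> tuple[list[HeldDataset], list[HeldDataset]]:
    """Split held data into data to keep and reduction candidates, highest risk first."""
    keep = [d for d in datasets if d.in_case_scope]
    candidates = sorted((d for d in datasets if not d.in_case_scope),
                        key=risk_score, reverse=True)
    return keep, candidates


if __name__ == "__main__":
    held = [
        HeldDataset("Procurement mailboxes", 5.0, True, True),
        HeldDataset("HR share", 2.0, True, False),
        HeldDataset("DR back-ups 2019", 40.0, False, False),
    ]
    keep, candidates = prioritize(held)
    print([d.name for d in keep])        # ['Procurement mailboxes']
    print([d.name for d in candidates])  # ['DR back-ups 2019', 'HR share']
```

Starting with the highest-scoring candidates addresses the largest cost and risk drivers first while the relevant data stays on hold.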

Beyond that, the next step is to optimize existing processes so that future preservation can be limited to potentially relevant data, enabling resource-efficient data preservation. Relevant stakeholders from IT and business departments must be involved in the optimization process so that everyone knows their roles and responsibilities in future cases. This makes it possible to handle future cases in a legally compliant and efficient manner.

One recommended approach is depicted below.


Given growing data volumes and increasing legal complexity, companies should review their information governance concepts, particularly with regard to their interconnections with eDiscovery processes. Process inefficiencies usually become visible only when an unforeseen preservation requirement arises, and they typically entail high costs and risks for the company as well as critical situations for the stakeholders involved. The scenarios described above can be avoided by:

  1. Creating awareness of the benefits of improved eDiscovery information governance,
  2. Reviewing existing data lakes on a risk-basis and reducing dark data,
  3. Reviewing processes and implementing advanced eDiscovery information governance, e.g., with litigation hold plans.




