To transform personal and personally identifiable data lawfully and in a flexible, structured, and simple way, companies need a framework that combines a sound methodology with a suitable selection of tools.
Beautiful digital world
Whether someone makes a booking, completes a transaction, or sends an email, every click creates data. These massive amounts of data allow companies to describe the status quo and to make forecasts. As computing power increases, large volumes of structured and unstructured data from different sources are being compiled, processed, and analyzed across the globe. Many companies use this data to generate insights: for analysis, for optimizing their business processes, for cooperating with suppliers, and for marketing.
The possibilities of data utilization are immense and promising, but they also pose a variety of challenges for companies of all sizes. These stem from factors such as the growing threat of cyberattacks and data leaks, the requirements introduced by the General Data Protection Regulation (GDPR), and society’s continuously evolving view of data protection. Companies must therefore do more to protect individuals’ personal data and are obliged to inform consumers transparently about how their data is stored and used. This raises not only the question of how consumers can be effectively informed about tracking and the use of their data, e.g., for marketing purposes, but also the questions of how individuals’ data is analyzed and which rights they hold as natural persons under the GDPR.
The tension between redaction and structural integrity
If personal data is to serve as the basis for investigating suspicious activity, it is essential to clarify when and to what extent that data may be processed and analyzed. In such cases, the GDPR often sets narrow legal limits. For example, sensitive data, such as emails or accounting system data, cannot be analyzed and included as “evidence” in an investigation without a legitimate justification. Whether, and under which conditions, processing and evaluating personal data is legally permissible depends on many factors. These must be weighed individually in each case and formally assessed from a data protection perspective. Once the processing and evaluation of personal data has been declared permissible and data protection clearance has been granted for a forensic investigation, further measures such as redaction, pseudonymization, or anonymization must be implemented to ensure that the sensitive data remains protected throughout the investigation.
More than blacking out: document redaction
Data often has to be redacted when it is made available to third parties: for example, when “evidence” is released to foreign authorities or to opposing parties in a legal dispute, or when company data is shared with potential buyers during an M&A transaction. Redaction protects the rights of individuals and companies by ensuring that no personal data or confidential information is disclosed. In the past, redactions were made by hand, with a pen, on paper documents. Today, with email and document repositories, reliably removing personal data and other confidential information from electronic documents is both a technical and a procedural challenge for companies.
Discovery software, long established as a standard tool in forensic investigations, can be used to locate and redact data and metadata. However, this is often a manual and laborious process, because it first requires determining which data should be redacted. In addition, Named Entity Recognition (NER) is increasingly being used, since it can automatically identify proper names and other entities (e.g., locations) using complex algorithms.
Regardless of the tools and techniques used, the redaction requirements and the available resources must be weighed in each individual case. Often, automatic redaction comes first, followed by a second step in which the redactions are manually reviewed and adjusted to ensure the quality of the results. Given the pace of technical progress, fully automated redaction can be expected to become available soon, opening up new possibilities for data analysis, especially of email data. The key question that then remains is to what extent AI-based redactions will be accepted in court.
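The two-step approach of automatic redaction followed by manual review can be illustrated with a simple rule- and list-based redactor. This is a minimal sketch under stated assumptions, not how any particular discovery tool or NER model works: the email pattern, the term list, and the `[REDACTED]` placeholder are hypothetical choices. The first step (`find_hits`) only records candidate spans, so that a reviewer can inspect and discard false positives before the second step (`redact`) applies them.

```python
import re
from dataclasses import dataclass

@dataclass
class Hit:
    """One candidate redaction, kept so a reviewer can check it before it is applied."""
    start: int  # character offset in the original text
    end: int
    text: str   # the original (sensitive) text
    rule: str   # which rule or list produced the hit

# Hypothetical rule: a simple email pattern (real tools use far richer detection).
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PLACEHOLDER = "[REDACTED]"

def find_hits(text: str, terms: list[str]) -> list[Hit]:
    """Step 1: locate candidate spans using a regex rule and a list of terms."""
    hits = [Hit(m.start(), m.end(), m.group(), "email")
            for m in EMAIL_PATTERN.finditer(text)]
    for term in terms:
        # Word boundaries prevent redacting fragments of longer words.
        for m in re.finditer(rf"\b{re.escape(term)}\b", text, re.IGNORECASE):
            hits.append(Hit(m.start(), m.end(), m.group(), "term list"))
    # Sort by position (longest span first) and skip spans that overlap an earlier one.
    hits.sort(key=lambda h: (h.start, -h.end))
    kept: list[Hit] = []
    for hit in hits:
        if kept and hit.start < kept[-1].end:
            continue
        kept.append(hit)
    return kept

def redact(text: str, approved: list[Hit]) -> str:
    """Step 2: apply the reviewed hits, right to left so earlier offsets stay valid."""
    for hit in sorted(approved, key=lambda h: h.start, reverse=True):
        text = text[:hit.start] + PLACEHOLDER + text[hit.end:]
    return text
```

For example, `redact(doc, find_hits(doc, ["Jane Doe"]))` on the sentence "Please forward the invoice to jane.doe@example.com and copy Jane Doe." replaces both the address and the name with the placeholder. Keeping the hit list separate from the redaction step is what leaves room for the manual quality check.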
Succeeding with technology and methodology
Tools for partially automating document redaction, based on rules or on lists of terms to be redacted, already exist. What is still needed is a methodology that meets the technical requirements for pseudonymizing and anonymizing structured data. Large volumes of structured data must be loadable into a guided workflow in which the personal fields they contain can easily be removed or isolated. Where necessary, a controlled undo, i.e., a reversal of the pseudonymization, must also be possible for individual data points or for the entire dataset. Finally, the approach must offer maximum flexibility in implementing data protection requirements.
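The requirements above (isolating personal fields, pseudonymizing them, and allowing a controlled reversal per data point or for the whole dataset) can be sketched for tabular records as follows. This is a minimal illustration, not a reference implementation: the class and function names are hypothetical, and keyed hashing with HMAC-SHA256 plus a separate lookup "vault" is just one possible technique. The reversal is "controlled" only if the key and the vault are stored apart from the pseudonymized data. `drop_fields` simply removes columns; on its own, that may not be enough for anonymization in the GDPR sense, because remaining fields can still allow re-identification.

```python
import hashlib
import hmac
import secrets
from typing import Any, Optional

Record = dict[str, Any]

class Pseudonymizer:
    """Replaces personal fields with keyed pseudonyms and keeps a vault for reversal.

    Hypothetical sketch: whoever holds both the key/vault and the pseudonymized
    data can re-identify individuals, so they must be stored separately.
    """

    def __init__(self, key: Optional[bytes] = None):
        self._key = key or secrets.token_bytes(32)
        self._vault: dict[str, Any] = {}  # pseudonym -> original value

    def _pseudonym(self, value: Any) -> str:
        # HMAC-SHA256 is deterministic: the same value always yields the same
        # pseudonym, so pseudonymized records can still be joined and counted.
        digest = hmac.new(self._key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
        return "PSN-" + digest[:16]

    def pseudonymize(self, records: list[Record], fields: list[str]) -> list[Record]:
        """Return copies of the records with the given personal fields replaced."""
        result = []
        for record in records:
            new = dict(record)  # never modify the input records
            for field in fields:
                if new.get(field) is not None:
                    token = self._pseudonym(new[field])
                    self._vault[token] = new[field]  # keeps the original value and type
                    new[field] = token
            result.append(new)
        return result

    def reidentify_value(self, token: str) -> Any:
        """Controlled undo for a single data point (raises KeyError if unknown)."""
        return self._vault[token]

    def reidentify(self, records: list[Record], fields: list[str]) -> list[Record]:
        """Controlled undo for an entire dataset."""
        return [
            {k: (self._vault.get(v, v) if k in fields else v) for k, v in r.items()}
            for r in records
        ]

def drop_fields(records: list[Record], fields: list[str]) -> list[Record]:
    """Remove personal fields entirely; this step cannot be reversed."""
    return [{k: v for k, v in r.items() if k not in fields} for r in records]
```

Because the pseudonyms are deterministic, analyses such as "how many transactions per customer" still work on the pseudonymized data, while only holders of the vault can map a pseudonym back to a person. Choosing which fields to pass to `pseudonymize` and which to `drop_fields` is where the case-by-case data protection assessment comes in.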
Our conclusion: by harnessing pseudonymization and anonymization, organizations can realize the full potential of their data while remaining compliant with the GDPR.