Know Your Client – Leveraging leaked information to protect your business
By Julian Lebherz and Olly Salzmann
Reveal the unknown
When over one hundred media outlets reported on the so-called Panama Papers in early April 2016, a jolt went through boardrooms, offices and homes around the globe. Within two weeks the Icelandic prime minister, a founding member of the FIFA ethics commission and the CEO of an Austrian bank had resigned. But this first wave of media coverage was only the beginning of what some called the biggest piece of investigative journalism to date, revealing hundreds of thousands of offshore entities.
Combined with increased political pressure, this prompted regulatory bodies, law enforcement and tax authorities to step up their follow-up activities on offshore business matters and to investigate potential tax evasion, money laundering and related offences. As one consequence, they formally request companies, especially banks and financial service providers, to collect the information available on customers and businesses related to the leaked entities. Beyond these requests from public authorities, many companies have decided to shed more light on their direct and indirect offshore activities in order to increase transparency and to meet their regulatory and internal compliance standards.
However, the sheer dimensions of this leak can turn such analyses into a troublesome and expensive exercise. Assessing how state-of-the-art technology can minimize both manual effort and complexity requires a solid grasp of the leak’s characteristics. The goal of this article is to provide insights into the content, structure and substance of the Panama Papers and to highlight opportunities and limitations with respect to the usage and effective analysis of such information.
ICIJ Offshore Database: Behind the scenes
Although the media use the term Panama Papers as a synonym for all leaked information found in the publicly available ICIJ Offshore Database, that repository actually contains both the Offshore Leaks and the Panama Papers. In 2011 an anonymous individual leaked 2.5 million documents about the two offshore service providers ‘Portcullis TrustNet’ and ‘Commonwealth Trust Limited’ to the ICIJ. This data, known as the Offshore Leaks, is current through 2010 and was published by the ICIJ in June 2013. Another anonymous individual, calling himself John Doe, leaked a further 11.5 million documents originating from the Panamanian law firm ‘Mossack Fonseca’ to the German newspaper ‘Süddeutsche Zeitung’ in 2015. This leak, which is current through 2015, was jointly investigated in secrecy by hundreds of journalists. Since the content of both leaks is highly complementary, the ICIJ decided to combine them in a comprehensive, publicly available database (https://offshoreleaks.icij.org/pages/about accessed 2016-08-10).
Automated Processing: A structured approach
Utilizing these vast amounts of leaked information in projects and day-to-day business operations calls for a structured methodology. The following sections discuss four stages in turn. In the initial stage, a variety of questions concerning content, reliability, data points and structure need to be answered before relying on such new sources of information. In the scoping stage, clear objectives for the analysis need to be defined in order to align all technical components in a targeted manner. Next, the algorithmic stage covers both the definition and the execution of algorithms to extract the desired information. Finally, the evaluation stage entails a manual review of the results to gauge whether action needs to be taken.
Initial Stage: Answer 4 questions
(1) Content: Due to the confidential nature of the material, as well as the massive data volume (Offshore Leaks: 260 gigabytes/Panama Papers: 2.6 terabytes), the ICIJ Offshore Database does not contain any raw data. In particular, none of the original emails, database documents, images or texts appear within the database. It is rather a structured collection of information that was extracted from the millions of raw documents. While basic information about companies, individuals and their relationships was included, highly confidential details such as banking information or passport numbers were left out.
(2) Reliability: In both individual leaks, raw documents were handed over to non-governmental organizations and subsequently shared with hundreds of investigative journalists. There is limited incentive to falsify information before leaking it to the media, and substantial parts of the content can be verified with reasonable effort. However, even though the journalistic community sees these leaks as trustworthy sources of secretive information, no one can or should vouch for their completeness.
(3) Data points: Set up as a graph database, the ICIJ Offshore Database holds almost 320,000 offshore entities and nearly 370,000 individuals, the latter covering both officers and intermediaries. The relationships between individuals, agents and entities, as well as amongst entities themselves, are captured by close to 1.3 million links categorized into 80 different types of connections.
(4) Structure: Despite its huge scope, the whole database including all tables can be downloaded in the form of text files from the ICIJ website and fed into a variety of relational and graph databases. Thanks to its simple structure, virtually any tech-savvy user can access the data and start to analyse it.
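As a rough illustration of how little tooling is needed, the sketch below reads a node file and a link file with Python's standard csv module and builds a simple adjacency list. The file contents, the column names (node_id, name, node_type, start_id, end_id, rel_type) and the sample rows are assumptions made for this example; the actual layout of the export should be checked against the ICIJ download.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-ins for the downloaded text files; the column
# names below are assumptions, not the documented ICIJ layout.
NODES_CSV = """node_id,name,node_type
1,LastTrust HK LTD,entity
2,Max Mueller,officer
3,Example Agents Inc,intermediary
"""
EDGES_CSV = """start_id,end_id,rel_type
2,1,shareholder of
3,1,intermediary of
"""

def load_graph(nodes_file, edges_file):
    """Return (nodes, links): nodes keyed by id and outgoing links per node."""
    nodes = {row["node_id"]: row for row in csv.DictReader(nodes_file)}
    links = defaultdict(list)
    for row in csv.DictReader(edges_file):
        links[row["start_id"]].append((row["rel_type"], row["end_id"]))
    return nodes, links

nodes, links = load_graph(io.StringIO(NODES_CSV), io.StringIO(EDGES_CSV))

# Walk the outgoing links of one individual.
for rel_type, target in links["2"]:
    print(nodes["2"]["name"], "->", rel_type, "->", nodes[target]["name"])
```

For the full data set, the same function could be handed open file objects instead of the in-memory samples; a dedicated graph or relational database becomes preferable once queries span several hops.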
Scoping Stage: Leveraging leaked information
Having such an unexpected treasure of secretive information readily available is one thing; actually leveraging it to the benefit of a corporation is a completely different story. Among a variety of possible use cases, using leaked information to identify potential involvement of clients with offshore entities stands out. In the face of Know Your Client (KYC) regulatory requirements, which dictate a variety of background checks to counteract money laundering, amongst other offences, an analysis that matches current or prospective clients with individuals already linked to offshore entities can become extremely valuable. It puts companies in a position to cooperate effectively with public authorities if required and can therefore reduce the risk of criminal prosecution and reputational damage. Whether such a project is initiated as a one-off evaluation or integrated into a standard customer screening procedure, the approach always requires an algorithmic stage for identifying potential matches and a highly manual evaluation stage for reviewing the results.
Algorithmic Stage: Identifying clients’ exposure
Continuing with the ICIJ Offshore Database as an example, the algorithmic stage can be seen as an automated comparison of all current clients, be they natural persons or legal entities, with all individuals and entities mentioned in the database. With around 700,000 leaked names, the computational effort of comparing every client name with every possible leaked counterpart can quickly become a bottleneck.
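One way to keep the direct comparison cheap is to index the leaked names once in a hash-based lookup, so that each client name costs a single lookup rather than roughly 700,000 string comparisons. The following is a minimal sketch of that idea; the function names and sample names are invented for illustration, and the lower-casing is an assumption about what should count as a direct hit.

```python
from collections import defaultdict

def build_index(leaked_names):
    """Map each lower-cased leaked name to its original spellings."""
    index = defaultdict(list)
    for name in leaked_names:
        index[name.strip().lower()].append(name)
    return index

def exact_matches(client_names, index):
    """Return {client name: leaked spellings} for direct hits only."""
    hits = {}
    for client in client_names:
        key = client.strip().lower()
        if key in index:  # membership test does not create new entries
            hits[client] = index[key]
    return hits

leaked = ["LastTrust HK LTD", "Max Mueller"]  # illustrative data
clients = ["max mueller", "Jane Doe"]         # illustrative data
print(exact_matches(clients, build_index(leaked)))
```

The index is built once per leak and can be reused for every screening run; only the fuzzy techniques discussed further on need more than a lookup per name.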
In addition, experience shows that a direct comparison can only serve as a starting point and does not yield the desired level of precision. To improve on it, punctuation should be removed or replaced on both sides, which is especially important for legal entity names. For example, ‘Last-Trust (HK) LTD.’ becomes ‘LastTrust HK LTD’. The resulting names without punctuation are then compared to their counterparts once again.
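The punctuation step can be sketched with a regular expression. This is one possible implementation, not a prescribed one: it removes every character that is neither a word character nor whitespace and then collapses repeated spaces. The function name is made up for this example.

```python
import re

def strip_punctuation(name):
    """Remove punctuation and collapse runs of whitespace."""
    no_punct = re.sub(r"[^\w\s]", "", name)  # note: \w keeps underscores
    return re.sub(r"\s+", " ", no_punct).strip()

print(strip_punctuation("Last-Trust (HK) LTD."))  # LastTrust HK LTD
```

Whether a hyphen should be removed, as here, or replaced by a space is a design choice; whichever rule is chosen must be applied identically to client names and leaked names.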
As a next step, experts need to define a list of terms, such as legal-form designations, to be removed from all names. For instance, ‘LastTrust Limited’ and ‘LastTrust LTD’ would match after such a modification. However, names in which family and given name are swapped would still not be identified. To cover such instances as well, all names are split into single tokens (words). Each name is then represented by a set of tokens, which is compared to the token set of each leaked name. After tokenizing, ‘Max Mueller’ and ‘Mueller Max’ are a perfect match.
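Both steps, term removal and tokenizing, can be combined in a few lines. The sketch below assumes punctuation has already been stripped; the term list is a hypothetical placeholder for what experts would curate, and comparing token sets for equality is only one possible matching rule.

```python
# Hypothetical term list; in practice experts would curate it.
LEGAL_TERMS = {"ltd", "limited", "inc", "corp", "sa", "gmbh"}

def name_tokens(name):
    """Lower-case, split on whitespace and drop legal-form terms."""
    return frozenset(t for t in name.lower().split() if t not in LEGAL_TERMS)

def token_match(a, b):
    """True when both names reduce to the same non-empty token set."""
    tokens_a, tokens_b = name_tokens(a), name_tokens(b)
    return bool(tokens_a) and tokens_a == tokens_b

print(token_match("LastTrust Limited", "LastTrust LTD"))  # True
print(token_match("Max Mueller", "Mueller Max"))          # True
print(token_match("Max Mueller", "Max Miller"))           # False
```

Using sets ignores both word order and repeated words, which is what makes swapped names match; it also means that names consisting only of removed terms never match, a deliberate safeguard in this sketch.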
Utilizing robust (fuzzy) comparison techniques, which allow for typos and different forms of spelling, can certainly increase the number of valid matches but will also generate more invalid matches (false positives) in the process. In the end it all boils down to the level of manual review effort that is considered reasonable for a specific case. If missing one valid match could become very expensive, manually evaluating a large number of false positives may well be justified.
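The trade-off between valid matches and false positives can be made tangible with a similarity threshold. The sketch below uses the similarity ratio from difflib in Python's standard library as a stand-in for whichever fuzzy technique is actually chosen; the threshold values and sample names are illustrative assumptions.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio in [0, 1] based on difflib's SequenceMatcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fuzzy_matches(client, leaked_names, threshold):
    """Leaked names whose similarity to the client name meets the threshold."""
    return [n for n in leaked_names if similarity(client, n) >= threshold]

leaked = ["Max Muller", "Mary Miller", "John Smith"]  # illustrative data

strict = fuzzy_matches("Max Mueller", leaked, 0.9)  # fewer candidates
loose = fuzzy_matches("Max Mueller", leaked, 0.6)   # more to review manually
print(strict)
print(loose)
```

Lowering the threshold catches more spelling variants but also lets unrelated names through, so the threshold is effectively a dial for the manual review effort. Because a fuzzy comparison cannot be answered by a simple lookup, it is usually worth restricting it to names that the cheaper steps left unmatched.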
Evaluation Stage: Assessing the need for action
Despite tremendous advancements in the field of machine learning and artificial intelligence, the evaluation of preliminary matches cannot be fully automated yet and still requires manual review. In this stage, the collection of potential matches is gradually narrowed down by tapping into pools of additional internal information and publicly available data on the leaked entity or individual. Cases in which the added intelligence does not correlate are excluded. For all remaining matches the impact on existing or prospective client relationships needs to be assessed by experts. The cost of such internal screenings, including their share of manual review effort, is put into perspective by the fines that have been imposed for assisting tax evasion.
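Part of this narrowing can be prepared for the reviewers. The sketch below drops candidate matches whose additional attributes contradict each other and keeps everything else for expert review. The record layout and the attribute used (country) are hypothetical, and a missing value is treated as "cannot exclude" rather than as a mismatch.

```python
def needs_review(client, leaked, fields=("country",)):
    """Keep a candidate unless a field is known on both sides and differs."""
    for field in fields:
        a, b = client.get(field), leaked.get(field)
        if a and b and a.lower() != b.lower():
            return False  # added intelligence contradicts the match
    return True

candidates = [  # illustrative (client record, leaked record) pairs
    ({"name": "Max Mueller", "country": "DE"},
     {"name": "Mueller Max", "country": "DE"}),
    ({"name": "Max Mueller", "country": "DE"},
     {"name": "Max Mueller", "country": "BR"}),
    ({"name": "Jane Doe", "country": "GB"},
     {"name": "Jane Doe"}),
]

review_queue = [pair for pair in candidates if needs_review(*pair)]
print(len(review_queue), "of", len(candidates), "candidates remain")
```

Such a filter only removes clear contradictions; the decision on the remaining candidates stays with the experts.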
In summary, utilizing leaks of secretive information, such as the Panama Papers, for customer screening or one-off KYC evaluations can significantly reduce the risk of criminal prosecution and reputational damage. In addition, proactively analysing the exposure of a firm’s client base eliminates the information asymmetry relative to the media. Although the final assessment requires manual effort, major parts of a complete screening can be automated using sophisticated pre-processing and (fuzzy) algorithmic comparison techniques.