The importance of generative AI for Document and Contract Lifecycle Management (DCLM)


Documents and contracts serve as a reliable source for the rights and obligations an organization has in relation to its environment. As a result, the entire lifecycle of documents and contracts must be governed and managed reliably and in as standardized a form as possible – from the underlying data sources to creation, delivery, and archiving. As the power of artificial intelligence continues to grow, so do the possibilities for document creation and management. This means that a radical change in document and contract lifecycle management (DCLM) is within reach.

Documents as a contradiction to the digitalized “single source of truth”

The primary purpose of documents in general, and contracts in particular, is to create a binding basis for all parties that can be referred to in case of doubt and used to resolve disputes. Ideally, the relevant facts and the resulting rights and obligations should be stated only once, and then as clearly and unambiguously as possible. When these conditions are met, documents and the subset of contracts can fulfill their true purpose as a “single source of truth” (SSOT) (see Practical Handbook Legal Operations).

Documents, as the traditional way of storing and managing data and information, are inherently at odds with information technology. This is because IT systems are ultimately based on computers that are built to perform calculations in the areas of algebra, analysis, geometry, etc., and not to understand text. Texts are merely representations of binary data that make them readable by humans. Of course, computers can store documents, and the data and values they contain (document data in the strict sense), and can also support the creation and editing of texts. However, they cannot per se recognize and understand what the regulatory content of the texts is and what the data means.

This means that there must always be an intermediate step to extract and classify data from texts so that it can be processed by a computer. The content and meaning of texts must be translated into formal logic that a computer can understand – programming languages are nothing else. Until now, all of these steps have been possible only with human intervention and have therefore been prone to error, and in general also prone to failure. This media and conceptual break exposes the contradiction in treating digitalized documents as the only reliable source of information: documents cannot be the sole source, because the relevant data and information (content) must first be extracted from them and transformed before systems can process it, i.e., the data is always derived and secondary.

Machines can understand and create text for the first time

The use of artificial intelligence is nothing less than a paradigm shift in document creation and management. The ability to vectorize texts, i.e., to convert them into numbers, to train neural networks on these vectors – which is nothing other than the training of foundation models – and then to apply algebraic rules to texts is a scientific advance whose scope we can hardly measure. At the same time, the ability of machines to read and “understand” text has created a gray area regarding the protection and use of intellectual property and documents in general.

The paper “Attention is all you need” (Vaswani et al., 2017) has fundamentally changed research in the field of artificial intelligence worldwide. The article introduces the transformer architecture, a model that processes entire sentences simultaneously rather than word by word. This preserves the context, i.e., the relationships between words in the sentence structure.

When text is vectorized, i.e., converted into numbers, the entire sentence or text segment is used, not just the individual word or character. It is then determined statistically which parts of the sentence are relevant for the meaning (self-attention layer). The verb “fly” gives the noun “bat” a different meaning than the verb “swing” would, just as the words “risky” or “money” shift the meaning of the noun “bank”. The results are based on simple algebraic rules (logistic regression), something computers are very good at.
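To make the mechanism concrete, the following minimal sketch computes self-attention over a few toy word vectors in plain NumPy. The embeddings, their dimension, and the example sentence are invented for illustration; real transformer models additionally learn query, key, and value projections across multiple attention heads.

```python
# Minimal sketch of scaled dot-product self-attention over toy word vectors.
# Embeddings and dimensions are invented for illustration; real transformers
# add learned query/key/value projections and multiple attention heads.
import numpy as np

def self_attention(X: np.ndarray) -> np.ndarray:
    """X has one row per token; returns context-aware token representations."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)                   # pairwise relevance of tokens
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # softmax: rows sum to 1
    return weights @ X                              # each token becomes a weighted mix

# Toy 4-dimensional embeddings for the sentence "bats fly at night"
tokens = ["bats", "fly", "at", "night"]
X = np.array([
    [0.9, 0.1, 0.0, 0.2],   # "bats"
    [0.8, 0.3, 0.1, 0.0],   # "fly"
    [0.0, 0.1, 0.9, 0.1],   # "at"
    [0.1, 0.0, 0.8, 0.7],   # "night"
])
print(self_attention(X).round(2))   # "bats" now carries information from "fly"
```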

The next breakthrough came as early as 2018 with the concept of pre-trained bidirectional transformers (Devlin et al., 2018; the abbreviation BERT stands for “Bidirectional Encoder Representations from Transformers”). These models apply the transformer encoder bidirectionally, reading the context to the left and right of each word at once, and minimize the number of nodes needed to achieve results. Pre-trained means that the neural networks are first trained to correctly fill gaps (masking) in the text or to correctly continue truncated text.

This approach is ideal for self-supervised learning and ­allows networks to be trained on much larger amounts of data and to utilize the nodes more efficiently than was previously possible. This change in scale has greatly improved the discriminative power of the regression models.
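A small sketch of this pre-training objective in action, assuming the Hugging Face transformers package and the public bert-base-uncased checkpoint (both illustrative choices): the model fills a masked gap purely from the surrounding context.

```python
# Sketch of the masking objective in use, assuming the Hugging Face
# transformers package and the public bert-base-uncased checkpoint.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The model proposes words for the gap based solely on the surrounding context.
for candidate in fill_mask("This agreement shall remain in [MASK] until terminated."):
    print(f'{candidate["token_str"]:>10}  score={candidate["score"]:.3f}')
```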

Bidirectional transformer models have made it possible to represent words, sentences, and entire texts conceptually, i.e., abstractly and in terms of content. Representing binary data as text, by contrast, is merely a symbolic representation: the capital letter “B” is assigned the ASCII code 66 and thus the binary value 01000010. A conceptual representation means that objects (words, sentences, texts) are grasped in their context, that is, understood. It should be emphasized that no linguistic approaches are used; purely algebraic (computational) rules make this breakthrough possible. With these approaches, computers (calculators) have for the first time achieved the ability to understand a text, something that was previously reserved for humans.
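The difference between the two kinds of representation can be illustrated with a short sketch. It assumes the sentence-transformers package and the public all-MiniLM-L6-v2 model, both of which are stand-ins for whatever embedding model is actually used.

```python
# Symbolic vs. conceptual representation of text. Assumes the
# sentence-transformers package and the all-MiniLM-L6-v2 model (illustrative).
from sentence_transformers import SentenceTransformer, util

# Symbolic: "B" is just code point 66 / 01000010, with no meaning attached.
print(ord("B"), format(ord("B"), "08b"))

# Conceptual: whole sentences are mapped to vectors that encode their meaning.
model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [
    "Either party may terminate the agreement with three months' notice.",
    "The contract can be cancelled by both sides on a quarterly basis.",
    "Bats fly at night.",
]
vectors = model.encode(sentences)
print(float(util.cos_sim(vectors[0], vectors[1])))  # high: same regulatory content
print(float(util.cos_sim(vectors[0], vectors[2])))  # low: unrelated meaning
```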

The importance of foundation models for handling documents and contracts

Properly trained foundation models allow us to communicate with computers in natural language: we can give instructions in our own words (hardly anyone still seriously claims that prompting is a computer query language in its own right), ask questions, and follow up until we are satisfied with the result or the maximum prompt length has been reached. (Prompt lengths are currently around 4,000 tokens, which corresponds to roughly 3,000 words, or around six to eight DIN A4 pages of sophisticated text. For tasks like the ones described here, this is often not enough, and an iterative approach is not possible because the models are all stateless and cannot remember results.) The machine becomes our interlocutor.
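A small sketch of the practical consequence, assuming the tiktoken package for token counting and a hypothetical contract.txt input file: because the prompt budget is fixed and the model is stateless, a long contract has to be split into self-contained chunks before it can be processed.

```python
# Sketch of working within a fixed prompt length: count tokens and split a
# long contract into chunks that fit the context window. Assumes the tiktoken
# package; the 4,000-token budget mirrors the figure mentioned above, and
# "contract.txt" is a hypothetical input file.
import tiktoken

MAX_PROMPT_TOKENS = 4000
enc = tiktoken.get_encoding("cl100k_base")

def chunk_document(text: str, budget: int = MAX_PROMPT_TOKENS) -> list[str]:
    """Split text into pieces of at most `budget` tokens. Because the model is
    stateless, every chunk must carry all the context it needs."""
    tokens = enc.encode(text)
    return [enc.decode(tokens[i:i + budget]) for i in range(0, len(tokens), budget)]

contract_text = open("contract.txt", encoding="utf-8").read()
for i, chunk in enumerate(chunk_document(contract_text)):
    print(f"chunk {i}: {len(enc.encode(chunk))} tokens")
```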

This feature alone is groundbreaking. In the context of DCLM systems, it will mean that the capture of document data and content, an activity previously reserved for humans, will be transferred to computers. It is easy to instruct the model to extract data and relevant text passages from texts and prepare them for downstream systems, tasks, and analyses.
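As a sketch of what such an instruction can look like in practice, the following snippet asks a model to return key contract fields as machine-readable JSON for downstream systems. It assumes the openai package with an API key in the environment; the model name and the field list are merely examples.

```python
# Sketch of delegating data capture to the model: extract key fields from a
# contract and return JSON for downstream systems. Assumes the openai package
# and an API key in the environment; model name and fields are examples only.
import json
from openai import OpenAI

client = OpenAI()

def extract_contract_data(contract_text: str) -> dict:
    prompt = (
        "Extract the parties, effective date, term, and notice period from the "
        "contract below. Answer with JSON only, using exactly those keys.\n\n"
        + contract_text
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",                     # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

print(extract_contract_data("Service Agreement between A GmbH and B AG ..."))
```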

An instruction along the lines of “make a list of all the contracts we have with company A and its successors and predecessors, show when and how the contracts have changed, and highlight all the places where change of control provisions have been made” will not produce the desired results on the first try. However, with sufficient experience and expertise, a modified GenAI system can be created that is capable of performing this and other much more demanding tasks quickly and reliably.
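One building block of such a system is retrieval: clause texts from the contract repository are embedded, and the passages most relevant to the request are pulled out before the generative model sees them. The sketch below assumes the sentence-transformers package, and the small clause list stands in for a real repository with proper metadata.

```python
# Sketch of the retrieval step behind such a request: embed clause texts and
# rank them against the query before handing the best matches to the model.
# Assumes sentence-transformers; the clause list stands in for a repository.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

clauses = [  # in practice drawn from the DCLM repository with full metadata
    ("A-2019-§12", "Upon a change of control of either party, the other party may terminate."),
    ("A-2021-§7", "Payment is due within 30 days of invoice."),
    ("A-2022-§9", "If Company A is acquired, assignment of this agreement requires consent."),
]

query_vec = model.encode("change of control provisions")
ranked = sorted(
    ((float(util.cos_sim(query_vec, model.encode(text))), ref, text) for ref, text in clauses),
    reverse=True,
)
for score, ref, text in ranked:
    print(f"{score:.2f}  {ref}  {text}")
# The top-ranked passages, plus contract metadata, become the prompt context.
```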

Another task that machines can and will take over is the creation of documents, along with certain tasks from the “negotiation and coordination” phase. Microsoft’s Copilot models provide a first, albeit very weak, indication of what will be possible here.

In the creation phase, the focus is on whether the form and regulatory content meet the requirements of the creator. In negotiation and agreement, it is often a matter of adapting texts so that they reflect what is intended and are sufficiently clear and precise to prevail in the event of a dispute. Another consideration is whether a provision is balanced or biased in favor of one party.

These aspects can be captured very well through semantic proximity, and therefore especially by transformer-based models. If one- and few-shot learning or fine-tuning is not sufficient, you can train your own small models on synthetic data. We use our document generators to produce training data with low variance; it is precisely these small, controlled deviations that yield very good training effects.
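A minimal sketch of such low-variance synthetic data, using a simple clause template as a stand-in for the document generators mentioned above; the template, its fields, and the label are invented for illustration.

```python
# Sketch of low-variance synthetic training data from a clause template,
# standing in for the document generators mentioned above. Template, fields,
# and label are invented for illustration.
import random

TEMPLATE = ("Either party may terminate this agreement with {notice} months' "
            "written notice to the end of a calendar {period}.")

def synthetic_clauses(n: int = 20) -> list[dict]:
    samples = []
    for _ in range(n):
        samples.append({
            "text": TEMPLATE.format(
                notice=random.choice(["one", "two", "three", "six"]),
                period=random.choice(["month", "quarter", "year"]),
            ),
            "label": "termination_clause",   # curated label used for fine-tuning
        })
    return samples

for sample in synthetic_clauses(3):
    print(sample)
```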

AI as a trained expert system for legal texts

The second approach is to train the GenAI to control a document generation engine, which then ensures compliance with rules and uses curated metadata to map requirements such as “clear and concise”, “admissible in court”, or “balanced”. Training AI models on a formally logical command language to drive an engine is standard and produces excellent results. Much of what is set and assigned by hand today will come from GenAI in the future.
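A sketch of this control pattern: instead of writing the document itself, the model emits commands in a small formal language, which the generation engine validates against curated rules before executing them. The command set, clause identifiers, and rule names below are invented for illustration.

```python
# Sketch of the control pattern: the model emits commands in a small formal
# language, and the generation engine validates them against curated rules
# before executing. Command set and clause identifiers are invented.
import json

ALLOWED_COMMANDS = {"insert_clause", "set_party", "set_metadata"}
CURATED_CLAUSES = {"termination_standard", "change_of_control_balanced"}

def execute(command_json: str) -> None:
    """Validate and apply one engine command produced by the model."""
    cmd = json.loads(command_json)
    if cmd["command"] not in ALLOWED_COMMANDS:
        raise ValueError(f"unknown command: {cmd['command']}")
    if cmd["command"] == "insert_clause" and cmd["clause_id"] not in CURATED_CLAUSES:
        raise ValueError(f"clause not in curated library: {cmd['clause_id']}")
    print("engine applies:", cmd)   # a real engine would render the document here

# A command the trained model might produce for
# "add a balanced change-of-control clause":
execute('{"command": "insert_clause", "clause_id": "change_of_control_balanced"}')
```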

The third option is to train the models to extract and input content objects and to make heuristic-based suggestions for evaluation and curation. Curated data sets are the basis for training expert systems, which in turn can contribute to increased performance of the GenAI models through reinforcement learning. One of the 16 or more models behind GPT-5 could one day be a broadly trained expert system for legal texts. The most important trend in AI at the moment is the combination of large language models with conceptually trained domain-specific expert systems. There is enormous potential here for understanding and creating complex domain-specific texts and documents. It is doubtful whether DCLM systems in the current sense will still be needed. While many are still struggling with digital transformation, the GenAI revolution has already begun.

 

Author



Dr. Juergen Erbeldinger
ESCRIBA AG, Berlin
Founder and CEO

juergen.erbeldinger@escriba.de
www.escriba.de
