Authors: Oscar Sainz, Iker García-Ferrero, Rodrigo Agerri, Oier Lopez de Lacalle, German Rigau and Eneko Agirre
TL;DR: We present GoLLIE, a Large Language Model trained to follow annotation guidelines. GoLLIE outperforms previous approaches on zero-shot Information Extraction and allows users to run inference with annotation schemas defined on the fly. Unlike previous approaches, GoLLIE is able to follow detailed definitions and does not rely solely on the knowledge already encoded in the LLM.
Information Extraction (IE) is a challenging task that often requires following strict and highly detailed guidelines to be performed correctly. Traditionally, large amounts of annotations were used to train models to mimic expert annotators. This approach, however, is very expensive and rarely applicable in low-resource domains. To tackle data scarcity, data-augmentation techniques like Distant Supervision [1] were developed. More recently, with the advances in NLP, zero- and few-shot methods [2] were created that exploit the internal knowledge of LLMs.
Despite these advances, models still struggle to follow the details of annotation guidelines. We hypothesize that this occurs because models have strong preconceptions about the meaning of labels, and, when additional details are given, the models simply ignore them. For instance, here is an example of ChatGPT annotating person entities:
User: Can you annotate person entities if I give you a sentence? The "person" entity is defined as follows: "Each distinct person, or set of people, mentioned in a document refers to an entity of type Person. For example, people may be specified by name ("John Smith"), occupation ("the butcher"), family relation ("mum"), pronoun ("they"), etc., or by some combination of these."

ChatGPT: Certainly, I can help you annotate person entities in a sentence based on the definition you provided. Please provide the sentence you'd like me to annotate for person entities, and I'll do my best to identify and annotate them for you.

User: Sentence: "I go from Lasarte to Donosti to work at the University of the Basque Country every day."

ChatGPT: In the sentence you provided, there are no explicit mentions of distinct persons or people by name, occupation, family relation, pronoun, or any combination of these. Therefore, there are no person entities to annotate in this sentence. If you have any other sentences or text where you'd like person entities annotated, please feel free to provide them, and I'll assist you accordingly.
You can find the original chat here
Although the given sentence is quite easy to annotate, the example above shows that, even when prompted with explicit instructions to annotate pronouns as person entities, ChatGPT ignores the instruction and fails to annotate "I" as a person. This poses a problem when detailed instructions are needed to perform the task, a common situation in the field of IE, where task guidelines contain many details and exceptions.
To address these issues, we present GoLLIE, a Large Language Model trained to follow annotation guidelines. GoLLIE outperforms previous approaches on zero-shot Information Extraction and allows users to run inference with annotation schemas defined on the fly. Unlike previous approaches, GoLLIE is able to follow detailed definitions and does not rely solely on the knowledge already encoded in the LLM. GoLLIE is based on Code-Llama. Our code and models are publicly available. In the following sections we introduce in more detail how the model works and show some interesting insights. We encourage the reader to check the paper for further details.
Our model allows the user to define custom schemas using Python code! Python classes allow writing human-readable code that is also familiar to current LLMs. Imagine that we want to extract information about space missions; the following Python code defines the guidelines for two new types of entities: Launcher and Mission.
Here, the labels are represented as Python classes, and the guidelines or instructions are introduced as docstrings. For some tasks, we would also like to capture additional information about our mentions, such as the space_company or the crew of the launcher. We can add this information as attributes of the class, with their corresponding guidelines as comments.
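A minimal sketch of what such a schema could look like; the guideline texts and attribute names below are illustrative, not the exact definitions used to train GoLLIE:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Launcher:
    """Refers to a vehicle designed primarily to transport payloads
    from the Earth's surface to space. Examples: "Saturn V", "Atlas V".
    """

    mention: str  # The name of the launcher, as it appears in the text
    space_company: str  # The company that operates the launcher
    crew: List[str]  # Names of the crew members boarding the launcher


@dataclass
class Mission:
    """Any planned or executed journey beyond Earth's atmosphere,
    typically to explore celestial bodies. Example: "Apollo 11".
    """

    mention: str  # The name of the mission, as it appears in the text
```

The docstring carries the label guideline, while each attribute's comment carries the guideline for that attribute.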
Once we have defined our new labels, it is time to provide the model with a text to annotate. We can do that by simply creating a Python variable named text and assigning our desired text to it. We can also add a comment to help the model understand what we want. In addition, we use the Black code formatter to standardize the input.
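For example (the sentence below is purely illustrative), the input can be as simple as:

```python
# This is the text to analyze
text = (
    "The Ares 3 mission to Mars is scheduled for 2032. The Starship "
    "rocket built by SpaceX will take the astronauts to Mars."
)
```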
After this, we just need to run the model to generate our annotations!
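Conceptually, the final prompt concatenates the schema definitions, the input text, and the beginning of the result, which the model then completes. A rough, hypothetical sketch of this assembly; the exact template used by the released models may differ:

```python
# Hypothetical helper that assembles a GoLLIE-style prompt from the
# schema source code and the input text (layout is illustrative only).
def build_prompt(schema_code: str, text: str) -> str:
    return (
        "# The following lines describe the task definition\n"
        f"{schema_code}\n"
        "# This is the text to analyze\n"
        f'text = "{text}"\n'
        "# The annotation instances found in the text are listed here\n"
        "result ="
    )


prompt = build_prompt(
    'class Launcher:\n    """A vehicle that carries payloads to space."""',
    "The Starship rocket will take the astronauts to Mars.",
)
print(prompt)
```

The model's job is then to continue the prompt after `result =` with a list of annotated instances.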
As you can see, the generated output can be evaluated directly, as it is working Python code. This allows the user to directly parse and interpret the output. The model's output also satisfies the type constraints defined in the guidelines; for instance, we defined every attribute as a string, except for the crew, which is a list. Other constraints can also be applied, such as Optional attributes or more detailed types like Name, Value or String.
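Because the completion is valid Python, parsing it back into typed objects is straightforward. Below is a small self-contained sketch, with a hand-written stand-in for the model's generation (the class and the output string are illustrative):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Launcher:
    """A vehicle designed to carry a payload into space."""

    mention: str  # name of the launcher, as it appears in the text
    space_company: str  # company that operates the launcher
    crew: List[str]  # names of the crew members


# Hand-written stand-in for a model completion: it is executable Python.
generation = (
    'result = [Launcher(mention="Starship", space_company="SpaceX", '
    'crew=["Alice", "Bob"])]'
)

namespace = {"Launcher": Launcher}
exec(generation, namespace)  # in practice, validate/sandbox model output
result = namespace["result"]

print(type(result[0]).__name__, result[0].crew)
```

Executing the generation against the schema classes yields typed objects whose attributes can be checked against the declared types (here, crew is indeed a list).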
Please have a look at our notebooks to get started with the model.
We have evaluated GoLLIE on a diverse set of tasks across different datasets. Our primary goal is zero-shot evaluation, although we also report results on the supervised datasets in the paper. The following figure summarizes what our model is capable of:
We compared our model with GPT-3.5 [2] and Instruct-UIE [3] (SOTA) on the MIT Movie [4], MIT Restaurant [4] and CrossNER [5] Named Entity Recognition (NER) datasets. Our model outperforms previous approaches on almost all of them, and performs similarly to the SOTA on the rest. In addition to the results shown in the figure, we also evaluated the model on Event Extraction (EE) and Event Argument Extraction (EAE) datasets. Please check the paper or run the models yourself for more detailed results.
We present the first model that effectively leverages annotation guidelines to enhance zero-shot Information Extraction. Our model, available in 7B, 13B, and 34B variants, establishes a new state-of-the-art, surpassing previous approaches.
In our initial iteration, we focused on demonstrating that instructing LLMs to adhere to guidelines is both possible and advantageous. Our next step is to train models on many more tasks and datasets to expand the model’s capabilities. Ultimately, our goal is to develop a model that can tackle any information extraction task as proficiently as a human without the necessity of manually annotating data.