IBERLEF 2019 PORTUGUESE NAMED ENTITY RECOGNITION AND RELATION EXTRACTION TASKS

Proceedings

The IberLEF 2019 proceedings are published in the CEUR Workshop Proceedings series: http://ceur-ws.org/Vol-2421/

Our overview’s version in the proceedings has been adjusted due to certain inconsistencies found in Table 14. The updated version can be found here.

Overview

The main objective of these tasks is to challenge participants to apply their systems or solutions to Named Entity Recognition (NER) and/or Relation Extraction (RE) in Portuguese texts. To this end, three independent tasks have been organized, and participants are free to take part in any combination of them: one, two or all three.
These tasks will contribute to the progress of Portuguese natural language processing, as there is a demand in the area for the development of new methods, tools and specific resources such as annotated data. These tasks are part of IberLEF 2019.

Task 1: Named Entity Recognition

Description

The first task we propose is NER: identifying proper nouns in a given text and classifying each into one of several relevant categories, or into a default category known as Miscellaneous. Our objective is to evaluate the proposed systems across many textual genres. For datasets whose main genres are news, memorandums, e-mails, interviews and magazine articles, we will evaluate the following categories: PER – Person, PLC – Place, ORG – Organization, VAL – Value and TME – Time. For clinical notes and legal texts, we will evaluate only the PER – Person category.

The coordinators will be responsible for:

  • Evaluating the systems;
  • Reviewing working notes;
  • Camera ready submissions.

The participants will be responsible for:

  • Development and training of systems;
  • Submission of systems;
  • Submissions of working notes.

Activities

The NER task consists of the following steps:

      • Development Phase: In this phase, participants are required to develop a computational approach to NER. This approach, hereafter referred to as the system, must be capable of solving NER tasks for many textual genres. Participants are free to develop their solution however they see fit, so long as they comply with the requirements described in the training and test phases;
      • Training Phase: In this phase, participants choose their training datasets. They are free to use any datasets they wish for training on the various textual genres;
      • Test Phase: In this phase the coordinators will evaluate the capacity and reproducibility of the submitted systems:
        • Reproduction Stage: In this stage, the participants' proposed systems will be executed by the coordinators. Should the coordinators be unable to execute a system, that system will not be evaluated;
        • Evaluation Stage: Inputs composed of corpora in different textual genres will be entered into all systems that passed the Reproduction Stage. The expected output is to be in the “.txt” format, so that it may be evaluated via script following CoNLL-2002 metrics.
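
The Evaluation Stage above relies on CoNLL-2002 metrics, which score whole entities rather than individual tokens: a predicted entity counts as correct only if both its boundaries and its category match the reference. The following is a minimal sketch of that idea, not the coordinators' actual evaluation script. It assumes BIO-style tags such as B-PER and I-PER, which this page does not specify; the function names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# (category, start index, end index exclusive); an illustrative representation.
Entity = Tuple[str, int, int]


def extract_entities(tags: List[str]) -> Set[Entity]:
    """Collect entity spans from a BIO tag sequence (assumed tag format)."""
    entities: Set[Entity] = set()
    start: int = 0
    label: Optional[str] = None
    # A trailing "O" sentinel flushes an entity that ends the sequence.
    for i, tag in enumerate(tags + ["O"]):
        prefix, _, category = tag.partition("-")
        continues = prefix == "I" and category == label
        if label is not None and not continues:
            entities.add((label, start, i))
            label = None
        if prefix == "B" or (prefix == "I" and label is None):
            start, label = i, category
    return entities


def precision_recall_f1(gold: List[str], predicted: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 with exact span and category match."""
    gold_entities = extract_entities(gold)
    predicted_entities = extract_entities(predicted)
    correct = len(gold_entities & predicted_entities)
    precision = correct / len(predicted_entities) if predicted_entities else 0.0
    recall = correct / len(gold_entities) if gold_entities else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Under these assumptions, a system that finds the right span but assigns the wrong category (for example ORG instead of PLC) is penalized in both precision and recall, since the prediction is spurious and the reference entity is missed.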

Schedule

Below is the activity schedule for this task:

Activities | Dates | Responsible Party | Resources
Development and Training Phases | 18/03 to 22/04 | Participants | Task 1 - Input and Output Format Examples
Submission of Systems | 25/04 to 06/05 | Participants | System Submission Instructions; System Submission Form; Revised System Submission
Evaluation of Systems | 01/05 to 10/06 | Coordinators | Dataset Explanation and System Evaluations
Working Notes Delivery | 11/06 to 27/06 (originally 24/06) | Participants | Paper Publication Instructions
Working Notes Review | 28/06 to 01/07 (originally from 25/06) | Coordinators |
Camera Ready Submissions | 03/07 | Coordinators |
IberLEF Workshop | 24/09 | |

Task 2: Relation Extraction for Named Entities

Description

We propose an RE task that involves the automatic extraction of any relation descriptor expressing any type of relation between a pair of Named Entities (NEs) of the Person, Place and Organization categories in Portuguese-language texts.

The coordinators will be responsible for:

  • Providing examples (seeds);
  • Providing test datasets;
  • Evaluating results;
  • Reviewing working notes;
  • Camera ready submissions.

The participants will be responsible for:

  • Testing data;
  • Delivering the results;
  • Delivering working notes.

Activities

This RE task consists of the following steps:

  • Systems Development Phase: In this phase, the coordinators will make a small annotated dataset (seeds) available for the participants’ use in developing their RE systems;
  • Test Phase: The test phase includes two options for participants:
    • Test 1: For this test, participants must extract relation descriptors between NE pairs (of the Person, Place or Organization categories) from data provided by the coordinators. This data will already be annotated with NE information, so participants will not need to apply a NER system;
    • Test 2: For this test, the data provided will not be annotated with NE information. Participants must therefore first extract and classify the NEs in the test sentences (as Person, Place or Organization), and then extract the relation descriptors between pairs of the recognized NEs;
  • Evaluation Phase: In this phase the participants will send their results from the Test Phase. They may submit results from Test 1, Test 2 or both for evaluation by the coordinators. Afterwards, the analyzed results will be sent back to the participants. The metrics used for the evaluation phase will be Precision, Recall and F-measure.
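
The Evaluation Phase above names Precision, Recall and F-measure but does not say how an extracted relation is matched against the reference. The sketch below assumes the strictest criterion, an exact match on both arguments and on the relation descriptor within the same sentence; the coordinators may well use a more lenient scheme. The `Relation` type, the `score_relations` function and the example names are hypothetical.

```python
from typing import Iterable, NamedTuple, Tuple


class Relation(NamedTuple):
    """One extracted relation; the field layout is an assumption for illustration."""
    sentence_id: int
    arg1: str
    descriptor: str
    arg2: str


def score_relations(gold: Iterable[Relation],
                    predicted: Iterable[Relation]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure under exact matching (assumed criterion)."""
    gold_set, predicted_set = set(gold), set(predicted)
    correct = len(gold_set & predicted_set)
    precision = correct / len(predicted_set) if predicted_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Invented data: one exact match, one truncated descriptor, one spurious relation.
gold = [
    Relation(1, "Maria Silva", "trabalha na", "Empresa X"),
    Relation(2, "Empresa X", "tem sede em", "Lisboa"),
]
predicted = [
    Relation(1, "Maria Silva", "trabalha na", "Empresa X"),
    Relation(2, "Empresa X", "sede em", "Lisboa"),
    Relation(2, "Lisboa", "em", "Empresa X"),
]
precision, recall, f_measure = score_relations(gold, predicted)
```

With this invented data, one of three predictions is correct and one of two reference relations is found. Note that exact matching gives no credit for the truncated descriptor "sede em", which is one reason a partial-match criterion might be preferred in practice.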

Schedule

Below is the activity schedule for this task:

Activities | Dates | Responsible Party | Resources
Release of Examples | 18/03 | Coordinators | Task 2 - Examples
Release of Data | 01/04 | Coordinators | Task 2 - Test Corpora
Test Phase | 01/04 to 06/05 | Participants |
Results Delivery (Test 1 and/or Test 2) | 06/05 to 20/05 | Participants | Task 2 - Results Submission
Evaluation | 20/05 to 10/06 | Coordinators | Task 2 Evaluation
Working Notes Delivery | 10/06 to 27/06 (originally 24/06) | Participants | Paper Publication Instructions
Working Notes Review | 28/06 to 01/07 (originally from 24/06) | Coordinators |
Camera Ready Submissions | 03/07 | Coordinators |
IberLEF Workshop | 24/09 | |

Task 3: General Open Relation Extraction

Description

The task of general open relation extraction aims to identify structured representations of the information contained in unstructured sources, such as textual documents. This task faces many challenges, considering the generality of the problem, as well as the required linguistic knowledge to automatically perform such a task.

This task involves the automatic extraction of any relation descriptor expressing any type of semantic relation between a pair of entities or concepts mentioned in Portuguese sentences. In this task, we consider a relation descriptor to be a text chunk that describes the explicit semantic relation occurring between two entities in a sentence. This task generalizes Task 2 by removing the requirement that the entities be named in the text, meaning that any relation between two Noun Phrases (NPs) is to be considered.
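
To make the definition above concrete, the sketch below represents one open relation as three token spans over a sentence: two NP arguments and the relation descriptor between them. This is an illustration only and not the task's official data format; the `OpenRelation` class, its fields and the example sentence are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Span = Tuple[int, int]  # token offsets, end exclusive


@dataclass(frozen=True)
class OpenRelation:
    """A relation descriptor linking two noun phrases in one sentence."""
    tokens: Tuple[str, ...]
    arg1: Span
    descriptor: Span
    arg2: Span

    def text(self, span: Span) -> str:
        """Return the surface text covered by a token span."""
        start, end = span
        return " ".join(self.tokens[start:end])

    def as_triple(self) -> Tuple[str, str, str]:
        """Return (first NP, relation descriptor, second NP) as strings."""
        return (self.text(self.arg1), self.text(self.descriptor), self.text(self.arg2))


# Invented sentence: "A empresa comprou o prédio" ("The company bought the building").
relation = OpenRelation(
    tokens=("A", "empresa", "comprou", "o", "prédio"),
    arg1=(0, 2),
    descriptor=(2, 3),
    arg2=(3, 5),
)
```

Here `relation.as_triple()` yields the two NPs "A empresa" and "o prédio" joined by the descriptor "comprou". Neither argument is a named entity, so this relation would fall outside Task 2 but inside Task 3.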

The coordinators will be responsible for:

  • Providing examples (seeds);
  • Providing test datasets;
  • Evaluating results;
  • Reviewing working notes;
  • Camera ready submissions.

The participants will be responsible for:

  • Testing data;
  • Delivering the results;
  • Delivering working notes.

Activities

This RE task consists of the following steps:

  • Systems Development Phase: In this phase, the coordinators will make a small annotated dataset (seeds) available for the participants’ use in developing their RE systems;
  • Test Phase: The test phase includes two options for participants:
    • Test 1: For this test, participants must extract relation descriptors between NP pairs from data provided by the coordinators. This data will already be annotated with NP information, so participants will not need to identify the NPs themselves;
    • Test 2: For this test, the data provided will not be annotated with NP information. Participants must therefore first identify the NPs in the test sentences, and then extract the relation descriptors between pairs of the recognized NPs;
  • Evaluation Phase: In this phase the participants will send their results from the Test Phase. They may submit results from Test 1, Test 2 or both for evaluation by the coordinators. Afterwards, the analyzed results will be sent back to the participants. The metrics used for the evaluation phase will be Precision, Recall and F-measure.

Schedule

Below is the activity schedule for this task:

Activities | Dates | Responsible Party | Resources
Release of Examples | 18/03 | Coordinators | Task 3 - Examples
Release of Test Data | 01/04 | Coordinators | Task 3 - Test Corpora
Test Phase | 01/04 to 06/05 | Participants |
Results Delivery (Test 1 and/or Test 2) | 06/05 to 20/05 | Participants | Task 3 - Results Submission
Evaluation | 20/05 to 10/06 | Coordinators | Task 3 - Rectification Results
Working Notes Delivery | 10/06 to 27/06 (originally 24/06) | Participants | Paper Publication Instructions
Working Notes Review | 28/06 to 01/07 (originally from 24/06) | Coordinators |
Camera Ready Submissions | 03/07 | Coordinators |
IberLEF Workshop | 24/09 | |

Registration

Registration for all tasks is now CLOSED.

Organizers

    • Grupo de Processamento de Linguagem Natural da PUCRS – PLN-PUCRS
      Bernardo Consoli
      Fabio Moreira Freitas Da Silva
      Joaquim Santos
      Juliano Terra
      Renata Vieira
      Sandra Collovini
    • Departamento de Informática – Universidade de Évora
      Paulo Quaresma
    • Grupo de Formalismo e Aplicações Semânticas (FORMAS) – UFBA
      Clarissa Castellã Xavier
      Daniela Barreiro Claro
      Marlo Souza
      Rafael Glauber

Contact