Task & Evaluation

Task description

The task is monolingual automatic term extraction (ATE), without distinctions between term types. Participants will be provided with all corpora and annotations, except for the ones on heart failure, which are the test corpora. Participants are free to use the other corpora in whatever way they see fit to improve their system (obtain POS-patterns, use them as training and/or development data, determine optimal threshold values, etc.). Both training and test corpora are annotated following the same protocol. Participants also know the number of annotated terms in the test corpora (mentioned both here and in the dedicated paper).
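As one illustration of how the training corpora might be used, the annotated term lists can be mined for recurrent POS-patterns, which can then guide candidate generation on the test corpus. Below is a minimal sketch of that idea in Python with NLTK; the file name is a placeholder and nothing here is part of the official task setup.

# Sketch: count POS-patterns over a gold-standard term list (placeholder file).
# Requires: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
from collections import Counter

import nltk


def pos_pattern(term: str) -> tuple:
    """Return the POS-tag sequence of a (possibly multi-word) term."""
    return tuple(tag for _, tag in nltk.pos_tag(nltk.word_tokenize(term)))


def frequent_patterns(gold_terms, top_n=20):
    """Most common POS-patterns among the annotated terms of a training corpus."""
    return Counter(pos_pattern(t) for t in gold_terms).most_common(top_n)


with open("training_terms.txt", encoding="utf-8") as f:  # one term per line
    gold = [line.strip() for line in f if line.strip()]
print(frequent_patterns(gold))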

Tracks

There will be three separate tracks, one per language, and participants can enter one or multiple tracks. NOTE: the original distinction between open and closed tracks has been dropped!

Submission & Evaluation

> Submission format: TXT-file, one candidate term per line
> Scoring: precision, recall, and F1-score will be calculated twice: once including and once excluding Named Entities (as sketched below)
> Ranking: participating teams will be ranked based on F1-scores (leading to two rankings, one including and one excluding Named Entities)

During the testing phase, all participants will get access to the corpus on heart failure, without the annotations, and will have to send a list of all automatically extracted term candidates for evaluation.
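To illustrate the scoring described above, here is a minimal sketch of how precision, recall, and F1-score could be computed from a submitted TXT-file, once including and once excluding Named Entities. The file names, and the assumption that the gold standard separates terms from Named Entities, are placeholders for illustration; the official evaluation script may handle details differently, e.g. how Named Entities among the candidates are treated in the "excluding" setting.

# Sketch: score a submission (one candidate term per line) against the gold standard.

def read_terms(path: str) -> set:
    """Read one lowercased term per line, skipping empty lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def score(candidates: set, gold: set):
    """Precision, recall, and F1-score of a candidate set against a gold set."""
    tp = len(candidates & gold)
    p = tp / len(candidates) if candidates else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

candidates = read_terms("submission.txt")          # placeholder file names
terms = read_terms("gold_terms.txt")
nes = read_terms("gold_named_entities.txt")

print("incl. Named Entities:", score(candidates, terms | nes))
# Assumption: "excluding" drops Named Entities from both gold and candidates.
print("excl. Named Entities:", score(candidates - nes, terms))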

Moreover, we encourage participants to report their scores on the other (training) corpora as well. Finally, all participants will be requested to fill out a form to answer some basic questions about their methodology, to further promote comparisons and discussion. These questions will concern the use of machine learning, the use of external resources, POS-patterns, rule-based filtering, etc. 

Casing, POS-tagging, & Lemmatisation

True-casing, POS-tagging, and lemmatisation are non-trivial tasks, but they are not the focus of this edition of TermEval. Therefore, all data will be lower-cased and non-lemmatised, with only one entry per term form. For example, the English corpus on dressage contains the term “bent” (verb, past tense of “to bend”), but also “Bent” (proper noun, a person's name). While capitalisation and POS differ, and “bent” is not the lemmatised form, the gold standard contains only a single entry: the lowercased “bent” (other full forms of the verb “to bend” have separate entries if they are present and annotated in the corpus). Systems that do offer lemmatisation should take care to submit the full forms for evaluation, rather than only the lemmatised forms.
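Concretely, a submission can mirror this gold-standard format by lowercasing every candidate and keeping one entry per surface form. A minimal sketch (the function name and behaviour are illustrative, not prescribed by the task):

# Sketch: normalise candidates to the gold-standard format before submission:
# lowercase each candidate and keep a single entry per (full) surface form.

def write_submission(candidates, path="submission.txt"):
    seen = set()
    with open(path, "w", encoding="utf-8") as f:
        for term in candidates:
            norm = term.strip().lower()    # lowercased, as in the gold standard
            if norm and norm not in seen:  # one entry per form, no lemmatisation
                seen.add(norm)
                f.write(norm + "\n")

# Both "Bent" and "bent" collapse into the single entry "bent":
write_submission(["Bent", "bent", "heart failure", "Heart Failure"])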

We do acknowledge that true-casing, POS-tagging, and lemmatisation are all important problems and that being able to handle these issues is a great advantage for any ATE tool. Therefore, these issues can definitely be discussed and taken into account, but for the sake of transparency, the final F1-scores will be calculated on the lower-cased, unlemmatised gold standard.