SemRep Gold Standard Annotation

In early 2011, we conducted a gold standard annotation study in which we annotated a set of 500 sentences, randomly selected from MEDLINE abstracts, with semantic predications. The results are mainly intended to serve as an evaluation testbed for SemRep, but they can also be used by other information extraction systems based on UMLS domain knowledge. The study consisted of three phases: a) the practice phase, b) the main annotation phase, and c) the adjudication phase.

Here, we present the two sets of annotations from the main phase as well as the adjudicated gold standard. For further details, please refer to our BMC Bioinformatics paper, "Constructing a Semantic Predication Gold Standard from the Biomedical Literature."

To access the SemRep Gold Standard Annotation files, users must have accepted the terms of the UMLS Metathesaurus License Agreement, which requires users to respect the copyrights of the constituent vocabularies and to file a brief annual report on their use of the UMLS. Users must also have activated a UMLS Terminology Services (UTS) account. For information on how to use UTS authentication, please click here.

For details of the licenses, please see the UMLS Metathesaurus License Agreement and How to License and Access the Unified Medical Language System (UMLS) Data.

Available Files:

Annotator A: Main Phase (main_A.xml) (1.3 MB)

Annotator B: Main Phase (main_B.xml) (1.4 MB)

Annotator C: Adjudication (adjudicated.xml) (1.4 MB)

DTD file (annotations.dtd) (1.8 KB)
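
Once a file has been downloaded, a quick first look at its contents can be had without knowing the schema in advance. The following is a minimal sketch in Python, not part of the original distribution: the function name count_element_tags is hypothetical, and the code deliberately assumes nothing about the element names defined in annotations.dtd. It only tallies whatever tags it encounters.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_element_tags(xml_path):
    """Return a Counter mapping each element tag to how often it occurs.

    Hypothetical helper for illustration; it makes no assumption about
    the element names used in the annotation files.
    """
    counts = Counter()
    # iterparse streams the file, so the full tree is never held in memory.
    for _event, elem in ET.iterparse(xml_path, events=("end",)):
        counts[elem.tag] += 1
        elem.clear()  # release the element's children and text once counted
    return counts


# Example use, assuming adjudicated.xml is in the working directory:
#     for tag, n in count_element_tags("adjudicated.xml").most_common():
#         print(tag, n)
```

The resulting tag counts can then be compared against the element declarations in annotations.dtd. Note that ElementTree does not validate against a DTD, so this sketch only inspects the file and does not check its conformance.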