Please use this identifier to cite or link to this item:
http://hdl.handle.net/10261/240680
Title: | A clinical trials corpus annotated with UMLS entities to enhance the access to evidence-based medicine |
Authors: | Campillos-Llanos, Leonardo; Valverde Mateos, Ana; Capllonch Carrión, Adrián; Moreno Sandoval, Antonio |
Keywords: | Clinical trials; Evidence-Based Medicine; Semantic Annotation; Inter-Annotator Agreement; Natural Language Processing |
Publication date: | 2021 |
Publisher: | BioMed Central |
Citation: | BMC Medical Informatics and Decision Making 21: 69 (2021) |
Abstract: | [Background] The large volume of medical literature makes it difficult for healthcare professionals to keep abreast of the latest studies that support Evidence-Based Medicine. Natural language processing enhances the access to relevant information, and gold standard corpora are required to improve systems. To contribute with a new dataset for this domain, we collected the Clinical Trials for Evidence-Based Medicine in Spanish (CT-EBM-SP) corpus. [Methods] We annotated 1200 texts about clinical trials with entities from the Unified Medical Language System semantic groups: anatomy (ANAT), pharmacological and chemical substances (CHEM), pathologies (DISO), and lab tests, diagnostic or therapeutic procedures (PROC). We doubly annotated 10% of the corpus and measured inter-annotator agreement (IAA) using F-measure. As use case, we run medical entity recognition experiments with neural network models. [Results] This resource contains 500 abstracts of journal articles about clinical trials and 700 announcements of trial protocols (292 173 tokens). We annotated 46 699 entities (13.98% are nested entities). Regarding IAA agreement, we obtained an average F-measure of 85.65% (±4.79, strict match) and 93.94% (±3.31, relaxed match). In the use case experiments, we achieved recognition results ranging from 80.28% (±00.99) to 86.74% (±00.19) of average F-measure. [Conclusions] Our results show that this resource is adequate for experiments with state-of-the-art approaches to biomedical named entity recognition. It is freely distributed at: http://www.lllf.uam.es/ESP/nlpmedterm_en.html. The methods are generalizable to other languages with similar available sources. |
Description: | This article is subject to a CC BY 4.0 license |
Publisher version: | https://doi.org/10.1186/s12911-021-01395-z |
URI: | http://hdl.handle.net/10261/240680 |
ISSN: | 1472-6947 |
References: | Campillos-Llanos, Leonardo; Valverde-Mateos, Ana; Capllonch-Carrión, Adrián; Moreno-Sandoval, Antonio; 2021; CT-EBM-SP - Corpus of Clinical Trials for Evidence-Based-Medicine in Spanish [Dataset]; In BMC Medical Informatics and Decision Making; Version 1; Vol. 21; Number article 69; https://doi.org/10.1186/s12911-021-01395-z; http://hdl.handle.net/10261/285045 |
Appears in collections: | (CCHS-ILLA) Artículos |
Files in this item:
File | Description | Size | Format |
---|---|---|---|
A clinical trials corpus.pdf | | 3.04 MB | Adobe PDF |
Page views: 146 (checked on 25 May 2024)
Downloads: 141 (checked on 25 May 2024)
This item is licensed under a Creative Commons License