View CHEMDNER task of BioCreative IV (public)
- Summary
CHEMDNER Task: Chemical compound and drug name recognition task
- License
- unknown
- Dependencies
- Tags
- abstracts Biocreative bionlp chemdner chemical compounds CRFs entity Learning machine Named NER PubMed recognition supervised
- Attribute Types
- Download
- tar.gz (18.3 MB)
- Original Data Format
- tar.gz
- Name
- Version mldata
- Comment
- Names
- Data (first 10 data points)
(Zipped) TAR archive containing:
CHEMDNER_CORPUS_BIOCREATIVE_4
CHEMDNER_CORPUS_BIOCREATIVE_4/CHEMDNER_TRAIN
CHEMDNER_CORPUS_BIOCREATIVE_4/CHEMDNER_TRAIN/chemdner_ann_training_13-07-31.txt
CHEMDNER_CORPUS_BIOCREATIVE_4/CHEMDNER_TRAIN/Readme.txt
CHEMDNER_CORPUS_BIOCREATIVE_4/CHEMDNER_TRAIN/chemdner_abs_training.txt
CHEMDNER_CORPUS_BIOCREATIVE_4/CHEMDNER_TRAIN/cdi_ann_training_13-07-31.txt
CHEMDNER_CORPUS_BIOCREATIVE_4/CHEMDNER_TRAIN/chemdner_data_preparation_v17july31.pdf
CHEMDNER_CORPUS_BIOCREATIVE_4/CHEMDNER_TRAIN/cem_ann_training_13-07-31.txt
CHEMDNER_CORPUS_BIOCREATIVE_4/CHEMDNER_TEST
CHEMDNER_CORPUS_BIOCREATIVE_4/CHEMDNER_TEST/chemdner_abs_test_pmid_label.txt
CHEMDNER_CORPUS_BIOCREATIVE_4/CHEMDNER_TEST/chemdner_abs_test.txt
CHEMDNER_CORPUS_BIOCREATIVE_4/CHEMDNER_TEST/chemdner_ann_test_13-09-13.txt
CHEMDNER_CORPUS_BIOCREATIVE_4/CHEMDNER_TEST/Readme.txt
CHEMDNER_CORPUS_BIOCREATIVE_4/CHEMDNER_TEST/cem_ann_test_13-09-13.txt
CHEMDNER_CORPUS_BIOCREATIVE_4/CHEMDNER_TEST/cdi_ann_test_13-09-13.txt
CHEMDNER_CORPUS_BIOCREATIVE_4/chemdner_corpus.pdf
CHEMDNER_CORPUS_BIOCREATIVE_4/CHEMDNER_DEVELOPMENT
CHEMDNER_CORPUS_BIOCREATIVE_4/CHEMDNER_DEVELOPMENT/chemdner_ann_development_13-08-18.txt
CHEMDNER_CORPUS_BIOCREATIVE_4/CHEMDNER_DEVELOPMENT/Readme.txt
CHEMDNER_CORPUS_BIOCREATIVE_4/CHEMDNER_DEVELOPMENT/chemdner_abs_development.txt
CHEMDNER_CORPUS_BIOCREATIVE_4/CHEMDNER_DEVELOPMENT/cdi_ann_development_13-08-18.txt
CHEMDNER_CORPUS_BIOCREATIVE_4/CHEMDNER_DEVELOPMENT/cem_ann_development_13-08-18.txt
CHEMDNER_CORPUS_BIOCREATIVE_4/chemdner_overview.pdf
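The archive contents listed above can be inspected before extraction. The following is a minimal sketch using Python's standard `tarfile` module; the helper names `list_archive_members` and `group_by_subdirectory` are illustrative and not part of the corpus distribution, and the local file name of the downloaded archive is left to the caller.

```python
import tarfile
from collections import defaultdict
from pathlib import PurePosixPath


def list_archive_members(archive_path):
    """Return the names of all regular files in a gzip-compressed tar archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return [member.name for member in archive.getmembers() if member.isfile()]


def group_by_subdirectory(member_names):
    """Group file names by their second path component.

    For the layout listed above this yields keys such as CHEMDNER_TRAIN,
    CHEMDNER_DEVELOPMENT and CHEMDNER_TEST; files directly under the
    top-level directory are collected under "(top level)".
    """
    groups = defaultdict(list)
    for name in member_names:
        parts = PurePosixPath(name).parts
        key = parts[1] if len(parts) > 2 else "(top level)"
        groups[key].append(parts[-1])
    return dict(groups)
```

Calling `group_by_subdirectory(list_archive_members(path))` on the downloaded file should show which annotation and abstract files belong to each subset. The Readme.txt in each subset directory is the place to check the column layout of those files.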
- Description
There is increasing interest in both academia and industry in more efficient access to information on chemical compounds and drugs (chemical entities) described in repositories of unstructured data, including scientific articles, patents, and health agency reports. Achieving this requires identifying mentions of chemical compounds in text automatically and indexing whole documents with the compounds they describe. Chemical entity recognition is also a prerequisite for downstream text processing, such as detecting drug-protein interactions, finding adverse effects of chemical compounds and their associations with toxicological endpoints, and extracting pathway and metabolic reaction relations.
The CHEMDNER corpus provides annotations for two sub-tasks:
a) Chemical document indexing (CDI) sub-task: given a set of documents, return for each document a ranked list of the chemical entities described in it.
b) Chemical entity mention recognition (CEM) sub-task: for a given document, provide the start and end indices of all chemical entities mentioned in it.
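To make the difference between the two outputs concrete, here is a small sketch in Python. It is not the task's reference system: it uses a toy dictionary matcher over a hypothetical lexicon, reports character offsets in Python slice convention (end exclusive), and ranks entities by how often they occur after lowercasing. All of these choices, including the function names, are assumptions made for illustration.

```python
import re
from collections import Counter


def find_mentions(text, lexicon):
    """Mention-level output: (start, end, surface) for each lexicon hit.

    Toy case-insensitive dictionary lookup; `end` is exclusive, so
    text[start:end] reproduces the surface string.
    """
    if not lexicon:
        return []
    # Try longer terms first so that a longer name wins over a shorter prefix.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


def rank_entities(mentions):
    """Document-level output: unique entity strings ordered by frequency.

    Ranking by lowercased frequency is an illustrative choice, not the
    ranking criterion defined by the task.
    """
    counts = Counter(surface.lower() for _, _, surface in mentions)
    return [name for name, _ in counts.most_common()]


text = "Aspirin and ibuprofen; aspirin again."
mentions = find_mentions(text, {"aspirin", "ibuprofen"})
ranked = rank_entities(mentions)
```

Here `mentions` holds one entry per occurrence with its offsets, while `ranked` holds each entity once per document. A real system would replace the dictionary lookup with a trained tagger and use its confidence scores for ranking.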
- URLs
- http://www.biocreative.org/tasks/biocreative-iv/chemdner/
- Data Source
- http://www.biocreative.org/resources/corpora/bc-iv-chemdner-corpus/
- Measurement Details
Precision, Recall, F-measure
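As a reminder of how these three measures relate, the sketch below computes them over sets of annotations in Python. Representing each annotation as a `(document id, start, end)` tuple and counting only exact matches as correct are assumptions made for this example; the function name is illustrative.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and balanced F-measure over two collections.

    Items are compared by equality, so a predicted annotation counts as a
    true positive only if an identical gold annotation exists.
    """
    gold_set, predicted_set = set(gold), set(predicted)
    true_positives = len(gold_set & predicted_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: (document id, start offset, end offset)
gold = [("doc1", 0, 7), ("doc1", 12, 21), ("doc2", 5, 9)]
predicted = [("doc1", 0, 7), ("doc2", 4, 9)]
precision, recall, f1 = precision_recall_f1(gold, predicted)
```

With one of two predictions correct and one of three gold annotations found, precision is 0.5, recall is one third, and the F-measure is 0.4.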
- Usage Scenario
named entity recognition, bioinformatics, chemoinformatics, information retrieval, information extraction, relation extraction, drug discovery, adverse effect extraction, detection of drug-protein interactions
- revision 1
- by krallinger on 2015-07-14 11:16
This item was downloaded 2180 times and viewed 2 times.
Tasks defined on dataset CHEMDNER task of BioCreative IV
- GPRO patent protein recognition 2015-07-14 11:42
- CPD (chemical passage detection) 2015-07-14 11:36
- CEMP (chemical NER in patents) 2015-07-14 11:30
- CEM task BioCreative IV 2015-07-14 11:21