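Are there any publicly available biomedical datasets that have already been annotated? I am learning how to annotate biomedical text, especially for disambiguation, but I would also be glad to see annotations created for other purposes.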
Answer 0 (score: 0)
Here are some annotated corpora, grouped by the entity type they cover:
| Entity | Corpus | Type | Size (sentences) |
|------------------|-----------------------------|------------|------------------|
| Gene and Protein | GENETAG [7] | Sentences | 20000 |
| | JNLPBA [6] (from GENIA [8]) | Abstracts | 22402 |
| | FSUPRGE [9] | Abstracts | ≈29447* |
| | PennBioIE [10] | Abstracts | ≈22877* |
| Species | OrganismTagger Corpus [11] | Full texts | 9863 |
| | Linnaeus Corpus [12] | Full texts | 19491 |
| Disorders | SCAI Disease [13] | Abstracts | ≈3640* |
| | EBI Disease [14] | Sentences | 600 |
| | Arizona Disease (AZDC) [15] | Sentences | 2500 |
| | BioText [16] | Abstracts | 3655 |
| Chemical | SCAI IUPAC [17] | Sentences | 20300 |
| | SCAI General [18] | Sentences | 914 |
| Anatomy | AnEM | Sentences | 4700 |
| Miscellaneous | CellFinder | Full texts | 2100 |
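
Since you are learning how annotation works, it may help to see what an entity annotation looks like once loaded. Many named-entity corpora are distributed one token per line with IOB2 tags (`B-` starts an entity, `I-` continues it, `O` is outside). The sketch below assumes that format and is not tied to any specific corpus in the table; the function name `iob_to_spans`, the example sentence, and its tags are all made up for illustration.

```python
from typing import List, Tuple


def iob_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse parallel token / IOB2-tag lists into (entity_type, text) pairs.

    Assumes IOB2 tagging: every entity starts with a B- tag.
    A stray I- tag that does not continue the current entity is ignored.
    """
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; close the previous one if it is still open.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O" or an inconsistent I- tag: close any open entity.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type = None
            current_tokens = []

    # Close an entity that runs to the end of the sentence.
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hypothetical example sentence with hand-written tags.
tokens = ["IL-2", "gene", "expression", "requires", "NF-kappa", "B", "."]
tags = ["B-DNA", "I-DNA", "O", "O", "B-protein", "I-protein", "O"]
print(iob_to_spans(tokens, tags))
```

Check each corpus's own documentation before relying on this: some of the corpora above use standoff or XML annotations instead of inline IOB tags, in which case the loading step would look different.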