Available annotated biomedical datasets

Time: 2019-07-05 05:13:17

Tags: nlp annotations

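Are there any biomedical datasets available that have already been annotated? I am learning how to annotate biomedical text, particularly for disambiguation, but I would also be glad to see annotations made for other purposes.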

1 answer:

Answer 0 (score: 0)

Here are some corpora:

| Entity           | Corpus                      | Type       | Size (sentences) |
|------------------|-----------------------------|------------|------------------|
| Gene and Protein | GENETAG [7]                 | Sentences  | 20000            |
|                  | JNLPBA [6] (from GENIA [8]) | Abstracts  | 22402            |
|                  | FSUPRGE [9]                 | Abstracts  | ≈29447*          |
|                  | PennBioIE [10]              | Abstracts  | ≈22877*          |
| Species          | OrganismTagger Corpus [11]  | Full texts | 9863             |
|                  | Linnaeus Corpus [12]        | Full texts | 19491            |
| Disorders        | SCAI Disease [13]           | Abstracts  | ≈3640*           |
|                  | EBI Disease [14]            | Sentences  | 600              |
|                  | Arizona Disease (AZDC) [15] | Sentences  | 2500             |
|                  | BioText [16]                | Abstracts  | 3655             |
| Chemical         | SCAI IUPAC [17]             | Sentences  | 20300            |
|                  | SCAI General [18]           | Sentences  | 914              |
| Anatomy          | AnEM1                       | Sentences  | 4700             |
| Miscellaneous    | CellFinder2                 | Full texts | 2100             |

source
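Since the question is about learning how annotation works, here is a minimal sketch of reading entity annotations once a corpus is in hand. It assumes a token-per-line IOB2 tagging scheme (`B-` opens an entity, `I-` continues it, `O` is outside), which is a common way to distribute named-entity corpora; the exact format differs from corpus to corpus, so check each one's documentation. The function name `iob_to_spans`, the sample sentence, and the tag names `DNA` and `protein` are made up for illustration and are not taken from any of the corpora listed above.

```python
from typing import List, Tuple


def iob_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse IOB2 tags into (entity_text, entity_type) pairs.

    Hypothetical helper for illustration; stray I- tags that do not
    continue an open entity of the same type are ignored.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts: close any entity that is still open.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O" tag or an inconsistent I- tag: close the open entity, if any.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens = []
            current_type = None

    # Close an entity that runs to the end of the sentence.
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Made-up example sentence and tags (assumption, not real corpus data).
tokens = ["IL-2", "gene", "expression", "requires", "NF-kappa", "B", "."]
tags = ["B-DNA", "I-DNA", "O", "O", "B-protein", "I-protein", "O"]

entities = iob_to_spans(tokens, tags)
print(entities)
```

Corpora that ship as standoff annotations (character offsets stored in a separate file) would need a different reader; the sketch above only covers the inline token-tag case.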