Converting Prodigy's NER-annotated JSONL format to spaCy's training format?

Time: 2020-05-21 16:00:05

Tags: python sqlite nlp spacy ner

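I'm new to Prodigy, spaCy, and working on the command line. I'd like to use Prodigy to label data for an NER model, and then use spaCy in Python to create the model.

Prodigy stores its output in SQLite format. spaCy takes a different format, and I'm not sure what it's called: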

TRAIN_DATA = [
    (
        "Horses are too tall and they pretend to care about your feelings",
        {"entities": [(0, 6, LABEL)]},
    ),
    ("Do they bite?", {"entities": []}),
    (
        "horses are too tall and they pretend to care about your feelings",
        {"entities": [(0, 6, LABEL)]},
    ),
    ("horses pretend to care about your feelings", {"entities": [(0, 6, LABEL)]}),
    (
        "they pretend to care about your feelings, those horses",
        {"entities": [(48, 54, LABEL)]},
    ),
    ("horses?", {"entities": [(0, 6, LABEL)]}),
]

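How do I convert from one to the other? It seems like this should be easy, but I can't find it anywhere.

I have no trouble loading the dataset; I only need the conversion.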

1 Answer:

Answer 0 (score: 1)

As of version 1.9, Prodigy should be able to export this training format with the data-to-spacy recipe: https://prodi.gy/docs/recipes#data-to-spacy
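
If you'd rather do the conversion by hand, here is a minimal sketch. It assumes the annotations have already been exported to JSONL (for example with `prodigy db-out`), and that each record has a "text" field, a "spans" list whose items carry "start", "end" and "label", and an "answer" field. The function name `prodigy_jsonl_to_spacy` and the sample records are made up for illustration; check the field names against your own export before relying on it.

```python
import json


def prodigy_jsonl_to_spacy(lines):
    """Turn Prodigy-style JSONL lines into (text, {"entities": [...]}) tuples.

    Hypothetical helper, not part of Prodigy or spaCy. Assumes each record
    has "text", an optional "spans" list, and an optional "answer" field.
    """
    train_data = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumption: only accepted annotations should become training data.
        if record.get("answer", "accept") != "accept":
            continue
        entities = [
            (span["start"], span["end"], span["label"])
            for span in record.get("spans", [])
        ]
        train_data.append((record["text"], {"entities": entities}))
    return train_data


# Made-up sample records standing in for an exported JSONL file.
sample_lines = [
    '{"text": "Do they bite?", "spans": [], "answer": "accept"}',
    '{"text": "horses?", "spans": [{"start": 0, "end": 6, "label": "ANIMAL"}], "answer": "accept"}',
    '{"text": "horses maybe", "spans": [], "answer": "reject"}',
]

TRAIN_DATA = prodigy_jsonl_to_spacy(sample_lines)
```

To read from a real export, pass an open file handle instead of the sample list, e.g. `with open("annotations.jsonl", encoding="utf8") as f: TRAIN_DATA = prodigy_jsonl_to_spacy(f)`, where the file name is a placeholder.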