My dataset looks like this:
LABEL\tDESCRIPTION\n
CON\t\tsun rises in the east\n
OBS\t\tThere are 9 planets\n
TRT\t\tAsia is the largest continent\n
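There are several thousand rows like this. I want to build an NER model in which the entity is the entire description (e.g. "sun rises in the east") and the entity label is the ID in the LABEL column (e.g. CON).
How can I convert the data above into the spaCy training format?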
TRAIN_DATA = [('sun rises in the east', {'entities': [(0, 21, 'CON')]}),
('There are 9 planets', {'entities': [(0, 19, 'OBS')]}) .... ]
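
A rough sketch of the kind of conversion I have in mind, in case it helps frame the question. It assumes the file contains real tab and newline characters, that the first row is the header, and that the description itself contains no tabs. The function name `to_spacy_format` and the `has_header` parameter are made up for illustration. Since the whole description is the entity, the span runs from 0 to `len(text)` (the end offset is exclusive).

```python
def to_spacy_format(lines, has_header=True):
    """Turn tab-separated LABEL/DESCRIPTION rows into spaCy-style training tuples.

    Hypothetical helper: assumes one record per line, with the label in the
    first column and the description after one or more tabs.
    """
    train_data = []
    for i, line in enumerate(lines):
        if has_header and i == 0:
            continue  # skip the LABEL/DESCRIPTION header row
        # Split on the first tab only; extra tabs and the trailing newline
        # are removed by strip().
        label, sep, rest = line.partition("\t")
        label = label.strip()
        text = rest.strip()
        if not sep or not label or not text:
            continue  # skip blank or malformed rows
        # The entity covers the whole description: start 0, exclusive end len(text).
        train_data.append((text, {"entities": [(0, len(text), label)]}))
    return train_data


# The three sample rows from above, written as Python strings.
sample = [
    "LABEL\tDESCRIPTION\n",
    "CON\t\tsun rises in the east\n",
    "OBS\t\tThere are 9 planets\n",
    "TRT\t\tAsia is the largest continent\n",
]
TRAIN_DATA = to_spacy_format(sample)
```

For a real file, the same function could be called as `to_spacy_format(open(path, encoding="utf-8"))`, where `path` is a placeholder for the dataset location.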