Question

Can I use annotations from WordPad or a plain text document to train NER in spaCy? Training on sentences or paragraphs does not fit my requirements. Thanks.

Answer 1

Yes, you can. The Python library spacy-annotator is your friend.
It uses ipywidgets to give you a user-friendly UI for annotating data.

First, install the annotator.

pip install spacy-annotator

Second, import the data from your txt document into a pandas dataframe.

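import pandas as pd

# Read one text per line into a single column named 'full_text' (the column used in the next step).
# Splitting on sep=" " would break every line into one column per word.
with open('insert_text_file.txt', encoding='utf-8') as f:
    df = pd.DataFrame({'full_text': [line.strip() for line in f if line.strip()]})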

Third, label the data with spacy-annotator.

import re  # needed for the regex_flags argument below

from spacy_annotator.pandas_annotations import annotate as pd_annotate

# Annotations
pd_dd = pd_annotate(df,
            col_text = 'full_text',     # Column in pandas dataframe containing text to be labelled
            labels = ['GPE', 'PERSON'], # List of labels you want to get from text
            sample_size=1,              # Size of the sample to be labelled
            delimiter=',',              # Delimiter to separate entities in UI
            model = None,               # spaCy model for noisy pre-labelling
            regex_flags=re.IGNORECASE   # One (or more) regex flags to be applied when searching for entities in text
            )

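The nice thing is that spacy-annotator (i) returns labels in the format spaCy expects, (ii) works with either pandas or Python lists, and (iii) supports noisy pre-labelling (that is, if you already have a spaCy model, you can use it to get suggestions for the entities to annotate).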
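To make "the format spaCy expects" concrete: spaCy v2-style NER training data is commonly written as (text, {"entities": [(start, end, label), ...]}) tuples with character offsets. The following is only a minimal standard-library sketch of that shape, not how spacy-annotator is implemented; the helper name to_spacy_format and the sample sentence are hypothetical.

```python
import re

def to_spacy_format(text, entities_by_label, flags=re.IGNORECASE):
    """Hypothetical helper: turn {label: [entity strings]} into character-offset spans."""
    spans = []
    for label, entity_texts in entities_by_label.items():
        for entity_text in entity_texts:
            # re.escape makes the entity string match literally, not as a pattern
            for match in re.finditer(re.escape(entity_text), text, flags):
                spans.append((match.start(), match.end(), label))
    # Sort by start offset so the spans follow reading order
    return (text, {"entities": sorted(spans)})

example = to_spacy_format(
    "Alice moved to Paris.",
    {"PERSON": ["alice"], "GPE": ["Paris"]},
)
print(example)
```

Note that this sketch does not resolve overlapping matches, which spaCy rejects for NER; real annotation output would need that check.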

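If you do not want to use pandas, you can always read each line of the text file into a Python list and annotate it with the spacy_annotator.list_annotations module. It works in a similar way.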
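Reading the file into a list could look like the sketch below. It assumes one text per line and UTF-8 encoding; the function name read_texts is hypothetical, and the call into spacy_annotator.list_annotations is deliberately left out because its exact signature is not shown in this answer.

```python
from pathlib import Path

def read_texts(path, encoding="utf-8"):
    """Return the non-empty lines of a text file as a list of stripped strings."""
    lines = Path(path).read_text(encoding=encoding).splitlines()
    return [line.strip() for line in lines if line.strip()]
```

The resulting list, for example read_texts('insert_text_file.txt'), is what you would hand to the list-based annotator.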
