Question

In spaCy, you can set an extension on a document like this:

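import spacy
from spacy.tokens import Doc

# Register the custom attribute once, before any Doc uses it.
Doc.set_extension('chapter_id', default='')

nlp = spacy.blank('en')  # assumption: any loaded pipeline works here
doc = nlp('This is my text')
doc._.chapter_id = 'This is my ID'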

However, I have thousands of text files that need to be processed, and spaCy recommends using pipe for this:

docs = nlp.pipe(array_of_texts)

How can I set the extension value while using pipe?

Answer 1

You may want to use the as_tuples keyword argument of nlp.pipe, which lets you pass in a list of (text, context) tuples and yields (doc, context) tuples. So you could do the following:

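# Each item is a (text, context) tuple; here the context is the chapter ID.
data = [('Some text', 1), ('Some other text', 2)]

def process_text(data):
    # With as_tuples=True, nlp.pipe yields (doc, context) pairs.
    for doc, chapter_id in nlp.pipe(data, as_tuples=True):
        # Copy the context onto the custom extension attribute.
        doc._.chapter_id = chapter_id
        yield doc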
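
For reference, here is a self-contained sketch of the same approach that should run as-is. It assumes a blank English pipeline created with spacy.blank('en') so that no model download is needed; the names process_texts and pairs are only illustrative and are not part of spaCy.

```python
import spacy
from spacy.tokens import Doc

# Register the extension once; force=True lets the snippet be re-run
# in the same session without raising an "already exists" error.
Doc.set_extension("chapter_id", default="", force=True)

# Assumption: a blank English pipeline (tokenizer only). Replace it with
# whatever pipeline you actually load.
nlp = spacy.blank("en")


def process_texts(pairs):
    """Yield Docs with chapter_id set from the context of each pair.

    `pairs` is an iterable of (text, chapter_id) tuples (hypothetical name).
    """
    for doc, chapter_id in nlp.pipe(pairs, as_tuples=True):
        doc._.chapter_id = chapter_id
        yield doc


pairs = [("Some text", 1), ("Some other text", 2)]
docs = list(process_texts(pairs))
chapter_ids = [doc._.chapter_id for doc in docs]
print(chapter_ids)  # [1, 2]
```

Since nlp.pipe accepts any iterable, pairs could also be a generator that reads one file at a time and yields (text, chapter_id), which should avoid building the whole list up front when there are thousands of files.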

SpaCy, apply extensions in pipe
