How to link tokens to their sentences in spaCy

Asked: 2018-03-17 13:14:09

Tags: spacy

I want to build a list of keywords from tokens and look up the sentences they come from. Thanks.

1 answer:

Answer 0 (score: 2)

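You can get the sentences from token.doc.sents and then find the one whose token range contains your token. You can make this more convenient by adding an extension attribute to Token: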

import spacy
from spacy.tokens import Token

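def get_sentence(token):
    # Return the sentence span whose token range contains this token.
    # Checking only sent.start <= token.i would always match the first sentence.
    for sent in token.doc.sents:
        if sent.start <= token.i < sent.end:
            return sent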

# Add a computed property, which will be accessible as token._.sent
Token.set_extension('sent', getter=get_sentence)

nlp = spacy.load('en_core_web_sm')
doc = nlp(u'Sentence one. Sentence two.')
print(list(doc.sents))
print(doc[0]._.sent)
print(doc[-1]._.sent)
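
For the keyword use case in the question, a minimal sketch of going from a keyword list to the sentences that contain each keyword might look like the following. It assumes spaCy v3 (where `nlp.add_pipe("sentencizer")` adds the rule-based sentence splitter by name) and uses a blank English pipeline so that no model download is needed. The helper name `sentences_for_keywords` and the sample text are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

import spacy

# Assumption: spaCy v3. A blank pipeline plus the rule-based sentencizer
# is enough to get doc.sents without downloading a trained model.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")


def sentences_for_keywords(doc, keywords):
    """Hypothetical helper: map each lowercased keyword to the text of
    every sentence in which it occurs."""
    wanted = {k.lower() for k in keywords}
    found = defaultdict(list)
    for sent in doc.sents:
        for token in sent:
            key = token.lower_
            # Record each sentence once per keyword, in document order.
            if key in wanted and sent.text not in found[key]:
                found[key].append(sent.text)
    return dict(found)


doc = nlp("Cats chase mice. Dogs chase cats. Birds sing.")
result = sentences_for_keywords(doc, ["cats", "birds"])
print(result)
```

Iterating over doc.sents once and matching tokens inside each sentence avoids scanning all sentences again for every keyword token, which is what a per-token getter does.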