I want to build a list of keywords from tokens and look up the sentences they come from. Thanks.
Answer 0 (score: 2)
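You can get the sentences from token.doc.sents and then find the one whose token range contains your token, that is, the sentence where sent.start <= token.i < sent.end. You can make this more convenient to use by adding an extension attribute to Token: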
import spacy
from spacy.tokens import Token
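def get_sentence(token):
    # Return the sentence span whose token range contains this token.
    # Checking only sent.start <= token.i would always match the first sentence.
    for sent in token.doc.sents:
        if sent.start <= token.i < sent.end:
            return sent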
# Add a computed property, which will be accessible as token._.sent
Token.set_extension('sent', getter=get_sentence)
nlp = spacy.load('en_core_web_sm')
doc = nlp(u'Sentence one. Sentence two.')
print(list(doc.sents))
print(doc[0]._.sent)
print(doc[-1]._.sent)
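
If you look up many keywords in the same document, scanning every sentence for each token repeats work. The following is a minimal sketch of an alternative, not part of the original answer: it collects the sentence start offsets once and uses binary search to find the containing sentence. Plain Python lists stand in for the spaCy objects here, so `tokens`, `sent_starts`, `keywords`, and the helper `sentence_index` are all hypothetical names; with spaCy you would presumably build `sent_starts` as `[sent.start for sent in doc.sents]`.

```python
from bisect import bisect_right


def sentence_index(sent_starts, token_i):
    """Return the index of the sentence that contains token position token_i.

    sent_starts must be a sorted list holding the first-token index of
    each sentence (assumption: the first sentence starts at 0).
    """
    # bisect_right counts how many sentences start at or before token_i;
    # the last of those is the sentence that contains the token.
    return bisect_right(sent_starts, token_i) - 1


# Hypothetical stand-in data for a parsed document with two sentences.
tokens = ["Sentence", "one", ".", "Sentence", "two", "."]
sent_starts = [0, 3]
keywords = {"one", "two"}

# Map each keyword to the text of the sentence it occurs in.
keyword_sentences = {}
for i, tok in enumerate(tokens):
    if tok in keywords:
        s = sentence_index(sent_starts, i)
        # A sentence ends where the next one starts, or at the end of the doc.
        end = sent_starts[s + 1] if s + 1 < len(sent_starts) else len(tokens)
        keyword_sentences[tok] = " ".join(tokens[sent_starts[s]:end])

print(keyword_sentences)
```

Each lookup is then logarithmic in the number of sentences rather than linear, which only matters for long documents or large keyword lists; for a handful of tokens the simple loop is just as good.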