Question

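I'm new to Python and NLP (using spaCy), so I hope someone can help. I want to detect named entities (NEs) in a text and then get the five words immediately to the left and right of each NE.

I have already found the NE, but I'm stuck on finding the "surrounding words":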

import spacy

nlp=spacy.load("en_core_web_sm")

doc = nlp(open(path to my text).read())

for index, token in enumerate(doc.ents): 
    if token.label_ == "PERSON" and token.text == "Frodo" or token.text == "Frodo Beutlin":
        print(token[:index])
        print(token[index])
        print(token[index:])

Frodo Beutlin
think

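This is my result. As you can see, the strings before my NE are not shown. I'm also unsure how to get several strings (before and after) rather than just one.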

Answer 1

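The entities in doc.ents are of type Span. Indexing a Span with square brackets only reaches the tokens inside that span. The entity has the attributes start and end, which you can use to index tokens in the original document; end points one position past the last token of the entity.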

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("My name is Frodo Beutlin.")

entity = doc.ents[0]
print(f"Token on the left: '{doc[entity.start - 1]}'")
print(f"Token on the right: '{doc[entity.end]}'")

Token on the left: 'is'
Token on the right: '.'
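
The same start/end arithmetic extends to a window of several tokens. Below is a minimal sketch, assuming a hypothetical helper named `context_window` (it is not part of spaCy) and a plain list of strings standing in for the `Doc` so that it runs without a model. The lower slice bound is clamped so that an entity near the start of the text does not produce a negative index.

```python
def context_window(tokens, start, end, n=5):
    """Return (left, right): up to n items before `start` and up to n from `end` on.

    `start` and `end` follow the Span convention, i.e. `end` is exclusive.
    """
    left = tokens[max(start - n, 0):start]  # clamp so a negative index never wraps around
    right = tokens[end:end + n]             # list slices stop at the end on their own
    return left, right


# A plain list stands in for a spaCy Doc here (illustrative tokenization only).
tokens = ["I", "think", "Frodo", "Beutlin", "left", "the", "Shire", "."]
left, right = context_window(tokens, start=2, end=4)
print(left)   # ['I', 'think']
print(right)  # ['left', 'the', 'Shire', '.']
```

With spaCy, the call would presumably be `context_window(doc, ent.start, ent.end)` for each `ent` in `doc.ents`, since a `Doc` can be sliced with the same index convention; treat that as an assumption to check against your spaCy version.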

Answer 2

Thanks for your help! I did indeed have the type wrong, and it works now ;)

for ent in doc.ents:
    if ent.label_ == "PERSON":
        if ent.text == "Frodo Beutlin":
            # Up to five tokens before the entity, clamped at the document start.
            words_before = doc[max(ent.start - 5, 0):ent.start]
            # Up to five tokens after it; ent.end is the index just past the entity.
            words_after = doc[ent.end:min(ent.end + 5, len(doc))]
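
One detail about the condition in the question is worth a note: it mixes `and` and `or` without parentheses, and Python parses `A and B or C` as `(A and B) or C`. The sketch below illustrates the difference with two hypothetical functions, `matches_original` and `matches_grouped`, that take a label and a text as plain strings instead of a spaCy entity.

```python
def matches_original(label, text):
    # Same shape as the question's condition: parsed as (A and B) or C.
    return label == "PERSON" and text == "Frodo" or text == "Frodo Beutlin"


def matches_grouped(label, text):
    # The label check now applies to both spellings.
    return label == "PERSON" and text in {"Frodo", "Frodo Beutlin"}


# A hypothetical entity with the right text but a different label:
print(matches_original("ORG", "Frodo Beutlin"))  # True
print(matches_grouped("ORG", "Frodo Beutlin"))   # False
```

Checking the label first and then testing membership in a set keeps the intent explicit when more spellings are added later.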

How to find the strings to the left and right of a named entity using spaCy
