Is there a way to retrieve the whole noun chunk using its root token in spaCy?

Asked: 2019-03-22 20:37:55

Tags: python nlp spacy dependency-parsing

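I'm very new to spaCy. I've been reading the documentation for hours, but I'm still not sure whether what I want to do is possible. Anyway...

As the title says, is there a way to get a given noun chunk from a token that it contains? For example, given the sentence: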

"Autonomous cars shift insurance liability toward manufacturers"

Is it possible to get the "autonomous cars" noun chunk when I only have the "cars" token? Here is an example snippet of the scenario I'm trying to handle.

import spacy

# Assumption: the question does not name a model; any English pipeline with a parser works here.
nlp = spacy.load("en_core_web_sm")

startingSentence = "Autonomous cars and magic wands shift insurance liability toward manufacturers"
doc = nlp(startingSentence)
noun_chunks = doc.noun_chunks

for token in doc:
    if token.dep_ == "dobj":
        print(token) # this will print "liability"

        # Is it possible to do anything from here to actually get the "insurance liability" chunk?

Any help would be greatly appreciated. Thanks!

1 Answer:

Answer 0: (score: 2)

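You can easily find the noun chunk that contains the token you've identified by checking whether the token is in one of the noun chunk spans: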

doc = nlp("Autonomous cars and magic wands shift insurance liability toward manufacturers")
interesting_token = doc[7] # or however you identify the token you want
for noun_chunk in doc.noun_chunks:
    if interesting_token in noun_chunk:
        print(noun_chunk)

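With en_core_web_sm and spacy 2.0.18 the output is not correct, because "shift" isn't identified as a verb, so you get:

magic wands shift insurance liability

With en_core_web_md, it's correct:

insurance liability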

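(It makes sense to include examples with real ambiguities in the documentation, because that's a realistic scenario (https://spacy.io/usage/linguistic-features#noun-chunks), but it's confusing for new users if the examples are ambiguous enough that the analysis is unstable across versions/models.)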
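
If you need the chunk for many tokens, the containment check above can be turned into a one-time lookup table. The following is only a rough sketch, not part of spaCy: build_chunk_index and demo are hypothetical names made up for illustration. It relies on spans exposing start and end token positions and on tokens exposing their position as token.i, and it assumes each token sits in at most one noun chunk, since spaCy's noun chunks are flat and do not nest.

```python
def build_chunk_index(chunks):
    """Map every token position covered by a span to that span.

    `chunks` is any iterable of objects with integer `start` and `end`
    attributes (end exclusive), such as the spans from doc.noun_chunks.
    This helper is a hypothetical illustration, not a spaCy function.
    """
    index = {}
    for chunk in chunks:
        for position in range(chunk.start, chunk.end):
            index[position] = chunk
    return index


def demo():
    """Not called automatically: needs spaCy and the en_core_web_md model."""
    import spacy

    nlp = spacy.load("en_core_web_md")
    doc = nlp("Autonomous cars and magic wands shift insurance liability toward manufacturers")
    index = build_chunk_index(doc.noun_chunks)
    for token in doc:
        if token.dep_ == "dobj":
            # index.get returns None when no noun chunk covers the token
            print(token.text, "->", index.get(token.i))
```

Call demo() yourself once spaCy and the model are installed. As noted above, which chunk comes back depends on how the model parses the sentence, so the result may differ between models and versions.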