Spacy

Time: 2016-12-18 12:31:45

Tags: python nlp nltk spacy

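Using Python and spaCy, I am trying to extract entities from a passive-voice sentence that has multiple subjects.

Sentence = "John and Jenny were accused of crimes by David"

My goal is to extract both "John" and "Jenny" from the sentence as nsubjpass entities (.ents).

However, I can only extract "John" as nsubjpass.

How can I extract both of them?

Note that although John is found as an entity in .ents, Jenny is treated as conj rather than nsubjpass. How can this be improved?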

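import spacy

# The model name is an assumption; any English pipeline with a parser works.
nlp = spacy.load("en_core_web_sm")

each_sentence3 = "John and Jenny were accused of crimes by David"
doc = nlp(each_sentence3)

# Collect the tokens that the parser labels as passive nominal subjects.
passive_toks = [tok for tok in doc if tok.dep_ == "nsubjpass"]
if passive_toks:
    print(passive_toks)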

Result:

[John]

The entity list shows:

print(list(doc.ents))

Result:

[John, Jenny, David]

Now, if we inspect the whole sentence, we see the following:

Code:

for tok in doc:
    print(tok, tok.dep_)

Result:

John nsubjpass
and cc
Jenny conj
were auxpass
accused ROOT
of prep
crimes pobj
by agent
David pobj

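Note that the second passive subject, Jenny, is labeled by spaCy as conj rather than nsubjpass.

1 answer:

Answer 0: (score: 0)

Here is an example that uses POS tags and the dependency parse to extract the subject and all of its conjuncts.

There is also a Token.conjuncts attribute, but it only returns conjuncts that are attached directly to the token. See https://github.com/explosion/spaCy/issues/795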

each_sentence3 = "John and Jenny were accused of crimes by David"
sent = nlp(each_sentence3)

result = []
subj = None
for word in sent:
    if 'subj' in word.dep_:
        # Remember the subject token (nsubj, nsubjpass, ...).
        subj = word
        result.append(word)
    elif word.dep_ == 'conj' and word.head == subj:
        # A conjunct attached directly to the subject.
        result.append(word)
print(result)


[John, Jenny]
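
The loop above only keeps a conj token whose head is the subject itself. In a longer coordination such as "John, Jenny and Mary", a parser may chain the conj arcs (Mary attached to Jenny, Jenny attached to John), in which case the last name would be missed. Below is a minimal sketch of following those arcs transitively. It uses plain tuples instead of spaCy tokens so that it runs without a model; the dependency labels mirror the parse printed in the question, while the head indices, the TOKENS name, and the subjects_with_conjuncts helper are assumptions made for illustration.

```python
# Each token is (text, dependency label, index of its head token).
# Labels follow the parse shown in the question; head indices are assumed.
TOKENS = [
    ("John", "nsubjpass", 4),
    ("and", "cc", 0),
    ("Jenny", "conj", 0),
    ("were", "auxpass", 4),
    ("accused", "ROOT", 4),
    ("of", "prep", 4),
    ("crimes", "pobj", 5),
    ("by", "agent", 4),
    ("David", "pobj", 7),
]


def subjects_with_conjuncts(tokens):
    """Return the texts of subject tokens plus conj tokens chained to them."""
    # Start from every token whose label contains "subj" (nsubj, nsubjpass, ...).
    selected = {i for i, (_, dep, _) in enumerate(tokens) if "subj" in dep}
    changed = True
    while changed:
        changed = False
        for i, (_, dep, head) in enumerate(tokens):
            # Pull in a conj token once its head has already been selected.
            if dep == "conj" and head in selected and i not in selected:
                selected.add(i)
                changed = True
    # Report the tokens in sentence order.
    return [tokens[i][0] for i in sorted(selected)]


print(subjects_with_conjuncts(TOKENS))
```

With real spaCy tokens the same idea could be expressed through tok.i and tok.head.i in place of the tuple indices.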