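Using spaCy in Python, I am trying to extract entities from a passive-voice sentence that has multiple subjects.
sentence = "John and Jenny were accused of crimes by David"
My goal is to extract both "John" and "Jenny" from the sentence as nsubjpass tokens that also appear in .ents.
However, I can only extract "John" as nsubjpass.
How can I extract both of them?
Note that although John is found as an entity in .ents, Jenny is labeled conj rather than nsubjpass. How can I improve this?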
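import spacy
nlp = spacy.load("en_core_web_sm")  # assumed model; any English pipeline with a parser should work
each_sentence3 = "John and Jenny were accused of crimes by David"
doc = nlp(each_sentence3)
# Keep only the tokens the parser labels as passive subjects
passive_toks = [tok for tok in doc if tok.dep_ == "nsubjpass"]
if passive_toks:
    print(passive_toks)
[John]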
The entity list shows:
print(list(doc.ents))
[John, Jenny, David]
Now, if we inspect the whole sentence, we see the following:
for tok in doc:
    print(tok, tok.dep_)
John nsubjpass
and cc
Jenny conj
were auxpass
accused ROOT
of prep
crimes pobj
by agent
David pobj
Note that the second passive subject, Jenny, is labeled conj by spaCy rather than nsubjpass.
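Answer 0 (score: 0)
Here is an example that uses the dependency labels from the parse to extract the subject and all of its conjuncts.
There is also a Token.conjuncts attribute, but it only returns the tokens directly connected to the token. See https://github.com/explosion/spaCy/issues/795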
each_sentence3 = "John and Jenny were accused of crimes by David"
sent = nlp(each_sentence3)
result = []
subj = None
for word in sent:
    if 'subj' in word.dep_:
        # Remember the subject so later conjuncts can be matched against it
        subj = word
        result.append(word)
    elif word.dep_ == 'conj' and word.head == subj:
        # A conjunct attached directly to the subject
        result.append(word)
print(result)
[John, Jenny]
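The loop above only picks up conjuncts whose head is the subject itself. With three or more coordinated subjects, a parser may attach a later conjunct to the previous conjunct instead, so a sketch that follows the whole conj chain and then keeps only tokens that are also entities might look like the following. To keep it runnable without a downloaded model, it uses a small stand-in class, Tok, which is hypothetical and not part of spaCy; it only mimics the dep_, children and ent_type_ attributes, and the tree below is hand-built to match the parse shown in the question.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tok:
    """Hypothetical stand-in for a spaCy Token (only the attributes used here)."""
    text: str
    dep_: str
    ent_type_: str = ""
    children: List["Tok"] = field(default_factory=list)


def subjects_with_conjuncts(tokens):
    """Return each subject token followed by every token coordinated with it."""
    found = []
    for tok in tokens:
        if "subj" in tok.dep_:
            # Breadth-first walk over the conj chain: a conjunct may hang
            # off the subject or off an earlier conjunct.
            queue = deque([tok])
            while queue:
                current = queue.popleft()
                found.append(current)
                queue.extend(c for c in current.children if c.dep_ == "conj")
    return found


# Hand-built fragment of the parse from the question (assumed, not parser output)
jenny = Tok("Jenny", "conj", "PERSON")
and_ = Tok("and", "cc")
john = Tok("John", "nsubjpass", "PERSON", [and_, jenny])
david = Tok("David", "pobj", "PERSON")
tokens = [john, and_, jenny, david]

# Keep only the subjects that were also recognized as entities
subjects = [t.text for t in subjects_with_conjuncts(tokens) if t.ent_type_]
print(subjects)
```

Because spaCy tokens expose the same dep_, children and ent_type_ attributes, the function should also accept a parsed document directly, for example subjects_with_conjuncts(nlp(each_sentence3)). To restrict the result to passive subjects only, the "subj" test can be replaced with tok.dep_ == "nsubjpass".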