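I want to identify the subject and object of each sentence in a set of sentences. My actual task is to determine cause and effect relationships from a set of review data.
I am using the spaCy package to chunk and parse the data, but this has not gotten me to my goal. Is there a way to do this?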
E.g.:
I thought it was the complete set
Output:
subject    object
I          complete set
Answer 0: (score: 7)
The simplest way is to read each token's dependency label through token.dep_, after importing spacy:
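import spacy

# 'en' was a shortcut link in spaCy 2.x; spaCy 3.x needs the full model name
nlp = spacy.load('en_core_web_sm')
parsed_text = nlp(u"I thought it was the complete set")

# default to None so the prints below do not raise NameError when a role is absent
subject = direct_object = indirect_object = None

# get token dependencies
for text in parsed_text:
    # subject would be
    if text.dep_ == "nsubj":
        subject = text.orth_
    # iobj for indirect object (spaCy's English models may label it "dative" instead)
    if text.dep_ in ("iobj", "dative"):
        indirect_object = text.orth_
    # dobj for direct object
    if text.dep_ == "dobj":
        direct_object = text.orth_
print(subject)
print(direct_object)
print(indirect_object)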
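Because the loop above keeps only the last token seen for each label, a sentence with two subjects (such as "I" and "it" here) loses one of them. A minimal sketch of collecting every match per label follows. The helper name `group_by_dep` is hypothetical, and the (text, label) pairs are hand-written assumptions standing in for parser output, so the labels a real model assigns may differ.

```python
from collections import defaultdict


def group_by_dep(pairs):
    """Collect every token text under its dependency label (hypothetical helper)."""
    grouped = defaultdict(list)
    for text, dep in pairs:
        grouped[dep].append(text)
    return dict(grouped)


# Hand-written (text, label) pairs for the example sentence. These labels are
# assumed for illustration, not produced by running a parser.
pairs = [
    ("I", "nsubj"), ("thought", "ROOT"), ("it", "nsubj"), ("was", "ccomp"),
    ("the", "det"), ("complete", "amod"), ("set", "attr"),
]

grouped = group_by_dep(pairs)
print(grouped.get("nsubj", []))  # all subjects, in sentence order
print(grouped.get("dobj", []))   # empty list when no direct object was labelled
```

With spaCy, the same helper should accept `((t.text, t.dep_) for t in parsed_text)`. Using `.get(label, [])` avoids an error when a role is missing from the sentence.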
Answer 1: (score: 0)
You can use noun chunks.
doc = nlp("I thought it was the complete set")
for nc in doc.noun_chunks:
    print(nc.text)
I
it
the complete set
To select only "I" rather than both "I" and "it", you can first write a test that picks the nsubj to the left of the ROOT.
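One way such a test might look is sketched below. It relies only on the `dep_`, `head`, `i` and `text` attributes that spaCy tokens expose; the `Tok` class and the `root_subjects` helper are hypothetical, and the small parse is hand-written with assumed labels so the sketch runs without a model installed.

```python
class Tok:
    """Hypothetical stand-in exposing the spaCy Token attributes used below."""

    def __init__(self, text, dep_, i, head=None):
        self.text = text
        self.dep_ = dep_
        self.i = i          # position of the token in the sentence
        self.head = head    # governing token


def root_subjects(tokens):
    """Return nsubj tokens that attach to the ROOT and sit to its left."""
    return [
        t for t in tokens
        if t.dep_ == "nsubj" and t.head.dep_ == "ROOT" and t.i < t.head.i
    ]


# Hand-written parse of "I thought it was" (assumed labels, not parser output).
thought = Tok("thought", "ROOT", 1)
thought.head = thought  # in spaCy the ROOT token is its own head
was = Tok("was", "ccomp", 3, thought)
tokens = [Tok("I", "nsubj", 0, thought), thought, Tok("it", "nsubj", 2, was), was]

print([t.text for t in root_subjects(tokens)])
```

Since iterating over a spaCy `Doc` yields tokens, `root_subjects(doc)` should work on real parses as well. Here "it" is skipped because its head is "was", not the ROOT.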
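Answer 2: (score: 0)
Stanza is built with highly accurate neural network components that also enable efficient training and evaluation with your own annotated data. The modules are built on top of the PyTorch library.
Stanza is a Python natural language analysis package. It contains tools that can be used in a pipeline to convert a string of human-language text into lists of sentences and words, to generate the base forms of those words along with their parts of speech and morphological features, to produce a syntactic dependency parse, and to recognize named entities.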
# import required packages
import stanza

def find_Subject_Object(text):
    # build an English pipeline that runs up to dependency parsing
    nlp = stanza.Pipeline(lang='en', processors='tokenize,mwt,pos,lemma,depparse')
    doc = nlp(text)
    clausal_subject = []
    nominal_subject = []
    indirect_object = []
    direct_object = []
    for sent in doc.sentences:
        for word in sent.words:
            # word.deprel holds the Universal Dependencies relation label
            if word.deprel == "nsubj":
                nominal_subject.append({word.text: "nominal_subject nsubj"})
            elif word.deprel == "csubj":
                clausal_subject.append({word.text: "clausal_subject csubj"})
            elif word.deprel == "iobj":
                indirect_object.append({word.text: "indirect_object iobj"})
            elif word.deprel == "obj":
                direct_object.append({word.text: "object obj"})
    return indirect_object, direct_object, clausal_subject, nominal_subject

text = """John F. Kennedy International Airport is an international airport in Queens, New York, USA, and one of the primary airports serving New York City."""
find_Subject_Object(text)
# output #
([], [{'City': 'object obj'}], [], [{'John': 'nominal_subject nsubj'}, {'Airport': 'nominal_subject nsubj'}])
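The function above returns flat lists, so it does not say which subject belongs with which object. A rough sketch of grouping them by their governing word follows, using the `id`, `text`, `head` and `deprel` fields that Stanza words carry. The `W` tuple and the `group_by_head` helper are hypothetical, and the sample parse is hand-written with assumed labels rather than taken from a real pipeline run.

```python
from collections import defaultdict
from typing import NamedTuple


class W(NamedTuple):
    """Hypothetical stand-in for the Stanza Word fields used here."""
    id: int      # 1-based position in the sentence
    text: str
    head: int    # id of the governing word, 0 for the root
    deprel: str


def group_by_head(words):
    """Map each governing word to the subjects and objects that depend on it."""
    by_id = {w.id: w for w in words}
    groups = defaultdict(lambda: {"nsubj": [], "obj": [], "iobj": []})
    for w in words:
        if w.deprel in ("nsubj", "obj", "iobj") and w.head in by_id:
            head = by_id[w.head]
            # key on (id, text) so a repeated verb does not merge two clauses
            groups[(head.id, head.text)][w.deprel].append(w.text)
    return dict(groups)


# Hand-written parse of "She gave him a book" (assumed labels).
words = [
    W(1, "She", 2, "nsubj"),
    W(2, "gave", 0, "root"),
    W(3, "him", 2, "iobj"),
    W(4, "a", 5, "det"),
    W(5, "book", 2, "obj"),
]
print(group_by_head(words))
```

With a real pipeline, calling `group_by_head(sent.words)` for each `sent` in `doc.sentences` should give one grouping per sentence, which may be a more convenient starting point for relation extraction than separate lists.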
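Stanza also includes a Python interface to the CoreNLP Java package and inherits additional functionality from it, such as constituency parsing, coreference resolution, and linguistic pattern matching.
In summary, Stanza's features: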