对于形容词:
"The company's customer service was terrible."
{customer service, terrible}
动词:
"They kept increasing my phone bill"
{phone bill, increasing}
这是来自this posting
的分支问题但是我试图使用spacy查找与多令牌短语/复合名词(例如“客户服务”)相对应的adj和动词。
我不确定如何使用spacy,nltk或任何其他预包装的自然语言处理软件来做到这一点,我将不胜感激!
答案 0 :(得分:4)
对于像这样的简单示例,您可以将spaCy的dependency parsing与一些简单规则结合使用。
首先,要识别类似于给定示例的多词名词,可以使用“ compound”依赖项。在使用spaCy解析文档(例如句子)之后,请使用令牌的dep_
属性来查找其依赖项。
例如,这个句子有两个复合名词:
“复合依赖项标识复合名词。”
每个令牌及其依赖性如下所示:
import spacy
import pandas as pd
nlp = spacy.load('en')
example_doc = nlp("The compound dependency identifies compound nouns.")
for tok in example_doc:
print(tok.i, tok, "[", tok.dep_, "]")
>>>0 The [ det ]
>>>1 compound [ compound ]
>>>2 dependency [ nsubj ]
>>>3 identifies [ ROOT ]
>>>4 compound [ compound ]
>>>5 nouns [ dobj ]
>>>6 . [ punct ]
for tok in [tok for tok in example_doc if tok.dep_ == 'compound']: # Get list of
compounds in doc
noun = example_doc[tok.i: tok.head.i + 1]
print(noun)
>>>compound dependency
>>>compound nouns
以下功能适用于您的示例。但是,它可能不适用于更复杂的句子。
adj_doc = nlp("The company's customer service was terrible.")
verb_doc = nlp("They kept increasing my phone bill")
def get_compound_pairs(doc, verbose=False):
"""Return tuples of (multi-noun word, adjective or verb) for document."""
compounds = [tok for tok in doc if tok.dep_ == 'compound'] # Get list of compounds in doc
compounds = [c for c in compounds if c.i == 0 or doc[c.i - 1].dep_ != 'compound'] # Remove middle parts of compound nouns, but avoid index errors
tuple_list = []
if compounds:
for tok in compounds:
pair_item_1, pair_item_2 = (False, False) # initialize false variables
noun = doc[tok.i: tok.head.i + 1]
pair_item_1 = noun
# If noun is in the subject, we may be looking for adjective in predicate
# In simple cases, this would mean that the noun shares a head with the adjective
if noun.root.dep_ == 'nsubj':
adj_list = [r for r in noun.root.head.rights if r.pos_ == 'ADJ']
if adj_list:
pair_item_2 = adj_list[0]
if verbose == True: # For trying different dependency tree parsing rules
print("Noun: ", noun)
print("Noun root: ", noun.root)
print("Noun root head: ", noun.root.head)
print("Noun root head rights: ", [r for r in noun.root.head.rights if r.pos_ == 'ADJ'])
if noun.root.dep_ == 'dobj':
verb_ancestor_list = [a for a in noun.root.ancestors if a.pos_ == 'VERB']
if verb_ancestor_list:
pair_item_2 = verb_ancestor_list[0]
if verbose == True: # For trying different dependency tree parsing rules
print("Noun: ", noun)
print("Noun root: ", noun.root)
print("Noun root head: ", noun.root.head)
print("Noun root head verb ancestors: ", [a for a in noun.root.ancestors if a.pos_ == 'VERB'])
if pair_item_1 and pair_item_2:
tuple_list.append((pair_item_1, pair_item_2))
return tuple_list
get_compound_pairs(adj_doc)
>>>[(customer service, terrible)]
get_compound_pairs(verb_doc)
>>>[(phone bill, increasing)]
get_compound_pairs(example_doc, verbose=True)
>>>Noun: compound dependency
>>>Noun root: dependency
>>>Noun root head: identifies
>>>Noun root head rights: []
>>>Noun: compound nouns
>>>Noun root: nouns
>>>Noun root head: identifies
>>>Noun root head verb ancestors: [identifies]
>>>[(compound nouns, identifies)]