Question

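I was recently introduced to spaCy and have become very interested in this Python library. For my use case, I want to extract compound noun-adjective pairs from input sentences as keywords. spaCy offers many utilities for NLP tasks, but I could not find a satisfying lead for this particular one. I looked at a very similar question on SO (related post), but its solution is not very effective and does not work on custom input sentences.

Here are some input sentences: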

sentence_1="My problem was with DELL Customer Service"
sentence_2="Obviously one of the most important features of any computer is the human interface."
sentence_3="The battery life seems to be very good and have had no issues with it."

Here is the code I tried:

import spacy, en_core_web_sm
nlp=en_core_web_sm.load()

def get_compound_nn_adj(doc):
    compounds_nn_pairs = []
    parsed = nlp(doc)
    # Tokens attached to their head by a 'compound' relation
    compounds = [token for token in parsed if token.dep_ == 'compound']
    # Keep only the first token of each compound chain
    compounds = [nc for nc in compounds if nc.i == 0 or parsed[nc.i - 1].dep_ != 'compound']
    if compounds:
        for token in compounds:
            pair_1, pair_2 = (False, False)
            # Span from the first compound token up to and including its head
            noun = parsed[token.i:token.head.i + 1]
            pair_1 = noun
            if noun.root.dep_ == 'nsubj':
                adj_list = [rt for rt in noun.root.head.rights if rt.pos_ == 'ADJ']
                if adj_list:
                    pair_2 = adj_list[0]
            if noun.root.dep_ == 'dobj':
                verb_root = [vb for vb in noun.root.ancestors if vb.pos_ == 'VERB']
                if verb_root:
                    pair_2 = verb_root[0]
            if pair_1 and pair_2:
                # list.append takes a single argument, so store the pair as a tuple
                compounds_nn_pairs.append((pair_1, pair_2))
    return compounds_nn_pairs

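I am unsure what changes the helper function needs, because it does not return the compound noun-adjective pairs I expect. Does anyone have good experience with spaCy? How can the sketch above be improved? Are there better ideas?

Desired output:

I would like to get a compound noun-adjective pair from each input sentence, as follows: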

desired_output_1="DELL Customer Service"
desired_output_2="human interface"
desired_output_3="battery life"

Is there a way to get the expected output? What updates would the implementation above need? Any other ideas? Thanks in advance!

Answer 1

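It seems spaCy only detects a compound relation in sentences 1 and 3, and treats sentence 2 as an amod relation. (Quick code to inspect its parse: [(i, i.pos_, i.dep_) for i in nlp(sentence_1)])

To extract the compounds from 1 and 3, try the following: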

for i in nlp(sentence_1):
    if i.pos_ in ["NOUN", "PROPN"]:
        comps = [j for j in i.children if j.dep_ == "compound"]
        if comps:
            print(comps, i)

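For each noun or proper noun in the sentence, this checks whether any of its direct children is attached by a compound relation.

To cast a wider net that also catches adjectives, look for adjectives and nouns among the word's children, not just compounds: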

for i in nlp(sentence_2):
    if i.pos_ in ["NOUN", "PROPN"]:
        comps = [j for j in i.children if j.pos_ in ["ADJ", "NOUN", "PROPN"]]
        if comps:
            print(comps, i)
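
The two loops above print the modifiers and the head separately. A minimal sketch of how they might be joined into one phrase in sentence order is shown below. The function name modifier_phrases and the helper build_doc are hypothetical, and the parses are annotated by hand (with the dependency labels described above) so that the example does not depend on what a particular model predicts. With a loaded pipeline you would pass nlp(sentence) instead.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab


def modifier_phrases(doc, modifier_deps=("compound",)):
    """Return one span per noun head that has modifiers with the given labels."""
    phrases = []
    for token in doc:
        # Skip non-nouns, and nouns that are themselves modifiers of another head.
        if token.pos_ not in ("NOUN", "PROPN") or token.dep_ in modifier_deps:
            continue
        members = [token]
        stack = [token]
        # Follow modifier chains, not only direct children.
        while stack:
            current = stack.pop()
            for child in current.children:
                if child.dep_ in modifier_deps:
                    members.append(child)
                    stack.append(child)
        if len(members) > 1:
            start = min(t.i for t in members)
            end = max(t.i for t in members) + 1
            # The span covers everything between the outermost members.
            phrases.append(doc[start:end])
    return phrases


def build_doc(words, heads, deps, pos):
    """Hypothetical helper: build a hand-annotated Doc without loading a model."""
    return Doc(Vocab(), words=words, heads=heads, deps=deps, pos=pos)


# Hand-annotated parse of sentence_1 (heads are absolute token indices).
doc_1 = build_doc(
    ["My", "problem", "was", "with", "DELL", "Customer", "Service"],
    [1, 2, 2, 2, 6, 6, 3],
    ["poss", "nsubj", "ROOT", "prep", "compound", "compound", "pobj"],
    ["PRON", "NOUN", "AUX", "ADP", "PROPN", "PROPN", "PROPN"],
)
# Hand-annotated fragment of sentence_2, where "human" is an amod.
doc_2 = build_doc(
    ["the", "human", "interface"],
    [2, 2, 2],
    ["det", "amod", "ROOT"],
    ["DET", "ADJ", "NOUN"],
)

print([span.text for span in modifier_phrases(doc_1)])
print([span.text for span in modifier_phrases(doc_2, ("compound", "amod"))])
```

Passing ("compound", "amod") is the wider net: it also picks up adjective modifiers, at the cost of returning phrases such as "important features" that may not be wanted as keywords.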

Answer 2

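I suspect this has to be handled with a database of compound nouns. The status of "compound noun" comes from how common the usage is. So perhaps the various n-gram databases (Google's, for example) could serve as a source.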
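
As a rough sketch of that idea, the lookup could be a greedy longest-match scan over the tokens. Everything here is an assumption for illustration: the function name find_known_compounds is hypothetical, and KNOWN_COMPOUNDS is a tiny hand-written stand-in for entries that would really come from an n-gram database, filtered by frequency.

```python
import re

# Placeholder lexicon; a real one would be derived from n-gram counts.
KNOWN_COMPOUNDS = {
    ("customer", "service"),
    ("dell", "customer", "service"),
    ("battery", "life"),
    ("human", "interface"),
}


def find_known_compounds(sentence, lexicon=KNOWN_COMPOUNDS, max_len=3):
    """Return lexicon entries found in the sentence, preferring longer matches."""
    tokens = re.findall(r"\w+", sentence)
    lowered = [t.lower() for t in tokens]
    found = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate n-gram first, down to bigrams.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(lowered[i:i + n]) in lexicon:
                match_len = n
                break
        if match_len:
            found.append(" ".join(tokens[i:i + match_len]))
            i += match_len
        else:
            i += 1
    return found


print(find_known_compounds("My problem was with DELL Customer Service"))
print(find_known_compounds("The battery life seems to be very good and have had no issues with it."))
```

This needs no parser at all, but it only finds compounds that are already in the lexicon.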
