How to parse verbs using spaCy

Time: 2018-03-14 07:17:42

Tags: python dictionary spacy linguistics

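I'm trying to parse the verbs in a corpus and list them in a dictionary, counting how many times each verb appears as transitive, intransitive, and ditransitive. How can I use spaCy to parse the verbs and label them as transitive, intransitive, or ditransitive?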

1 answer:

Answer 0: (score: 1)

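Here I summarize the code from Mirith/Verb-categorizer. Basically, you can loop over the VERB tokens and look at their children in the dependency parse to classify each verb as transitive, intransitive, or ditransitive. An example follows.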

First, import spaCy and load an English model:

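import spacy
# The original answer used the shortcut 'en', which only resolves in spaCy 2.x;
# in spaCy 3.x, load an installed model by its full name instead.
nlp = spacy.load('en_core_web_sm')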

Suppose you have some example text parsed into tokens:

tokens = nlp('I like this dog. It is pretty good. I saw a bird. We arrived at the classroom door with only seven seconds to spare.')
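Before classifying anything, it can help to see which dependency labels the parser attaches to each verb's children, since the classification depends entirely on them. The helper below is a minimal sketch; `verb_children` is a hypothetical name, not part of spaCy, and it relies only on the `pos_`, `text`, `dep_`, and `children` attributes of each token.

```python
def verb_children(tokens):
    """Return (verb text, [(child text, child dep label), ...]) for each verb."""
    rows = []
    for token in tokens:
        if token.pos_ == 'VERB':
            # Collect every direct child of the verb with its dependency label
            deps = [(child.text, child.dep_) for child in token.children]
            rows.append((token.text, deps))
    return rows
```

Calling `verb_children(tokens)` on the parsed text and printing each row shows, for instance, whether a verb has a `dobj` child. The exact labels depend on the model you loaded.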

You can then write a function like the following to map each VERB to a more specific type, adapting it as needed:

def check_verb(token):
    """Classify a spaCy token as transitive, ditransitive, or intransitive."""
    if token.pos_ != 'VERB':
        return token.pos_  # non-verbs keep their coarse POS tag
    indirect_object = False
    direct_object = False
    for item in token.children:
        # spaCy's English models label the indirect object of a double-object
        # verb as "dative"; "iobj" is kept for models that emit it. "pobj" is
        # not checked: it normally hangs off a preposition, not the verb.
        if item.dep_ in ("iobj", "dative"):
            indirect_object = True
        if item.dep_ == "dobj":
            direct_object = True
    if indirect_object and direct_object:
        return 'DITRANVERB'
    elif direct_object:
        return 'TRANVERB'
    elif not indirect_object:
        return 'INTRANVERB'
    else:
        # indirect object without a direct object: leave unclassified
        return 'VERB'

Example:

[check_verb(t) for t in tokens]  # ['PRON', 'TRANVERB', 'DET', 'NOUN', 'PUNCT', ...]
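The question also asks for a dictionary of per-verb counts. One way to get there, sketched under the assumption that grouping by lemma is what you want (so "saw" and "see" share an entry), is to tally the label of every VERB token. `count_verb_types` is a hypothetical helper name; it takes the classifier as an argument so any labeling function can be plugged in.

```python
from collections import Counter, defaultdict

def count_verb_types(tokens, classify):
    """Map each verb lemma to a Counter of the labels `classify` assigns it."""
    counts = defaultdict(Counter)
    for token in tokens:
        if token.pos_ == 'VERB':
            # One tally per (lemma, label) pair, e.g. counts['like']['TRANVERB']
            counts[token.lemma_][classify(token)] += 1
    return counts
```

With the objects above, `verb_counts = count_verb_types(tokens, check_verb)` gives a nested mapping, and `verb_counts['like']['TRANVERB']` is the number of times "like" was labeled transitive. A missing label simply reads as 0 because the inner values are `Counter` objects.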