Capturing proper nouns that contain lowercase prepositions with NLTK

Time: 2014-03-19 00:51:56

Tags: python nltk

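I have started using NLTK to find proper nouns. However, I am having trouble capturing proper nouns (names of people and organizations) that contain lowercase prepositions.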

For example,

The David Eccles School of Business at the University of Utah

becomes (using my NLTK POS tagger):

David Eccles School, Business, University

Another example:

The United Nations Economic and Social Council's Economic Commission for Africa

becomes

United Nations Economic, Social Council, Economic Commission, Africa

Any suggestions?

One thing I am considering is capitalizing all prepositions. My current code:

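    import nltk

    # Requires the NLTK tokenizer, POS tagger, and NE chunker data packages.
    # Sample input; in the original script x and count presumably come from an enclosing loop.
    x = "The David Eccles School of Business at the University of Utah"
    count = 0

    tokens2 = nltk.word_tokenize(x)
    tags = nltk.pos_tag(tokens2)
    res = nltk.ne_chunk(tags)
    tree = []
    # Collect PERSON chunks. NLTK 3 uses t.label(); older releases used t.node.
    for subtree in res.subtrees(filter=lambda t: t.label() == 'PERSON'):
        subtree_l = []
        for leaf in subtree.leaves():
            subtree_l.append(leaf[0])  # each leaf is a (word, tag) pair
        sub = ' '.join(subtree_l)
        tree.append(sub)

    # Collect ORGANIZATION chunks the same way.
    for subtree in res.subtrees(filter=lambda t: t.label() == 'ORGANIZATION'):
        subtree_l = []
        for leaf in subtree.leaves():
            subtree_l.append(leaf[0])
        sub = ' '.join(subtree_l)
        tree.append(sub)
    x = ', '.join(tree)
    count = count + 1
    print(x)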
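To make the capitalization idea concrete, here is a minimal sketch of the preprocessing step only. It assumes whitespace tokenization and a small, hypothetical `CONNECTORS` list (both are illustrative choices, not part of NLTK), and it capitalizes a run of connector words only when the run sits between two capitalized words. Whether the tagger and `ne_chunk` then keep the whole name together is not guaranteed and would need to be checked on real data.

```python
# Hypothetical connector list, chosen for illustration; extend as needed.
CONNECTORS = {"of", "at", "for", "and", "the", "in", "on"}


def capitalize_connectors(text):
    """Capitalize runs of connector words flanked by capitalized words."""
    words = text.split()
    result = list(words)
    i = 0
    while i < len(words):
        if words[i] not in CONNECTORS:
            i += 1
            continue
        # Find the end of this maximal run of lowercase connectors.
        j = i
        while j < len(words) and words[j] in CONNECTORS:
            j += 1
        # Only rewrite the run if both neighbours start with an uppercase letter.
        flanked = (
            i > 0
            and j < len(words)
            and words[i - 1][0].isupper()
            and words[j][0].isupper()
        )
        if flanked:
            for k in range(i, j):
                result[k] = words[k].capitalize()
        i = j
    return " ".join(result)


if __name__ == "__main__":
    print(capitalize_connectors(
        "The David Eccles School of Business at the University of Utah"
    ))
```

The output of `capitalize_connectors` would be passed to `nltk.word_tokenize` in place of the raw text. Any connectors found inside the extracted names would then have to be lowercased again to recover the original spelling.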

0 answers:

No answers