Using the Stanford CoreNLP Python Parser for specific output

Time: 2016-07-25 14:52:07

Tags: python nlp pos-tagger stanford-nlp

I am using SCP to get CFG parse trees for English sentences.

My expected output is a tree like this:

(S (NP (DET Every) (NN cat)) (VP (VT loves) (NP (DET a) (NN dog))))

But what I get is:

(ROOT (S (NP (DT Every) (NN cat)) (VP (VBZ loves) (NP (DT a) (NN dog)))))

How can I change the POS tags to the expected ones and remove the ROOT node?

Thanks

1 answer:

Answer 0: (score: 1)

You can use the tree classes in NLTK's nltk.tree module.

from nltk.tree import ParentedTree

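def traverse(t):
    """Recursively rename Penn Treebank POS labels to the custom tags."""
    try:
        # Replace labels
        if t.label() == "DT":
            t.set_label("DET")
        elif t.label() == "VBZ":
            t.set_label("VT")
    except AttributeError:
        # Leaves are plain strings with no label(), so stop here
        return

    for child in t:
        traverse(child)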

output_tree = "(ROOT (S (NP (DT Every) (NN cat)) (VP (VBZ loves) (NP (DT a) (NN dog)))))"
tree = ParentedTree.fromstring(output_tree)

# Remove ROOT Element
if tree.label() == "ROOT":  
    tree = tree[0]

traverse(tree)
print(tree)
# (S (NP (DET Every) (NN cat)) (VP (VT loves) (NP (DET a) (NN dog))))
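
If pulling in NLTK is not an option, the same two steps (dropping the ROOT wrapper and renaming labels) can be sketched directly on the bracketed string with the standard library. This is not the approach from the answer above, only a regex-based alternative. The names TAG_MAP, relabel and strip_root are made up for illustration, and the sketch assumes that literal parentheses never appear as tokens (CoreNLP normally emits -LRB- and -RRB- for those).

```python
import re

# Hypothetical mapping from Penn Treebank tags to the custom tags in the question.
TAG_MAP = {"DT": "DET", "VBZ": "VT"}


def relabel(bracketed, tag_map=TAG_MAP):
    """Rename node labels in a bracketed tree string according to tag_map."""
    def swap(match):
        label = match.group(1)
        # Labels missing from the mapping are left unchanged.
        return "(" + tag_map.get(label, label)

    # A label is the run of non-space, non-paren characters right after "(".
    return re.sub(r"\(([^\s()]+)", swap, bracketed)


def strip_root(bracketed):
    """Drop an outer (ROOT ...) wrapper if there is one."""
    text = bracketed.strip()
    match = re.fullmatch(r"\(ROOT\s+(.*)\)", text, flags=re.DOTALL)
    return match.group(1).strip() if match else text


parsed = "(ROOT (S (NP (DT Every) (NN cat)) (VP (VBZ loves) (NP (DT a) (NN dog)))))"
result = relabel(strip_root(parsed))
print(result)
# (S (NP (DET Every) (NN cat)) (VP (VT loves) (NP (DET a) (NN dog))))
```

Keeping the tag pairs in a dictionary means more renames can be added without touching the logic. For anything beyond relabeling, such as moving or deleting subtrees, working on a parsed tree as in the answer is likely the safer choice.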