Question

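I want to simplify the nodes of my parse tree: for a given node, I drop the first hyphen and everything after it. For example, a node NP-TMP-FG should become NP, SBAR-SBJ should become SBAR, and so on. Here is an example of one of my parse trees: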

( (S (S-TPC-2 (NP-SBJ (NP (DT The) (NN asbestos) (NN fiber) ) (, ,)
(NP (NN crocidolite) ) (, ,) ) (VP (VBZ is) (ADJP-PRD (RB unusually) (JJ resilient) )
(SBAR-TMP (IN once) (S (NP-SBJ (PRP it) ) (VP (VBZ enters) (NP (DT the) (NNS lungs) ))))
(, ,) (PP (IN with)(S-NOM (NP-SBJ (NP (RB even) (JJ brief) (NNS exposures) ) (PP (TO to)
(NP (PRP it) ))) (VP (VBG causing) (NP (NP (NNS symptoms) ) (SBAR (WHNP-1 (WDT that) )
(S (NP-SBJ (-NONE- *T*-1) ) (VP (VBP show) (PRT (RP up) ) (ADVP-TMP (NP (NNS decades) ) 
(JJ later) )))))))))) (, ,) (NP-SBJ (NNS researchers) ) (VP (VBD said)(SBAR (-NONE- 0) 
(S (-NONE- *T*-2) )))    (. .) ))

This is my code, but it doesn't work.

import re
import nltk
from nltk.tree import *
tree = Tree.fromstring(line)  # Each parse tree is stored on a single line
for subtree in tree.subtrees():
    re.sub('-.*', '', subtree.label())
print(tree)

Edit:

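My guess is that the problem is that subtree.label() shows the node label but, because it is a function call, cannot be used to change it. Printing subtree.label() for each subtree gives: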

S
S-TPC-2
NP-SBJ
NP
DT
NN
,

and so on...

Answer 1

You can do it like this:

for subtree in tree.subtrees():
    first = subtree.label().split('-')[0]
    subtree.set_label(first)
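
One caveat worth noting: the example tree contains labels that begin with a hyphen, such as -NONE-, and for those split('-')[0] yields an empty string. The sketch below shows one possible way to guard against that. The helper name simplify_label is hypothetical (it is not part of NLTK), and leaving such labels unchanged is an assumption about the desired behavior.

```python
def simplify_label(label):
    """Return the part of a treebank label before the first hyphen.

    Labels that start with a hyphen (e.g. -NONE-) are returned unchanged,
    because splitting them would leave an empty string.
    """
    if label.startswith('-'):
        return label
    return label.split('-', 1)[0]


# Try the helper on labels taken from the example tree.
for label in ['NP-TMP-FG', 'SBAR-SBJ', 'S-TPC-2', 'NP', '-NONE-']:
    print(label, '->', simplify_label(label))
```

Inside the loop above, this would be used as subtree.set_label(simplify_label(subtree.label())).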

Answer 2

I came up with this:

for subtree in tree.subtrees():
    s = subtree.label()
    subtree.set_label(re.sub('-.*', "", s))
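
The difference from the code in the question is that re.sub returns a new string rather than modifying anything in place, so its result has to be written back with set_label. Here is a minimal sketch of that difference. It uses a hypothetical Node class as a stand-in for an NLTK subtree (only label() and set_label() are imitated), so it runs without NLTK installed.

```python
import re


class Node:
    """Hypothetical stand-in for a tree node with label()/set_label()."""

    def __init__(self, label):
        self._label = label

    def label(self):
        return self._label

    def set_label(self, label):
        self._label = label


node = Node('S-TPC-2')

# re.sub returns a new string; discarding it leaves the node untouched.
re.sub('-.*', '', node.label())
unchanged = node.label()

# Writing the result back is what actually changes the label.
node.set_label(re.sub('-.*', '', node.label()))
changed = node.label()

print(unchanged, changed)
```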

How do I traverse all the nodes of the tree?
