My tree corpus is shown below. I need to parse this tree and convert it into a flat sentence form.
(TOP END_OF_TEXT_UNIT)
(TOP (S (NP (DT The)
(NNP Fulton)
(NNP County)
(NNP Grand)
(NNP Jury))
(VP (VBD said)
(NP (NNP Friday))
(SBAR (-NONE- 0)
(S (NP (DT an)
(NN investigation)
(PP (IN of)
(NP (NP (NNP Atlanta))
(POS 's)
(JJ recent)
(JJ primary)
(NN election))))
(VP (VBD produced)
(NP (`` ``)
(DT no)
(NN evidence)
('' '')
(SBAR (IN that)
(S (NP (DT any)
(NNS irregularities))
(VP (VBD took)
(NP (NN place)))))))))))
(. .))
Is there an algorithm that can parse the above, or do we need to use regular expressions for this? I do not want to use the NLTK package to do it.
Answer 0 (score: 1)
Pyparsing makes quick work of parsing nested expressions:
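import pyparsing as pp
# Parentheses delimit nodes but are suppressed from the results.
LPAR, RPAR = map(pp.Suppress, "()")
# Forward declaration, because a node can contain further nodes.
expr = pp.Forward()
# Node labels: uppercase tags such as NP or -NONE-, plus the punctuation tags.
label = pp.Word(pp.alphas.upper()+'-') | "''" | "``" | "."
# Leaf tokens: punctuation, or any run of printable characters other than parentheses.
word = pp.Literal(".") | "''" | "``" | pp.Word(pp.printables, excludeChars="()")
# A node is a label followed by either one word or one or more child nodes.
expr <<= LPAR + label + (word | pp.OneOrMore(expr)) + RPAR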
sample = """
(TOP (S (NP (DT The)
(NNP Fulton)
(NNP County)
(NNP Grand)
(NNP Jury))
(VP (VBD said)
(NP (NNP Friday))
(SBAR (-NONE- 0)
(S (NP (DT an)
(NN investigation)
(PP (IN of)
(NP (NP (NNP Atlanta))
(POS 's)
(JJ recent)
(JJ primary)
(NN election))))
(VP (VBD produced)
(NP (`` ``)
(DT no)
(NN evidence)
('' '')
(SBAR (IN that)
(S (NP (DT any)
(NNS irregularities))
(VP (VBD took)
(NP (NN place)))))))))))
(. .))
"""
result = pp.OneOrMore(expr).parseString(sample)
print(' '.join(result))
Running this prints:

TOP S NP DT The NNP Fulton NNP County NNP Grand NNP Jury VP VBD said NP NNP Friday SBAR -NONE- 0 S NP DT an NN investigation PP IN of NP NP NNP Atlanta POS 's JJ recent JJ primary NN election VP VBD produced NP `` `` DT no NN evidence '' '' SBAR IN that S NP DT any NNS irregularities VP VBD took NP NN place . .

Typically, a parser like this would use Group to preserve the grouping of nested elements. But in your case, since you ultimately want a flat list anyway, we simply leave it out: pyparsing's default behavior is to return a flat list of the matched strings.
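If you ever do need the tree structure, wrapping each node in pyparsing's Group keeps every parenthesized node as its own sub-list. The following is only a minimal sketch: the shortened sentence in `small` is made up for illustration, and `flatten` is a hypothetical helper, not part of pyparsing.

```python
import pyparsing as pp

LPAR, RPAR = map(pp.Suppress, "()")
expr = pp.Forward()
label = pp.Word(pp.alphas.upper() + "-") | "''" | "``" | "."
word = pp.Literal(".") | "''" | "``" | pp.Word(pp.printables, excludeChars="()")

# Same grammar as above, except that each node is wrapped in pp.Group,
# so every parenthesized node becomes its own nested list.
expr <<= pp.Group(LPAR + label + (word | pp.OneOrMore(expr)) + RPAR)

# Made-up, shortened sample (an assumption for illustration only).
small = "(S (NP (DT The) (NN jury)) (VP (VBD said)))"
nested = expr.parseString(small).asList()


def flatten(tree):
    """Hypothetical helper: walk the nested lists depth-first, yielding strings."""
    for item in tree:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item


flat = " ".join(flatten(nested))
print(nested)
print(flat)
```

The nested form is useful when you want to walk the tree by constituent. Flattening it afterwards recovers the same kind of space-separated output as the ungrouped grammar.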