I have a file of bracketed trees from the Brown corpus that looks like this:
(TOP (S (NP (NNP Ambassador)
(NNP Stevenson))
(NP (RB yesterday))
(VP (VBD described)
(NP (NP (DT the)
(NNP U.N.))
(POS 's)
(NN problem)
(PP (IN of)
(S (NP (-NONE- *))
(VP (VBG electing)
(NP (DT a)
(ADJP (JJ temporary))
(NN successor)
(PP (TO to)
(NP (DT the)
(ADJP (JJ late))
(NP (NNP Dag)
(NNP Hammarskjold)))))))))
(PP (IN as)
(`` ``)
(NP (NP (DT the)
(ADJP (JJS gravest))
(NN crisis))
(SBAR (-NONE- 0)
(S (NP (DT the)
(NN institution))
(AUX (VBZ has))
(VP (VBN faced)
I have to extract all the grammar rules from this file, in the format:
S -> NP VP
NP -> DT NNS
VP -> VBD NP
NP -> NN
etc.
I tried to read the bracketed trees as a tree in nltk, but I get the following error:
File "C:\Users\MyName\AppData\Local\Programs\Python\Python36\lib\site-packages\nltk\tree.py", line 666, in fromstring
    cls._parse_error(s, match, 'end-of-string')
File "C:\Users\MyName\AppData\Local\Programs\Python\Python36\lib\site-packages\nltk\tree.py", line 734, in _parse_error
    raise ValueError(msg)
ValueError: Tree.read(): expected 'end-of-string' but got '(TOP'
            at index 24.
                "...XT_UNIT) (TOP (S (N..."
I want to use nltk's productions, but I can't read this file as an nltk tree. Please help!
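
For what it's worth, the error message suggests that the string passed to `Tree.fromstring` contains more than one top-level bracketed expression (something ending in `XT_UNIT)` followed by `(TOP ...`), while `Tree.fromstring` expects exactly one tree per call. Below is a minimal sketch of one possible workaround, not a definitive solution: split the text into top-level chunks by counting parentheses, parse each chunk separately, and collect `productions()`. The helper names `split_trees` and `extract_rules` are made up for illustration, and the sketch assumes that literal parentheses never appear as tokens in the file (treebank files normally encode them as `-LRB-` / `-RRB-`).

```python
def split_trees(text):
    """Yield each complete top-level bracketed expression in `text`.

    Hypothetical helper: assumes '(' and ')' only ever mark tree structure.
    An unfinished tree at the end of the text is silently skipped.
    """
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i  # a new top-level tree begins here
            depth += 1
        elif ch == ")":
            if depth == 0:
                continue  # stray closing parenthesis, ignore it
            depth -= 1
            if depth == 0:
                yield text[start:i + 1]


def extract_rules(text, include_lexical=False):
    """Return rules such as 'S -> NP VP' for every tree found in `text`."""
    # Imported here so that split_trees can be used without nltk installed.
    from nltk import Tree

    rules = []
    for chunk in split_trees(text):
        tree = Tree.fromstring(chunk)
        for prod in tree.productions():
            # Lexical productions look like NNP -> 'Ambassador'.
            if include_lexical or prod.is_nonlexical():
                rules.append(str(prod))
    return rules
```

Usage would be along the lines of `rules = extract_rules(open(path).read())`, where `path` stands for the tree file; wrapping the result in `set(...)` would drop duplicate rules if only the distinct ones are needed.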