Question

I have a file of bracketed trees from the Brown corpus that looks like this:

(TOP (S (NP (NNP Ambassador)
            (NNP Stevenson))
        (NP (RB yesterday))
        (VP (VBD described)
            (NP (NP (DT the)
                    (NNP U.N.))
                (POS 's)
                (NN problem)
                (PP (IN of)
                    (S (NP (-NONE- *))
                       (VP (VBG electing)
                           (NP (DT a)
                               (ADJP (JJ temporary))
                               (NN successor)
                               (PP (TO to)
                                   (NP (DT the)
                                       (ADJP (JJ late))
                                       (NP (NNP Dag)
                                           (NNP Hammarskjold)))))))))
            (PP (IN as)
                (`` ``)
                (NP (NP (DT the)
                        (ADJP (JJS gravest))
                        (NN crisis))
                    (SBAR (-NONE- 0)
                          (S (NP (DT the)
                                 (NN institution))
                             (AUX (VBZ has))
                             (VP (VBN faced)

I need to extract all the grammar rules from this file, in the format:

S -> NP VP
NP -> DT NNS
VP -> VBD NP
NP -> NN

etc.

I tried to read the bracketed trees in as an nltk Tree, but I get the following error:

File "C:\Users\MyName\AppData\Local\Programs\Python\Python36\lib\site-packages\nltk\tree.py", line 666, in fromstring
    cls._parse_error(s, match, 'end-of-string')
File "C:\Users\MyName\AppData\Local\Programs\Python\Python36\lib\site-packages\nltk\tree.py", line 734, in _parse_error
    raise ValueError(msg)
ValueError: Tree.read(): expected 'end-of-string' but got '(TOP'
            at index 24.
                "...XT_UNIT) (TOP (S (N..."

I want to use nltk's productions, but I can't read this file in as an nltk tree. Please help!
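
For context, an "expected 'end-of-string' but got '(TOP'" error usually means that Tree.fromstring was handed a string containing more than one top-level tree, since it parses exactly one. Below is a minimal sketch of one possible workaround, assuming the file is simply a sequence of balanced bracketed trees and that literal parentheses in the text are escaped (e.g. as -LRB-/-RRB-). The helper names split_trees and extract_rules are made up for illustration; Tree.fromstring, Tree.productions() and Production.is_nonlexical() are the nltk calls being relied on.

```python
def split_trees(text):
    """Yield each top-level bracketed tree in `text` as its own string."""
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i          # a new top-level tree begins here
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0 and start is not None:
                yield text[start:i + 1]   # the tree just closed
                start = None


def extract_rules(text):
    """Collect non-lexical productions (e.g. S -> NP VP) from every tree."""
    # Imported here so that split_trees can be used without nltk installed.
    from nltk import Tree

    rules = []
    for chunk in split_trees(text):
        tree = Tree.fromstring(chunk)
        # is_nonlexical() filters out word-level rules such as NNP -> 'Ambassador'.
        rules.extend(p for p in tree.productions() if p.is_nonlexical())
    return rules
```

With this in place, something like `for rule in extract_rules(open(path).read()): print(rule)` should print rules in the `S -> NP VP` form, where `path` is a placeholder for the actual file. Wrapping the result in `set(...)` would remove duplicates. nltk also ships a BracketParseCorpusReader in nltk.corpus.reader, which may be worth trying as an alternative to splitting by hand.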

Extracting grammar rules from the Brown corpus

0 answers: