Extracting grammar rules from the Brown corpus

Time: 2019-03-28 20:24:52

Tags: python nlp nltk

I have a file of bracketed trees from the Brown corpus that looks like this:

(TOP (S (NP (NNP Ambassador)
            (NNP Stevenson))
        (NP (RB yesterday))
        (VP (VBD described)
            (NP (NP (DT the)
                    (NNP U.N.))
                (POS 's)
                (NN problem)
                (PP (IN of)
                    (S (NP (-NONE- *))
                       (VP (VBG electing)
                           (NP (DT a)
                               (ADJP (JJ temporary))
                               (NN successor)
                               (PP (TO to)
                                   (NP (DT the)
                                       (ADJP (JJ late))
                                       (NP (NNP Dag)
                                           (NNP Hammarskjold)))))))))
            (PP (IN as)
                (`` ``)
                (NP (NP (DT the)
                        (ADJP (JJS gravest))
                        (NN crisis))
                    (SBAR (-NONE- 0)
                          (S (NP (DT the)
                                 (NN institution))
                             (AUX (VBZ has))
                             (VP (VBN faced)

I have to extract all the grammar rules from this file, in the format:

S -> NP VP
NP -> DT NNS
VP -> VBD NP
NP -> NN

I tried to read the bracketed trees as trees in nltk, but I get the following error:

  File "C:\Users\MyName\AppData\Local\Programs\Python\Python36\lib\site-packages\nltk\tree.py", line 666, in fromstring
    cls._parse_error(s, match, 'end-of-string')
  File "C:\Users\MyName\AppData\Local\Programs\Python\Python36\lib\site-packages\nltk\tree.py", line 734, in _parse_error
    raise ValueError(msg)
ValueError: Tree.read(): expected 'end-of-string' but got '(TOP'
            at index 24.
                "...XT_UNIT)(TOP(S(N..."

I want to use nltk's productions, but I can't read this file as an nltk tree. Please help!
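
The error message suggests (this is a guess from the traceback, not something confirmed in the question) that the file holds more than one top-level tree, while `Tree.fromstring` expects exactly one. A minimal sketch of one possible workaround is to split the text into balanced top-level chunks and parse each chunk separately. The names `split_trees` and `extract_rules` are hypothetical helpers, and the splitter assumes that literal parentheses inside tokens are escaped (for example as `-LRB-`/`-RRB-`), as is usual in Penn-style bracketings.

```python
def split_trees(text):
    """Yield each top-level balanced-parenthesis chunk found in text.

    Assumption: parentheses only ever delimit tree nodes, never tokens.
    """
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i  # a new top-level tree begins here
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                yield text[start:i + 1]  # a complete top-level tree
                start = None


def extract_rules(text):
    """Return the sorted, de-duplicated non-lexical rules as strings."""
    # Imported here so that split_trees can be used without nltk installed.
    from nltk import Tree

    rules = set()
    for chunk in split_trees(text):
        tree = Tree.fromstring(chunk)
        for prod in tree.productions():
            # Skip lexical rules such as NN -> 'problem'.
            if prod.is_nonlexical():
                rules.add(str(prod))
    return sorted(rules)
```

With the file contents read into a string, `extract_rules(text)` should return rules formatted like `S -> NP VP`. An unfinished tree at the end of the file is silently skipped by the splitter, so a truncated file would need to be checked separately.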

0 answers:

No answers yet