I am trying to use the NLTK CFG parser, but I get the error "Grammar does not cover some of the input words". The code I am using is:
import nltk
import codecs
strProductions = ''
f = codecs.open('C://nltk_data//corpora//CINTIL_TreeBank//producoes_S.txt', 'r',
                encoding='latin-1')
for line in f:
    strProductions = strProductions + line
f.close()
grammar = nltk.grammar.CFG.fromstring(strProductions)
cp = nltk.ChartParser(grammar)
print grammar
S -> V PNT
V -> 'Choveu'
NP -> DEM N
PP -> P NP
P -> 'de'
NP -> N_
N_ -> N A
N -> 'crian\\xe7a'
tokens = []
a = u'criança'
b = '.'
a = a.encode('latin-1')
for tree in cp.parse(tokens):
    print tree
C:\Anaconda2\lib\site-packages\nltk\grammar.pyc in check_coverage(self, tokens)
629 missing = ', '.join('%r' % (w,) for w in missing)
630 raise ValueError("Grammar does not cover some of the "
--> 631 "input words: %r." % missing)
632
633 def _calculate_grammar_forms(self):
ValueError: Grammar does not cover some of the input words: u"'crian\\xe7a'".
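To narrow this down, I tried to mimic what I understand the coverage check to be doing: testing each token for membership among the grammar's terminals. This is only a rough sketch outside NLTK, not its actual implementation. `uncovered` is a helper name I made up, the terminal list is copied by hand from the grammar printed above, and I am assuming the terminals are held as text strings because the file is read through codecs.open.

```python
def uncovered(terminals, tokens):
    """Return (repr, type name) for every token that is not among the terminals."""
    known = set(terminals)
    return [(repr(tok), type(tok).__name__) for tok in tokens if tok not in known]

# Terminals copied by hand from the printed grammar (assumed to be text strings).
terminals = [u'Choveu', u'de', u'crian\xe7a']

text_token = u'crian\xe7a'                  # the word as a text string
byte_token = text_token.encode('latin-1')   # what a.encode('latin-1') produces

# Only tokens that fail the membership test end up in the report.
report = uncovered(terminals, [text_token, byte_token])
print(report)
```

If the real grammar behaves the same way, only the encoded token would be reported, which might mean the encode step is what makes the token differ from the terminal.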
Can someone help me figure out what is going on?
Thanks in advance.