NLTK CFG parser cannot parse words in Portuguese

Time: 2016-03-06 01:18:44

Tags: python-2.7 nltk anaconda

I am trying to use the NLTK CFG parser, but I get the error "Grammar does not cover some of the input words". The code I am using is:

import nltk
import codecs
strProductions = ''
f = codecs.open('C://nltk_data//corpora//CINTIL_TreeBank//producoes_S.txt', 'r',
                encoding='latin-1')
for line in f:
    strProductions = strProductions + line
f.close()
grammar = nltk.grammar.CFG.fromstring(strProductions)
cp = nltk.ChartParser(grammar)
print grammar

S -> V PNT
V -> 'Choveu'    
NP -> DEM N
PP -> P NP
P -> 'de'
NP -> N_
N_ -> N A
N -> 'crian\\xe7a'

tokens = []    
a = u'criança'
b = '.'
a= a.encode('latin-1')
for tree in cp.parse(tokens):
    print tree

C:\Anaconda2\lib\site-packages\nltk\grammar.pyc in check_coverage(self, tokens)
629             missing = ', '.join('%r' % (w,) for w in missing)
630             raise ValueError("Grammar does not cover some of the "
--> 631                              "input words: %r." % missing)
632 
633     def _calculate_grammar_forms(self):

ValueError: Grammar does not cover some of the input words: u"'crian\\xe7a'".

Can someone help me figure out what is going on?

Thanks in advance.
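
One possible explanation, offered as a sketch and not a confirmed diagnosis: codecs.open decodes the grammar file into unicode text, while the token is encoded to latin-1 bytes before parsing, and a byte string does not compare equal to a unicode terminal. The example below does not use NLTK. The names terminals and missing_words are hypothetical stand-ins for the grammar's terminal set and its coverage check, and the terminal set is assumed from the grammar excerpt shown in the question.

```python
# -*- coding: utf-8 -*-
# Hypothetical stand-in for the terminals of a grammar that was read
# through codecs.open(..., encoding='latin-1'), i.e. as unicode text.
terminals = {u'Choveu', u'de', u'crian\xe7a'}


def missing_words(tokens, terminals):
    """Return the tokens that are not among the grammar's terminals.

    This only mimics the idea of a coverage check; it is not NLTK's code.
    """
    return [tok for tok in tokens if tok not in terminals]


text_token = u'crian\xe7a'                 # unicode text, as in a = u'criança'
byte_token = text_token.encode('latin-1')  # bytes, as after a.encode('latin-1')

# The unicode token matches the unicode terminal, so nothing is missing.
print(missing_words([text_token], terminals))

# The byte token does not compare equal to any unicode terminal,
# so it is reported as not covered.
print(missing_words([byte_token], terminals))
```

If this is indeed the cause, passing the token to the parser as unicode, without the encode('latin-1') step, would be worth trying.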

0 answers:

No answers yet