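I am trying to run some Python code, mostly based on the NLTK book, to do n-gram POS tagging of Gujarati text from my GujaratiTextCorpus. I am getting a ValueError.
I am using Python 3.7.3 on Windows 10, running Jupyter Notebook through Anaconda. I am a beginner with Python. I looked through answers on stackoverflow.com to fix my ValueError, but could not solve it.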
import nltk
f = open('C:\\Users\\BHOGAYATA\\Documents\\GujaratiPosTagging\\cts260.txt', encoding = 'utf8')
raw = f.read()
train2_sents = nltk.sent_tokenize(raw)
text2 = nltk.Text(train2_sents)
train2_sents
import nltk
f = open('C:\\Users\\BHOGAYATA\\Documents\\GujaratiPosTagging\\txt42_sents.txt', encoding = 'utf8')
raw = f.read()
bs_sents = nltk.sent_tokenize(raw)
text3 = nltk.Text(bs_sents)
bs_sents
unigram_tagger = nltk.UnigramTagger(train2_sents)
unigram_tagger.tag(bs_sents)
I expected the words of both Gujarati sentences to be given POS tags. Instead I get the following error message:
ValueError
Traceback (most recent call last)
<ipython-input-3-5fae0b92393e> in <module>
11 text3 = nltk.Text(bs_sents)
12 bs_sents
---> 13 unigram_tagger = nltk.UnigramTagger(train2_sents)
14 unigram_tagger.tag(bs_sents)
15
~\Anaconda3\lib\site-packages\nltk\tag\sequential.py in __init__(self, train, model, backoff, cutoff, verbose)
344
345 def __init__(self, train=None, model=None, backoff=None, cutoff=0, verbose=False):
--> 346 NgramTagger.__init__(self, 1, train, model, backoff, cutoff, verbose)
347
348 def encode_json_obj(self):
~\Anaconda3\lib\site-packages\nltk\tag\sequential.py in __init__(self, n, train, model, backoff, cutoff, verbose)
293
294 if train:
--> 295 self._train(train, cutoff, verbose)
296
297 def encode_json_obj(self):
~\Anaconda3\lib\site-packages\nltk\tag\sequential.py in _train(self, tagged_corpus, cutoff, verbose)
181 fd = ConditionalFreqDist()
182 for sentence in tagged_corpus:
--> 183 tokens, tags = zip(*sentence)
184 for index, (token, tag) in enumerate(sentence):
185 # Record the event.
ValueError: not enough values to unpack (expected 2, got 1)
Answer 0 (score: 0)
The error means that the value being unpacked yields one item, but the code expects two.
For example, this will work:
for a, b in [("a", "b")]:
    print("a:", a, "b:", b)
This will not work, because ("a") is just the string "a", which unpacks to a single character:
for a, b in [("a")]:
    print("a:", a, "b:", b)
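The same failure can be reproduced with the `zip(*sentence)` line shown in the traceback. The following is a minimal sketch using only the standard library; the words and tags are made-up placeholders, not data from the asker's corpus.

```python
# A hand-made tagged sentence: a list of (word, tag) pairs.
# The words and tags are placeholders, not real corpus data.
tagged_sentence = [("word1", "NOUN"), ("word2", "VERB")]

# This mirrors `tokens, tags = zip(*sentence)` from the traceback.
# With (word, tag) pairs, zip yields exactly two tuples.
tokens, tags = zip(*tagged_sentence)
print(tokens)
print(tags)

# A raw sentence is just a string, so unpacking it with * passes
# single characters to zip, which then yields only one tuple.
raw_sentence = "abc"
error_message = None
try:
    tokens2, tags2 = zip(*raw_sentence)
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)
```

With a string as the "sentence", `zip` has only one tuple to hand back, so unpacking it into two names raises the ValueError.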
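Edit:
Looking at UnigramTagger, its first argument must be a list of tagged sentences, of type list(list(tuple(str, str))).
You are passing train2_sents, which comes from nltk.sent_tokenize and is therefore a list of plain strings (list(str)) with no (word, tag) pairs. The tagger iterates over each sentence string character by character, so zip(*sentence) returns a single tuple and the unpacking into tokens, tags fails.
Train on tagged sentences instead, and pass tag() a list of word tokens rather than bs_sents, which is a list of sentence strings.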
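As an illustration of the input shape that `nltk.UnigramTagger` accepts, here is a minimal sketch. The training data is a tiny hand-tagged placeholder in English, standing in for a real tagged Gujarati corpus (which the question does not show), and splitting on whitespace is only an assumed stand-in for a proper word tokenizer.

```python
import nltk

# Hypothetical hand-tagged training data: a list of sentences, each a
# list of (word, tag) pairs. Replace this with your own tagged corpus.
train_sents = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]

# Training succeeds because every sentence is a list of 2-tuples.
unigram_tagger = nltk.UnigramTagger(train_sents)

# tag() takes ONE sentence as a list of word tokens,
# not a list of sentence strings.
test_tokens = "the cat barks loudly".split()
tagged = unigram_tagger.tag(test_tokens)
print(tagged)
```

Words that never appeared in the training data come back with the tag `None`, since no backoff tagger is supplied in this sketch. To tag several sentences, call `tag()` once per tokenized sentence.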