Question

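I am trying to run Python code, based mainly on the NLTK book, for n-gram POS tagging of Gujarati text from my GujaratiTextCorpus. I am getting a ValueError.

I am using Python 3.7.3 on Windows 10, running Jupyter Notebook through Anaconda. I am a beginner with Python. I have looked through answers on stackoverflow.com to fix my ValueError, but could not resolve it.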

import nltk
f = open('C:\\Users\\BHOGAYATA\\Documents\\GujaratiPosTagging\\cts260.txt', encoding = 'utf8')
raw = f.read()
train2_sents = nltk.sent_tokenize(raw)
text2 = nltk.Text(train2_sents)
train2_sents
import nltk
f = open('C:\\Users\\BHOGAYATA\\Documents\\GujaratiPosTagging\\txt42_sents.txt', encoding = 'utf8')
raw = f.read()
bs_sents = nltk.sent_tokenize(raw)
text3 = nltk.Text(bs_sents)
bs_sents
unigram_tagger = nltk.UnigramTagger(train2_sents)
unigram_tagger.tag(bs_sents)

I expected the words of both Gujarati sentences to be tagged with POS tags. Instead, I get the following error message:

ValueError                                Traceback (most recent call last)
<ipython-input-3-5fae0b92393e> in <module>
     11 text3 = nltk.Text(bs_sents)
     12 bs_sents
---> 13 unigram_tagger = nltk.UnigramTagger(train2_sents)
     14 unigram_tagger.tag(bs_sents)
     15 

~\Anaconda3\lib\site-packages\nltk\tag\sequential.py in __init__(self, train, model, backoff, cutoff, verbose)
    344 
    345     def __init__(self, train=None, model=None, backoff=None, cutoff=0, verbose=False):
--> 346         NgramTagger.__init__(self, 1, train, model, backoff, cutoff, verbose)
    347 
    348     def encode_json_obj(self):

~\Anaconda3\lib\site-packages\nltk\tag\sequential.py in __init__(self, n, train, model, backoff, cutoff, verbose)
    293 
    294         if train:
--> 295             self._train(train, cutoff, verbose)
    296 
    297     def encode_json_obj(self):

~\Anaconda3\lib\site-packages\nltk\tag\sequential.py in _train(self, tagged_corpus, cutoff, verbose)
    181         fd = ConditionalFreqDist()
    182         for sentence in tagged_corpus:
--> 183             tokens, tags = zip(*sentence)
    184             for index, (token, tag) in enumerate(sentence):
    185                 # Record the event.

ValueError: not enough values to unpack (expected 2, got 1)

Answer 1

It means that the value you are unpacking yields one item, but you are expecting two.

For example:

for a, b in [("a", "b")]:
    print("a:", a, "b:", b)

This will work.

for a, b in [("a")]:
    print("a:", a, "b:", b)

This will not work: ("a") is just the string "a", not a tuple, so it cannot be unpacked into a and b.
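
The same failure can be reproduced without NLTK, using the line the traceback points at, tokens, tags = zip(*sentence). This is only a minimal sketch: the sample sentences, the tags, and the helper name split_tokens_and_tags are made up for illustration and are not part of NLTK.

```python
def split_tokens_and_tags(sentence):
    """Mimic the unpacking step shown in the traceback (illustrative helper)."""
    tokens, tags = zip(*sentence)
    return tokens, tags


# A tagged sentence: a list of (word, tag) pairs. The tags are placeholders.
tagged_sentence = [("hello", "X"), ("world", "Y")]
print(split_tokens_and_tags(tagged_sentence))

# A raw sentence string, which is what sent_tokenize produces for each sentence.
raw_sentence = "hello world"
try:
    split_tokens_and_tags(raw_sentence)
except ValueError as err:
    # zip(*"hello world") yields a single tuple of characters,
    # so there is only one value to unpack into two names.
    print("ValueError:", err)
```

With the tagged sentence, zip(*sentence) produces one tuple of words and one tuple of tags. With a plain string it produces a single tuple, which cannot be split into tokens and tags.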

Edit:

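Looking at UnigramTagger: for its first argument it expects a list of tagged sentences, of type

  list(list(tuple(str, str)))

whereas you are passing train2_sents. Because train2_sents comes from nltk.sent_tokenize, it is a plain list of sentence strings, of type

  list(str)

and contains no (word, tag) pairs. During training the tagger runs zip(*sentence) on each sentence and unpacks the result into tokens and tags, which fails for a plain string.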
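
One way to get training data into the list(list(tuple(str, str))) shape is sketched below. It assumes, hypothetically, that the training file stores one sentence per line with each word annotated as word/TAG. The sample text, the tag names, and the helper name parse_tagged_text are illustrative and not taken from the question.

```python
def parse_tagged_text(text, sep="/"):
    """Turn lines of 'word/TAG word/TAG' into list(list(tuple(str, str)))."""
    sentences = []
    for line in text.splitlines():
        pairs = []
        for item in line.split():
            # Split on the last separator so words containing '/' stay intact.
            word, found, tag = item.rpartition(sep)
            if not found:
                raise ValueError(f"token without a tag: {item!r}")
            pairs.append((word, tag))
        if pairs:  # skip blank lines
            sentences.append(pairs)
    return sentences


# Made-up sample; real data would come from a tagged Gujarati corpus.
sample = "the/DET cat/NOUN sleeps/VERB\nthe/DET dog/NOUN barks/VERB"
train_sents = parse_tagged_text(sample)
print(train_sents[0])

# With NLTK installed, data in this shape can be used for training, e.g.:
# unigram_tagger = nltk.UnigramTagger(train_sents)
# unigram_tagger.tag("the cat barks".split())
```

Note that tag() expects one sentence as a list of words, so the text to be tagged also needs to be split into words rather than passed as a list of sentence strings.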
