I'm fairly new to this, so I'm not sure whether this is correct. I want to analyze one document based on another document I have already analyzed.
This is what I have so far.
import nltk
sent_one = "Here is a sentence I would like to break up into words"
sent_two = "This is new. How is the weather up there?"
tokens = nltk.word_tokenize(sent_one)
tokens_two = nltk.word_tokenize(sent_two)
new_set = nltk.pos_tag(tokens)
second_set = nltk.pos_tag(tokens_two)
t0 = nltk.DefaultTagger('NN')
t1 = nltk.UnigramTagger(new_set, backoff = t0)
t2 = nltk.BigramTagger(new_set, backoff = t1)
t2.evaluate(tokens_two)
This is the error I get:

Traceback (most recent call last):
  in t1 = nltk.UnigramTagger(new_set, backoff=t0)
ValueError: too many values to unpack (expected 2)

I want to use this to analyze sent_two. I understand the error means UnigramTagger is not recognizing sent_one as a list of word/POS-tag pairs, but I don't know how to correct this.
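From what I can tell from the nltk docs, UnigramTagger and BigramTagger are trained on a list of tagged sentences (a list of lists of (word, tag) tuples), while I am passing in a single flat list of (word, tag) tuples. Below is a minimal sketch of what I think the input shape should be; the train_sents and test_sents names are just ones I made up, and I'm not sure this is actually right:

import nltk

sent_one = "Here is a sentence I would like to break up into words"
sent_two = "This is new. How is the weather up there?"

# my guess: wrap each tagged sentence in an outer list so the taggers
# see a list of sentences, not one flat list of (word, tag) pairs
train_sents = [nltk.pos_tag(nltk.word_tokenize(sent_one))]
test_sents = [nltk.pos_tag(nltk.word_tokenize(sent_two))]

t0 = nltk.DefaultTagger('NN')
t1 = nltk.UnigramTagger(train_sents, backoff=t0)
t2 = nltk.BigramTagger(train_sents, backoff=t1)

# evaluate() also seems to expect tagged sentences rather than raw tokens
print(t2.evaluate(test_sents))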