Question

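I looked through all the suggestions, and everyone says to break the string into tokens with a split function. I have done all of that, but I still get the same error again and again.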

for r in words:
    if r not in stop_words:
        processed_txt += ps.stem(r) + " "
tokenizer = RegexpTokenizer(r'\w+')
tokens = tokenizer.tokenize(processed_txt)
#print(tokens)
dictionary = corpora.Dictionary(tokens)
#corpus = [dictionary.doc2bow(text) for text in tokens]
print(dictionary)

So now it gives the following error.

raise TypeError("doc2bow expects an array of unicode tokens on input, not a single string")
TypeError: doc2bow expects an array of unicode tokens on input, not a single string

And the output of the "tokens" variable looks like this.

['becom', 'effect', 'willingli', 'without', 'need', 'obtain', 'knowledg', 'other', 'obtain', 'acquir', 'must', 'testamentari','claim', 'ownership', 'task', 'establish', 'endow', 'recept', 'willing', 'willsend', 'anoth', 'given', 'efficaci', 'presuppos']

Please help.
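For reference, the error is consistent with how gensim's `corpora.Dictionary` reads its input: it expects a list of documents, where each document is itself a list of tokens. Passing one flat token list makes every token string look like a whole document, and `doc2bow` rejects a plain string. The sketch below wraps the flat list in an outer list before building the dictionary. The helper `as_documents` is a hypothetical name introduced only for illustration, the token list is shortened from the output shown above, and the gensim import is guarded only so the shape check still runs where gensim is not installed.

```python
def as_documents(tokens_or_docs):
    """Return a list of token lists (hypothetical helper, not part of gensim).

    A flat list of strings is treated as ONE document and wrapped in an
    outer list; a list of token lists is returned as a copy.
    """
    if isinstance(tokens_or_docs, str):
        raise TypeError("expected a list of tokens, not a single string")
    if all(isinstance(t, str) for t in tokens_or_docs):
        return [list(tokens_or_docs)]
    return [list(doc) for doc in tokens_or_docs]


# Shortened version of the stemmed tokens from the question.
tokens = ['becom', 'effect', 'willingli', 'without', 'need', 'obtain']

# One document containing all tokens: [['becom', 'effect', ...]]
documents = as_documents(tokens)

try:
    from gensim import corpora
except ImportError:
    corpora = None  # gensim not available; only the shape check runs

if corpora is not None:
    # Dictionary takes an iterable of documents (each a list of tokens).
    dictionary = corpora.Dictionary(documents)
    # doc2bow takes a single document, i.e. one list of tokens.
    corpus = [dictionary.doc2bow(doc) for doc in documents]
    print(dictionary)
    print(corpus)
```

In the original snippet the equivalent minimal change would be `corpora.Dictionary([tokens])`, assuming all the tokens belong to a single document. With several texts, each text would contribute its own token list to the outer list.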

TypeError: doc2bow expects an array of unicode tokens on input, not a single string

0 answers: