ValueError: Iterable over raw text documents expected, string object received

Time: 2019-05-21 11:45:16

Tags: python-3.x text tf-idf countvectorizer

I am trying to use TF-IDF with CountVectorizer to determine weight values for advertisements.

The code below works fine when I run it for each row separately. Adding a loop or wrapping it in a function raises an error.

Function (I also tried calling it through a lambda function):

def t_keywo(text):
    tf_idf_vector=tfidf_transformer.transform(cv.transform(text))
    #sort the tf-idf vectors by descending order of scores
    sorted_items=sort_coo(tf_idf_vector.tocoo())
    keywords=extract_topn_from_vector(feature_names,sorted_items)

    return keywords

For loop:

for i in range(len(df_cs_l)):
    tf_idf_vector=tfidf_transformer.transform(cv.transform(df_cs_l[i]))
    #sort the tf-idf vectors by descending order of scores
    sorted_items=sort_coo(tf_idf_vector.tocoo())
    keywords=extract_topn_from_vector(feature_names,sorted_items)
    ref={'Text':i,'words': keywords}
    rel.append(ref)


When I run the code above, I get the following error:

Error: "ValueError: Iterable over raw text documents expected, string object received."
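
For reference, the error can be reproduced in isolation with a minimal sketch like the one below. The sample documents are made up for illustration; the second call mirrors cv.transform(df_cs_l[i]), assuming df_cs_l[i] is a single string.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sample documents standing in for the real ad texts.
docs = ["cheap flights to paris", "buy running shoes online"]
cv = CountVectorizer().fit(docs)

# A list of strings is an iterable of documents, so this works.
list_input = cv.transform(docs)

# A bare string is rejected with the ValueError from the question.
error_message = None
try:
    cv.transform(docs[0])
except ValueError as err:
    error_message = str(err)

print(error_message)
```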

I saw the same error in the example linked below:

click here to view the example

1 answer:

Answer 0 (score: 0)

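I changed the function and the way it is applied, and it works now. cv.transform expects an iterable of documents, so a single string has to be wrapped in a list ([text]) before it is passed in.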

def t_keywo(text, cv, tfidf_transformer):
    # cv.transform expects an iterable of documents, so wrap the single string in a list
    tf_idf_vector = tfidf_transformer.transform(cv.transform([text]))
    # sort the tf-idf vectors by descending order of scores
    sorted_items = sort_coo(tf_idf_vector.tocoo())
    keywords = extract_topn_from_vector(feature_names, sorted_items)

    return keywords

df_cs['keywords'] = df_cs['text'].apply(lambda x: t_keywo(x, cv, tfidf_transformer))
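
The same change should also fix the for loop from the question. Below is a self-contained sketch under some assumptions: the sample documents are invented, and sort_coo and extract_topn_from_vector are hypothetical stand-ins, since the original helpers are not shown in the post.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical sample data standing in for df_cs_l from the question.
docs = [
    "cheap flights to paris",
    "cheap hotel deals in paris",
    "buy running shoes online",
]

cv = CountVectorizer()
tfidf_transformer = TfidfTransformer()
tfidf_transformer.fit(cv.fit_transform(docs))

# Feature names ordered by column index, built from the fitted vocabulary.
feature_names = sorted(cv.vocabulary_, key=cv.vocabulary_.get)


def sort_coo(coo_matrix):
    """Assumed helper: (column index, score) pairs, highest score first."""
    pairs = zip(coo_matrix.col, coo_matrix.data)
    return sorted(pairs, key=lambda p: (p[1], p[0]), reverse=True)


def extract_topn_from_vector(feature_names, sorted_items, topn=10):
    """Assumed helper: map the top-n terms to their rounded tf-idf scores."""
    return {
        feature_names[idx]: round(float(score), 3)
        for idx, score in sorted_items[:topn]
    }


rel = []
for i, text in enumerate(docs):
    # [text] turns the single string into a one-document iterable.
    tf_idf_vector = tfidf_transformer.transform(cv.transform([text]))
    sorted_items = sort_coo(tf_idf_vector.tocoo())
    keywords = extract_topn_from_vector(feature_names, sorted_items)
    rel.append({"Text": i, "words": keywords})

print(rel)
```

Iterating with enumerate avoids indexing by position, which also sidesteps label-based lookups if df_cs_l is a pandas Series with a non-default index.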