Question

df_clean['message'] = df_clean['message'].apply(lambda x: gensim.parsing.preprocessing.remove_stopwords(x))

I tried this on the "message" column of my DataFrame, but got this error:

TypeError: decoding to str: need a bytes-like object, list found

Answer 1

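Evidently, the df_clean["message"] column contains lists of words rather than strings, which is why the error says need a bytes-like object, list found.

To fix this, convert each list back into a string, for example with join():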

df_clean['message'] = df_clean['message'].apply(lambda x: gensim.parsing.preprocessing.remove_stopwords(" ".join(x)))

Note that df_clean["message"] will contain string objects after the code above is applied.
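If the column might mix plain strings with token lists, joining everything blindly would split the strings into single characters. A minimal sketch of a normalizing helper, assuming each value is a str, a list, or a tuple; the name to_text and the sample data are made up for illustration and are not part of gensim or pandas:

```python
def to_text(value):
    """Return value as one string; token lists are joined with spaces."""
    if isinstance(value, str):
        # Already a string: leave it untouched.
        return value
    if isinstance(value, (list, tuple)):
        # A tokenized message: glue the tokens back together.
        return " ".join(str(token) for token in value)
    # Anything else (NaN, numbers, ...) is reported rather than guessed at.
    raise TypeError(f"unsupported message type: {type(value).__name__}")


# Hypothetical sample values standing in for df_clean["message"].
messages = ["already a sentence", ["a", "list", "of", "tokens"]]
normalized = [to_text(m) for m in messages]
print(normalized)
```

With a DataFrame, the same helper could be applied first, e.g. df_clean['message'].apply(to_text), and remove_stopwords run on the result.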

Answer 2

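This is not a gensim problem; the error is raised by pandas: your message column contains a value that is a list rather than a string. This can be reproduced with a minimal pandas example.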
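One way such a reproduction might look, as a sketch rather than the answerer's own code: the DataFrame contents are invented, and needs_str is a hypothetical stand-in for any function that accepts only strings and tries to decode everything else.

```python
import pandas as pd

# One row holds a string, the other a list of tokens.
df = pd.DataFrame({"message": ["this is a text", ["this", "is", "a", "list"]]})


def needs_str(text):
    # Stand-in for a string-only function: strings pass through,
    # anything else is decoded as if it were bytes.
    return text if isinstance(text, str) else str(text, "utf8")


try:
    df["message"].apply(needs_str)
    error_message = None
except TypeError as exc:
    # The list in the second row triggers the failure.
    error_message = str(exc)

print(error_message)
```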

Answer 3

The error means that remove_stopwords expects a string, but you are passing it a list. Before removing stop words, check that every value in the column is a string. See the docs.
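That check can be sketched with plain pandas. The sample data below is hypothetical; only the column name message comes from the question:

```python
import pandas as pd

# Hypothetical stand-in for the asker's DataFrame.
df_clean = pd.DataFrame(
    {"message": ["a plain sentence", ["already", "tokenized"], "another sentence"]}
)

# True where the value is a str, False otherwise.
is_str = df_clean["message"].map(lambda value: isinstance(value, str))

# Rows that would make a string-only function fail.
non_string_rows = df_clean.loc[~is_str, "message"]

print(is_str.all())
print(non_string_rows)
```

If is_str.all() is False, the offending rows can be inspected through non_string_rows and converted before stop words are removed.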