I have a list of strings that I fit_transform with CountVectorizer.
When I then try TfidfTransformer, I get an error:
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
X_train_counts = count_vect.fit(features_train)
from sklearn.feature_extraction.text import TfidfTransformer
transformer = TfidfTransformer()
X_train_tfidf = transformer.fit_transform(X_train_counts)
TypeError: no supported conversion for types: (dtype('O'),)
Answer 0 (score: 2)
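You are not passing a count matrix to TfidfTransformer.
count_vect.fit(features_train)
does not return the count matrix. It returns self, that is, the fitted CountVectorizer instance itself, which is why TfidfTransformer complains about an object dtype.
To get the count matrix you need to call transform(), or fit_transform() to fit and transform in one step.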
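A minimal sketch of the difference in return values. The `docs` list is a made-up stand-in for `features_train`, not data from the question:

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for features_train
docs = ["the cat sat", "the dog sat", "the cat ran"]

count_vect = CountVectorizer()

# fit() only learns the vocabulary and returns the vectorizer itself
fitted = count_vect.fit(docs)

# transform() is what produces the sparse document-term count matrix
counts = count_vect.transform(docs)

print(fitted is count_vect)            # same object, not a matrix
print(issparse(counts), counts.shape)  # sparse matrix: (n_documents, n_terms)
```

Passing `fitted` to TfidfTransformer would reproduce the problem from the question, while `counts` is the kind of input it expects.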
Correct the code like this:
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
# This changed
X_train_counts = count_vect.fit_transform(features_train)
from sklearn.feature_extraction.text import TfidfTransformer
transformer = TfidfTransformer()
X_train_tfidf = transformer.fit_transform(X_train_counts)
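Now you should not get any error.
By the way, instead of calling CountVectorizer and then TfidfTransformer separately, I suggest using TfidfVectorizer,
which is simply the combination of those two. That reduces your code to: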
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf_vect = TfidfVectorizer()
X_train_tfidf = tfidf_vect.fit_transform(features_train)
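If you want to convince yourself that the two approaches agree, here is a rough check assuming default parameters everywhere. The `docs` list is again a hypothetical stand-in for `features_train`:

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

# Hypothetical stand-in for features_train
docs = ["the cat sat", "the dog sat", "the cat ran"]

# Two-step route: raw counts first, then tf-idf weighting
counts = CountVectorizer().fit_transform(docs)
two_step = TfidfTransformer().fit_transform(counts)

# One-step route: TfidfVectorizer does both internally
one_step = TfidfVectorizer().fit_transform(docs)

# With default settings both routes should give the same matrix
same = np.allclose(two_step.toarray(), one_step.toarray())
print(same)
```

The comparison only holds when both routes use matching settings; if you change options such as `ngram_range` or `norm` on one side, set them on the other as well.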