sklearn SGD partial_fit error: Number of features 378 does not match previous data 4598

Time: 2017-05-17 19:38:06

Tags: python machine-learning scikit-learn text-classification

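I pickled my classifier, loaded it in another notebook, and tried to call partial_fit on it, but I get the error "Number of features 378 does not match previous data 4598".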

import joblib  # on older scikit-learn versions: from sklearn.externals import joblib

# joblib.load accepts a file path directly, so no open() or global is needed.
count_vect_item_group = joblib.load("models/count_vect_Item Group.pkl")
model_predicted_item_group = joblib.load("models/model_Item Group.pkl")

count_matrix_X_train = count_vect_item_group.fit_transform(X_test)
X_train_tf_idf = tf_idf(count_matrix_X_train)

model_predicted_item_group.partial_fit(X_train_tf_idf, labels_test)

I am unable to train my classifier on the new dataset.

1 Answer:

Answer 0: (score: 3)

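This error occurs because, before you pickled the classifier, you trained it on 4598 features (the number of columns in X), and now you are passing it only 378. The number of features must match what the classifier saw before.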

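You can achieve this by calling only count_vect_item_group.transform(). Right now you are calling fit_transform() on count_vect_item_group again, which discards the previously learned vocabulary and fits on the new data, so it finds fewer features than before.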
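To make the difference concrete, here is a minimal sketch on made-up toy documents (the strings and variable names are illustrative, not from the question). It compares the column count produced by transform() on an already fitted CountVectorizer with the count produced by fitting again on the smaller new data.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data standing in for the original and the new text.
old_docs = ["red apple pie", "green apple tart", "blue cheese plate"]
new_docs = ["red apple", "green tart"]

vect = CountVectorizer()
n_old = vect.fit_transform(old_docs).shape[1]  # vocabulary learned here

# transform() reuses the learned vocabulary, so the width is unchanged.
n_transform = vect.transform(new_docs).shape[1]

# Fitting again (shown on a fresh vectorizer) learns only the new words.
n_refit = CountVectorizer().fit_transform(new_docs).shape[1]

print(n_old, n_transform, n_refit)
```

Words in the new data that are not in the stored vocabulary are simply ignored by transform(), which is what keeps the matrix width stable.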

Change your code to:

count_matrix_X_train = count_vect_item_group.transform(X_test)
X_train_tf_idf = tf_idf(count_matrix_X_train)

model_predicted_item_group.partial_fit(X_train_tf_idf, labels_test)
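For reference, the whole flow can be sketched end to end on toy data. This is an assumption-laden illustration, not the asker's pipeline: the documents, labels, and the use of a plain SGDClassifier without the tf_idf step are all made up for the example. It shows that partial_fit succeeds after transform(), and that refitting the vectorizer triggers a feature-count ValueError.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier

# Hypothetical toy data; the real data comes from the asker's notebook.
train_docs = ["red apple pie", "green apple tart",
              "blue cheese plate", "aged cheese wheel"]
train_labels = ["fruit", "fruit", "dairy", "dairy"]
new_docs = ["red apple tart", "blue cheese wheel"]
new_labels = ["fruit", "dairy"]

# Initial training: fit the vocabulary once and train the classifier.
vect = CountVectorizer()
X_old = vect.fit_transform(train_docs)
clf = SGDClassifier(random_state=0)
clf.partial_fit(X_old, train_labels, classes=["dairy", "fruit"])

# Incremental update: transform() only, so the column count matches.
X_new = vect.transform(new_docs)
clf.partial_fit(X_new, new_labels)
same_width = X_new.shape[1] == X_old.shape[1]

# Refitting the vectorizer yields fewer columns and partial_fit rejects them.
X_bad = CountVectorizer().fit_transform(new_docs)
try:
    clf.partial_fit(X_bad, new_labels)
    raised = False
except ValueError:
    raised = True

print(same_width, raised)
```

In the pickled setup from the question, the fitted vectorizer and the classifier play the roles of vect and clf here, so both need to be loaded together and only the classifier should be updated.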