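I am new to machine learning. I am using SGDClassifier to classify my documents, and I have trained the model. To persist the trained model, I used pickle.
Code in classify.py, used to train the model: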
import pickle
from sklearn import linear_model
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

corpus = df2.title_desc  # df2 is my dataframe with 2 columns: title_desc and category
vectorizer = TfidfVectorizer(stop_words='english')
tfidf_matrix = vectorizer.fit_transform(corpus).todense()
variables = tfidf_matrix
labels = df2.category
variables_train, variables_test, labels_train, labels_test = train_test_split(variables, labels, test_size=0.1)
svm_classifier = linear_model.SGDClassifier(loss='hinge', alpha=0.0001)
svm_classifier = svm_classifier.fit(variables_train, labels_train)
with open('my_dumped_classifier.pkl', 'wb') as fid:
    pickle.dump(svm_classifier, fid)
After dumping the model to a file, I created another .py file to test the model:
test.py
import pickle
from sklearn import linear_model
from sklearn.feature_extraction.text import TfidfVectorizer

corpus_test = df_test.title_desc  # df_test is my dataframe with 2 columns: title_desc and category
vectorizer = TfidfVectorizer(stop_words='english')
tfidf_matrix_test = vectorizer.fit_transform(corpus_test).todense()
svm_classifier = linear_model.SGDClassifier(loss='hinge', alpha=0.0001)
with open('my_dumped_classifier.pkl', 'rb') as fid:
    svm_classifier = pickle.load(fid)
tfidf_matrix_test = vectorizer.transform(corpus_test).todense()
svm_predictions = svm_classifier.predict(tfidf_matrix_test)
I am not sure about the logic I have written in test.py. At the line
svm_predictions=svm_classifier.predict(tfidf_matrix_test)
it fails with the error 'ValueError: X has 249 features per sample; expecting 1050'.
Please suggest a solution.
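For reference, the mismatch (249 vs. 1050 features) appears to come from fitting a new TfidfVectorizer on the test corpus, which builds a different vocabulary from the one the classifier was trained on. One possible way around it, shown here only as a minimal sketch, is to save the fitted vectorizer together with the classifier, for example by wrapping both in a scikit-learn pipeline. The small document lists and the file name my_dumped_model.pkl below are hypothetical stand-ins for df2.title_desc, df2.category and df_test.title_desc, not data from my project.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical stand-ins for df2.title_desc, df2.category and df_test.title_desc.
train_docs = [
    "cheap flights to paris and rome",
    "hotel booking discount for summer travel",
    "new graphics card benchmark results",
    "laptop processor and memory upgrade guide",
]
train_labels = ["travel", "travel", "hardware", "hardware"]
test_docs = ["discount flights for summer", "memory upgrade for my laptop"]

# Hypothetical file location; a temp directory is used only to keep the sketch tidy.
model_path = os.path.join(tempfile.gettempdir(), "my_dumped_model.pkl")

# Training side (classify.py): the vectorizer and classifier are fitted together,
# so the learned vocabulary is stored inside the same object as the model.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    SGDClassifier(loss="hinge", alpha=0.0001, random_state=0),
)
model.fit(train_docs, train_labels)

with open(model_path, "wb") as fid:
    pickle.dump(model, fid)

# Prediction side (test.py): load the pipeline and pass raw text.
# No new TfidfVectorizer is created and nothing is re-fitted here.
with open(model_path, "rb") as fid:
    loaded_model = pickle.load(fid)

svm_predictions = loaded_model.predict(test_docs)

# The test matrix now has as many columns as the training vocabulary.
fitted_vectorizer = loaded_model.named_steps["tfidfvectorizer"]
n_train_features = len(fitted_vectorizer.vocabulary_)
n_test_features = fitted_vectorizer.transform(test_docs).shape[1]
print(n_train_features, n_test_features, svm_predictions)
```

The pipeline also accepts the sparse matrix that TfidfVectorizer produces, so the .todense() calls should not be needed. Pickling the vectorizer and the classifier as two separate objects, and calling only transform (never fit_transform) in test.py, would presumably work as well.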