I am developing a text classification model that classifies emails as spam/ham, and I want to train and test it. I am using the Pipeline provided by scikit-learn:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
Then I set up the pipeline model and started the training process:
text_clf = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('clf', LinearSVC()),
])
text_clf.fit(X_train, y_train)
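Now I would like to plot the progress of the model's training, either by showing how the accuracy improves or by showing the training progress in general.
But I don't know how to approach this.
I have already looked at this post: Keras - Plot training, validation and test set accuracy
It contains some code that might help: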
import keras
from matplotlib import pyplot as plt
history = model1.fit(train_x, train_y,validation_split = 0.1, epochs=50, batch_size=4)
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()
So I adapted the code to my pipeline:
history = text_clf.fit(X_train, y_train,validation_split = 0.1, epochs=50, batch_size=4)
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()
Unfortunately, I get the following error message:
ValueError Traceback (most recent call last)
<ipython-input-46-59ac1bc1f4fa> in <module>
----> 1 history = text_clf.fit(X_train, y_train,validation_split = 0.1, epochs=50, batch_size=4)
2 plt.plot(history.history['acc'])
3 plt.plot(history.history['val_acc'])
4 plt.title('model accuracy')
5 plt.ylabel('accuracy')
~\Anaconda3\envs\abc\lib\site-packages\sklearn\pipeline.py in fit(self, X, y, **fit_params)
348 This estimator
349 """
--> 350 Xt, fit_params = self._fit(X, y, **fit_params)
351 with _print_elapsed_time('Pipeline',
352 self._log_message(len(self.steps) - 1)):
~\Anaconda3\envs\abc\lib\site-packages\sklearn\pipeline.py in _fit(self, X, y, **fit_params)
278 "pipeline using the stepname__parameter format, e.g. "
279 "`Pipeline.fit(X, y, logisticregression__sample_weight"
--> 280 "=sample_weight)`.".format(pname))
281 step, param = pname.split('__', 1)
282 fit_params_steps[step][param] = pval
ValueError: Pipeline.fit does not accept the validation_split parameter. You can pass parameters to specific steps of your pipeline using the stepname__parameter format, e.g. `Pipeline.fit(X, y, logisticregression__sample_weight=sample_weight)`.
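For reference, the stepname__parameter format mentioned in the message would look roughly like the sketch below. The toy texts, labels, and weights are made up for illustration; sample_weight is used only because it is a parameter that LinearSVC.fit actually accepts. This format does not seem to help with the call above, since LinearSVC.fit has no validation_split, epochs, or batch_size parameters to route to.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy data, made up for illustration only.
X_toy = ["win money now", "cheap pills offer", "meeting at noon", "lunch tomorrow?"]
y_toy = ["spam", "spam", "ham", "ham"]
weights = np.array([1.0, 1.0, 2.0, 2.0])  # hypothetical per-sample weights

toy_clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LinearSVC()),
])

# The "clf__" prefix names the pipeline step; the remainder ("sample_weight")
# is passed on to that step's fit method, here LinearSVC.fit.
toy_clf.fit(X_toy, y_toy, clf__sample_weight=weights)

predictions = toy_clf.predict(["cheap money offer"])
```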
So now I don't know how to solve this problem.
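One possible direction, as a rough sketch rather than a confirmed solution: LinearSVC is not trained in epochs, so there is no Keras-style history object, but sklearn.model_selection.learning_curve can report cross-validated accuracy for growing training-set sizes. The randomly generated toy messages below are placeholders standing in for X_train and y_train, and the word lists, fold count, and train sizes are arbitrary choices for illustration.

```python
import random

import matplotlib.pyplot as plt
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import learning_curve
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Placeholder data: 100 random toy messages, alternating spam and ham.
rng = random.Random(0)
spam_words = ["free", "win", "money", "offer", "prize", "cheap"]
ham_words = ["meeting", "lunch", "report", "project", "schedule", "thanks"]
common_words = ["hello", "please", "today", "now"]

X_toy, y_toy = [], []
for _ in range(50):
    X_toy.append(" ".join(rng.choices(spam_words + common_words, k=6)))
    y_toy.append("spam")
    X_toy.append(" ".join(rng.choices(ham_words + common_words, k=6)))
    y_toy.append("ham")

text_clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LinearSVC()),
])

# Refit the pipeline on 20%, 40%, ..., 100% of each training fold and
# score it on both the training subset and the held-out fold.
train_sizes_abs, train_scores, val_scores = learning_curve(
    text_clf,
    X_toy,
    y_toy,
    train_sizes=np.linspace(0.2, 1.0, 5),
    cv=5,
    scoring="accuracy",
    shuffle=True,
    random_state=0,
)

# Average over the 5 folds and plot accuracy against training-set size.
fig, ax = plt.subplots()
ax.plot(train_sizes_abs, train_scores.mean(axis=1), marker="o", label="train")
ax.plot(train_sizes_abs, val_scores.mean(axis=1), marker="o", label="val")
ax.set_title("model accuracy")
ax.set_xlabel("number of training samples")
ax.set_ylabel("accuracy")
ax.legend(loc="lower right")
```

In a notebook the figure should render inline; in a plain script, plt.show() would be needed at the end. Note that the x-axis here is the number of training samples, not epochs, so this is a learning curve rather than a direct equivalent of the Keras plot.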