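Is it possible (and if so, how) to train an sklearn MultinomialNB classifier dynamically? I want to train (update) my spam classifier every time I feed an email into it.
I want this (which does not work):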
x_train, x_test, y_train, y_test = tts(features, labels, test_size=0.2)
clf = MultinomialNB()
for i in range(len(x_train)):
    clf.fit([x_train[i]], [y_train[i]])
preds = clf.predict(x_test)
to give results similar to this (which works):
x_train, x_test, y_train, y_test = tts(features, labels, test_size=0.2)
clf = MultinomialNB()
clf.fit(x_train, y_train)
preds = clf.predict(x_test)
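Answer 0 (score: 2)
Scikit-learn supports incremental learning for several algorithms, including MultinomialNB; see the scikit-learn documentation for details. Each call to fit() discards what the model learned before, which is why the loop in the question ends up trained on the last sample only. You need to use the partial_fit() method instead of fit(), so the sample code looks like this: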
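import numpy
# Assumption: tts in the question is an alias for train_test_split.
from sklearn.model_selection import train_test_split as tts
from sklearn.naive_bayes import MultinomialNB

x_train, x_test, y_train, y_test = tts(features, labels, test_size=0.2)
clf = MultinomialNB()
for i in range(len(x_train)):
    if i == 0:
        # The first call must list every class the model will ever see.
        clf.partial_fit([x_train[i]], [y_train[i]], classes=numpy.unique(y_train))
    else:
        clf.partial_fit([x_train[i]], [y_train[i]])
preds = clf.predict(x_test)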
Edit: added the classes argument to partial_fit, as suggested by @BobWazowski.
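To check that one-sample-at-a-time updates really give "results similar to" a single batch fit, here is a minimal self-contained sketch. The random count features and labels are made up purely for illustration (they stand in for the question's features and labels), and the names batch_clf, online_clf and all_classes are hypothetical. It passes classes on every call, which partial_fit accepts as long as the list stays the same, so the i == 0 special case is not needed.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Synthetic stand-in data (assumption): 200 "emails" with 20 word-count features.
rng = np.random.default_rng(0)
features = rng.integers(0, 5, size=(200, 20))
labels = rng.integers(0, 2, size=200)  # 0 = ham, 1 = spam

x_train, x_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0
)

# Reference model: trained once on the whole training set.
batch_clf = MultinomialNB().fit(x_train, y_train)

# Incremental model: updated with one sample per call.
online_clf = MultinomialNB()
all_classes = np.unique(labels)  # every label the model may ever see
for x, y in zip(x_train, y_train):
    online_clf.partial_fit([x], [y], classes=all_classes)

# Both models accumulate the same per-class feature counts,
# so their predictions on the test set should agree.
same_counts = np.allclose(batch_clf.feature_count_, online_clf.feature_count_)
same_preds = np.array_equal(batch_clf.predict(x_test), online_clf.predict(x_test))
print(same_counts, same_preds)
```

For a real spam filter the feature vector of each new email has to have the same length and column meaning as the ones used earlier, so the vectorizer's vocabulary must stay fixed between updates.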