Saving an SGD classifier together with its DictVectorizer vocabulary

Asked: 2014-04-02 07:11:48

Tags: python machine-learning nlp scikit-learn sentiment-analysis

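I am trying to save a trained SGD classifier. I am using a DictVectorizer to build the features. After loading the pickled classifier and using it for prediction, I get the following error:

AttributeError: 'DictVectorizer' object has no attribute 'vocabulary_'

How can I solve this? Is it possible to save the DictVectorizer's vocabulary?

Thanks

Here is the code: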

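from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import SGDClassifier

# features(), sents, test and labels are defined elsewhere
vecto = DictVectorizer(sparse=False)
transformer = vecto
X_train = transformer.fit_transform(features(sents))  # learns the vocabulary
X_test = transformer.transform(features(test))  # reuses the fitted vocabulary
y_test = [-1, 1]
clf = SGDClassifier(alpha=0.2, loss='hinge', n_jobs=5)
# classes must be given on the first partial_fit call
clf = clf.partial_fit(X_train[:2], labels[:2], classes=[-1, 1])
clf.partial_fit(X_train[2:3], labels[2:3], classes=[-1, 1])
print(clf.predict(X_test))
print(clf.score(X_test, y_test))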

1 Answer:

Answer 0 (score: 0)

You can use the pickle library to save and load the SGDClassifier as well as the DictVectorizer, as follows:

import pickle

# save SGDClassifier
with open('model.pkl','wb') as f:
    pickle.dump(clf,f)

# load SGDClassifier
with open('model.pkl', 'rb') as f:
    clf2 = pickle.load(f)

The same works for your DictVectorizer:

# save DictVectorizer
with open('transformer.pkl','wb') as f:
    pickle.dump(transformer,f)
 
# load DictVectorizer
with open('transformer.pkl', 'rb') as f:
    transformer2 = pickle.load(f)
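
The error in the question typically appears when transform is called on a DictVectorizer that was never fitted, for example a fresh instance created in the prediction script, because vocabulary_ only exists after fit or fit_transform. The following is a minimal sketch of one way to avoid that: pickle the fitted vectorizer and the classifier together, then use the loaded vectorizer for prediction. The toy feature dicts, the labels, the file name sentiment_model.pkl and the variable names are all assumptions made for illustration; they are not from the question.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import SGDClassifier

# Toy feature dicts and labels, made up for this example.
train_feats = [
    {"good": 1, "great": 1},
    {"bad": 1, "awful": 1},
    {"good": 1, "nice": 1},
    {"bad": 1, "poor": 1},
]
train_labels = [1, -1, 1, -1]
test_feats = [{"good": 1}, {"awful": 1, "unseen": 1}]

# Fit the vectorizer once; this is what creates vocabulary_.
vectorizer = DictVectorizer(sparse=False)
X_train = vectorizer.fit_transform(train_feats)

clf = SGDClassifier(alpha=0.2, loss="hinge", random_state=0)
clf.partial_fit(X_train, train_labels, classes=[-1, 1])

# Predictions before saving, kept for comparison after reloading.
expected = clf.predict(vectorizer.transform(test_feats))

# Save both fitted objects in a single file (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "sentiment_model.pkl")
with open(path, "wb") as f:
    pickle.dump({"vectorizer": vectorizer, "clf": clf}, f)

# Later, possibly in another process: load both objects back.
with open(path, "rb") as f:
    bundle = pickle.load(f)

loaded_vectorizer = bundle["vectorizer"]
loaded_clf = bundle["clf"]

# Only transform here, never fit again: the loaded vocabulary maps each
# feature to the same column the classifier was trained on. Features
# that were not seen during fitting are ignored by transform.
X_test = loaded_vectorizer.transform(test_feats)
restored = loaded_clf.predict(X_test)
print(restored)
```

Keeping both objects in one file is only a design choice that makes it harder to load a classifier without its matching vectorizer; two separate pickle files, as shown earlier, work as well, provided the loaded vectorizer is the one used at prediction time.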