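I built a sentiment analyzer with an SVM classifier. I trained the model with probability=True, so it can give me probabilities. But when I pickle the model and load it again, the probabilities no longer work.
Model: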
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC, LinearSVC
pipeline_svm = Pipeline([
    ('bow', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('classifier', SVC(probability=True)),
])
# pipeline parameters to automatically explore and tune
param_svm = [
    {'classifier__C': [1, 10, 100, 1000], 'classifier__kernel': ['linear']},
    {'classifier__C': [1, 10, 100, 1000], 'classifier__gamma': [0.001, 0.0001], 'classifier__kernel': ['rbf']},
]
grid_svm = GridSearchCV(
    pipeline_svm,
    param_grid=param_svm,
    refit=True,
    n_jobs=-1,
    scoring='accuracy',
    cv=StratifiedKFold(n_splits=5),
)
svm_detector_reloaded = cPickle.load(open('svm_sentiment_analyzer.pkl', 'rb'))
print(svm_detector_reloaded.predict_proba(["Today is awesome day"])[0])
This gives me:
AttributeError: predict_proba is not available when probability=False
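A minimal sketch, assuming synthetic data from make_classification in place of the real corpus, that reproduces the error and shows that probability=True survives a pickle round trip. The flag is an ordinary attribute of the fitted estimator, so if a reloaded object raises this error, the pickled model was most likely fitted without probability=True (for example, an older .pkl file).

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic binary data (an assumption; any labelled feature matrix works)
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# Fitted without probability=True: predict_proba raises AttributeError
plain = SVC().fit(X, y)
try:
    plain.predict_proba(X[:1])
    plain_error = None
except AttributeError as exc:
    plain_error = exc
print(type(plain_error).__name__)

# Fitted with probability=True: the flag and the fitted Platt-scaling
# parameters are stored on the estimator, so pickle keeps them
prob_model = SVC(probability=True, random_state=0).fit(X, y)
reloaded = pickle.loads(pickle.dumps(prob_model))
print(reloaded.probability)
print(reloaded.predict_proba(X[:1]))
```

The exact wording of the AttributeError differs between scikit-learn versions, so checking the exception type is more reliable than matching the message.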
Answer 0 (score: 1)
In case it helps, pickle the model like this:
import pickle
with open('svm_sentiment_analyzer.pkl', 'wb') as f:
    pickle.dump(grid_svm, f)
Load the model and use it to make predictions:
with open('svm_sentiment_analyzer.pkl', 'rb') as f:
    svm_detector_reloaded = pickle.load(f)
print(svm_detector_reloaded.predict_proba(["Today is an awesome day"])[0])
This is after re-running your code and fitting the grid search on a pandas DataFrame named sents with:
grid_svm.fit(sents.Sentence.values, sents.Positive.values)
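A self-contained sketch of that round trip, assuming a tiny hypothetical corpus in place of the sents DataFrame, a reduced parameter grid so it runs quickly, and a temporary directory for the .pkl file. StratifiedKFold here uses the current sklearn.model_selection signature (n_splits) rather than the old (labels, n_folds) one.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus standing in for sents.Sentence / sents.Positive
positive = ["what a great day", "I love this", "awesome and happy", "wonderful news",
            "really enjoyed it", "fantastic work", "so good", "best day ever",
            "happy and glad", "lovely weather"]
negative = ["what a terrible day", "I hate this", "awful and sad", "horrible news",
            "really disliked it", "bad work", "so bad", "worst day ever",
            "sad and angry", "gloomy weather"]
texts = positive + negative
labels = [1] * len(positive) + [0] * len(negative)

pipeline_svm = Pipeline([
    ("bow", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("classifier", SVC(probability=True, random_state=0)),
])
# Reduced grid (an assumption) so the example runs quickly
grid_svm = GridSearchCV(
    pipeline_svm,
    param_grid={"classifier__C": [1, 10], "classifier__kernel": ["linear"]},
    refit=True,
    scoring="accuracy",
    cv=StratifiedKFold(n_splits=2),
)
grid_svm.fit(texts, labels)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "svm_sentiment_analyzer.pkl")
    # With refit=True the pickled search object wraps the best fitted pipeline
    with open(path, "wb") as f:
        pickle.dump(grid_svm, f)
    with open(path, "rb") as f:
        svm_detector_reloaded = pickle.load(f)

print(svm_detector_reloaded.predict_proba(["Today is an awesome day"])[0])
```

GridSearchCV only exposes predict_proba because the refitted best pipeline ends in an SVC with probability=True.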
For best practices on model serialization (for example, using joblib), see https://scikit-learn.org/stable/modules/model_persistence.html
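A sketch of the joblib route mentioned on that page, assuming synthetic data and a temporary file path. As with pickle, only load files from a source you trust, because loading can execute arbitrary code.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic data (an assumption) standing in for the real features
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = SVC(probability=True, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "svm_sentiment_analyzer.joblib")
    joblib.dump(model, path)        # serialize the fitted estimator
    reloaded = joblib.load(path)    # load it back
    proba = reloaded.predict_proba(X[:3])

print(proba)
```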
Answer 1 (score: 1)
Adding probability=True when initializing the classifier, as suggested above, fixed the error for me:
clf = SVC(kernel='rbf', C=1e9, gamma=1e-07, probability=True).fit(xtrain,ytrain)
Answer 2 (score: 0)
You can use CalibratedClassifierCV to get probability scores:
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC
model_svc = LinearSVC()
model = CalibratedClassifierCV(model_svc)
model.fit(X_train, y_train)
Save the model with pickle:
import pickle
filename = 'linearSVC.sav'
pickle.dump(model, open(filename, 'wb'))
Load the model with pickle.load:
model = pickle.load(open(filename, 'rb'))
Now make predictions:
pred_class = model.predict(pred)
probability = model.predict_proba(pred)
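A self-contained version of the steps above, assuming synthetic data from make_classification and pickle.dumps/pickle.loads in place of a file. LinearSVC itself has no predict_proba; the probabilities come from the sigmoid (Platt) calibration that CalibratedClassifierCV fits by default.

```python
import pickle

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic data (an assumption) split into train and test sets
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# LinearSVC has no predict_proba; the calibration wrapper adds it
model = CalibratedClassifierCV(LinearSVC(), cv=5)
model.fit(X_train, y_train)

# Round trip through pickle, then predict classes and probabilities
restored = pickle.loads(pickle.dumps(model))
pred_class = restored.predict(X_test)
probability = restored.predict_proba(X_test)
print(probability[:3])
```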
Answer 3 (score: 0)
Use SVC(probability=True)
or
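# GridSearchCV has no probability parameter; set it on the pipeline's SVC step instead
pipeline_svm.set_params(classifier__probability=True)
grid_svm = GridSearchCV(
    pipeline_svm,
    param_grid=param_svm,
    refit=True,
    n_jobs=-1,
    scoring='accuracy',
    cv=StratifiedKFold(n_splits=5),
)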
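GridSearchCV does not accept a probability argument, so the flag has to reach the SVC inside the pipeline. A sketch of two ways to do that with scikit-learn's step__parameter naming, assuming the question's step name classifier and a shortened grid; either one is enough on its own.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipeline_svm = Pipeline([
    ("bow", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("classifier", SVC()),  # probability left at its default (False)
])

# Option 1: switch the flag on the SVC step before the search
pipeline_svm.set_params(classifier__probability=True)

# Option 2: pin the flag inside every parameter-grid entry
param_svm = [
    {"classifier__C": [1, 10], "classifier__kernel": ["linear"],
     "classifier__probability": [True]},
]
grid_svm = GridSearchCV(pipeline_svm, param_grid=param_svm, scoring="accuracy")
print(pipeline_svm.get_params()["classifier__probability"])
```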
Answer 4 (score: 0)
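Use predict_proba to compute the probability scores that roc_curve(y_true, y_score) expects as y_score; the hard labels from predict are not enough. You can do the conversion as shown in the code below: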
from sklearn import svm
from sklearn.metrics import accuracy_score, auc, roc_curve

# Classifier - Algorithm - SVM
# fit the training dataset on the classifier
SVM = svm.SVC(C=1.0, kernel='linear', degree=3, gamma='auto', probability=True)
SVM.fit(Train_X_Tfidf, Train_Y)
# predict the labels on validation dataset
predictions_SVM = SVM.predict(Test_X_Tfidf)
# Use accuracy_score function to get the accuracy (y_true first, then y_pred)
print("SVM Accuracy Score -> ", accuracy_score(Test_Y, predictions_SVM))
# column 1 holds the probability of class SVM.classes_[1]
probs = SVM.predict_proba(Test_X_Tfidf)
preds = probs[:, 1]
fpr, tpr, threshold = roc_curve(Test_Y, preds)
print("SVM Area under curve -> ", auc(fpr, tpr))
Note the difference between precision_score and auc(): for the AUC you need predicted scores, not predicted labels.
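A sketch of the same idea on synthetic data (an assumption). sklearn.metrics.auc(x, y) takes curve coordinates such as (fpr, tpr), while roc_auc_score(y_true, y_score) computes the ROC area in one call. SVC.decision_function can also serve as y_score, which avoids needing probability=True just for the AUC.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data (an assumption) split into train and test sets
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="linear", probability=True, random_state=0).fit(X_train, y_train)

# Scores from predict_proba: column 1 is the probability of clf.classes_[1]
preds = clf.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, preds)
auc_from_curve = auc(fpr, tpr)
auc_direct = roc_auc_score(y_test, preds)

# decision_function output also works as y_score, without probability=True
auc_decision = roc_auc_score(y_test, clf.decision_function(X_test))
print(auc_from_curve, auc_direct, auc_decision)
```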