How do I reuse pickled objects in Python?

Asked: 2015-06-09 21:01:40

Tags: python-2.7 scikit-learn

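I have pickled some objects so that I can reuse them later. For example, I pickled three different gradient boosting regressors that I want to reuse. However, when I try to call the transform method on a reloaded regressor, Python complains that it needs to be fitted first. Here is the code: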

import joblib  # older scikit-learn releases also exposed this as sklearn.externals.joblib

# models is a list containing three regressors

joblib.dump(models[0], 'gbm1.pkl')
joblib.dump(models[1], 'gbm2.pkl')
joblib.dump(models[2], 'gbm3.pkl')

Then I load them back into IPython.

gbm = []

gbm1 = joblib.load('gbm1.pkl')
gbm.append(gbm1)
gbm2 = joblib.load('gbm2.pkl')
gbm.append(gbm2)
gbm3 = joblib.load('gbm3.pkl')
gbm.append(gbm3)
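As a point of comparison, here is a minimal sketch of the round trip on synthetic data. The data, the file names, and the small `n_estimators` value are all assumptions made for illustration and are not taken from the question. It suggests that joblib preserves whatever state the object had when it was dumped: a fitted regressor comes back fitted, and an unfitted one comes back unfitted.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in data (assumption: the real data is a tf-idf matrix).
rng = np.random.RandomState(0)
X = rng.rand(60, 5)
y = X[:, 0] * 2.0 + rng.rand(60) * 0.1

# One regressor that has been fitted, one that has not.
fitted = GradientBoostingRegressor(n_estimators=10, random_state=0).fit(X, y)
unfitted = GradientBoostingRegressor(n_estimators=10, random_state=0)

# Dump both to a temporary directory (hypothetical file names).
tmpdir = tempfile.mkdtemp()
fitted_path = os.path.join(tmpdir, "fitted.pkl")
unfitted_path = os.path.join(tmpdir, "unfitted.pkl")
joblib.dump(fitted, fitted_path)
joblib.dump(unfitted, unfitted_path)

loaded_fitted = joblib.load(fitted_path)
loaded_unfitted = joblib.load(unfitted_path)

# estimators_ is only set by fit(), so it works as a simple fitted check.
fitted_survives = hasattr(loaded_fitted, "estimators_")
unfitted_stays_unfitted = not hasattr(loaded_unfitted, "estimators_")
same_predictions = np.allclose(fitted.predict(X), loaded_fitted.predict(X))
```

If the reloaded objects raise NotFittedError, a quick `hasattr(models[0], "estimators_")` on the original list before dumping can show whether the models were fitted in the first place.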

Then I try to run the transform() method to get the data matrix restricted to the most important features.

#get the most important features from gbm1,gbm2,gbm3 (for each target)
train_dict = {} #new training data with most important features
val_dict = {}   #new val data with most important features
for clf,star in zip(gbm,['*','**','***']):
    train_dict[star] = clf.transform(train_X_tfidf)
    val_dict[star] = clf.transform(val_X_tfidf)

However, I get the following error (traceback):

NotFittedError                            Traceback (most recent call last)
<ipython-input-37-743077458c48> in <module>()
      3 val_dict = {}   #new val data with most important features
      4 for clf,star in zip(gbm,['*','**','***']):
----> 5     train_dict[star] = clf.transform(train_X_tfidf)
      6     val_dic[star] = clf.transform(val_X_tfidf)
      7 

//anaconda/lib/python2.7/site-packages/sklearn/feature_selection/from_model.pyc in transform(self, X, threshold)
     46         """
     47         check_is_fitted(self, ('coef_', 'feature_importances_'), 
---> 48                         all_or_any=any)
     49 
     50         X = check_array(X, 'csc')

//anaconda/lib/python2.7/site-packages/sklearn/utils/validation.pyc in check_is_fitted(estimator, attributes, msg, all_or_any)
    625 
    626     if not all_or_any([hasattr(estimator, attr) for attr in attributes]):
--> 627         raise NotFittedError(msg % {'name': type(estimator).__name__})

NotFittedError: This GradientBoostingRegressor instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.
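The traceback shows transform() coming from sklearn/feature_selection/from_model.py. Later scikit-learn releases removed transform() from estimators and moved this behaviour into the SelectFromModel wrapper. The following is a rough sketch of the equivalent call on a current install, using synthetic data and parameter values that are assumptions rather than anything from the question. Either way, the wrapped regressor still has to be fitted.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import SelectFromModel

# Synthetic data in which only the first column drives the target (assumption).
rng = np.random.RandomState(0)
X = rng.rand(80, 6)
y = 3.0 * X[:, 0] + rng.rand(80) * 0.05

# The regressor must already be fitted; prefit=True tells the selector
# to reuse it instead of fitting a fresh copy.
clf = GradientBoostingRegressor(n_estimators=20, random_state=0).fit(X, y)
selector = SelectFromModel(clf, prefit=True)

# Keep only the columns whose importance is at or above the default threshold.
X_reduced = selector.transform(X)
kept_columns = selector.get_support()
```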

I thought that if I serialized the objects with pickle, I could reuse them immediately after loading. What am I doing wrong?

Thanks for your help.

1 Answer:

Answer 0: (score: 0)

If you used cross-validation, your model may indeed still need to be fitted on the whole dataset, as suggested here.
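To illustrate why cross-validation could leave the models unfitted, here is a small sketch. It assumes a current scikit-learn, where `cross_val_score` lives in `sklearn.model_selection`, and the data is synthetic. The helper fits clones of the estimator on each fold, so the object passed in is left untouched and needs an explicit fit() before it is dumped.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption).
rng = np.random.RandomState(0)
X = rng.rand(60, 5)
y = X[:, 0] * 2.0 + rng.rand(60) * 0.1

model = GradientBoostingRegressor(n_estimators=10, random_state=0)

# Cross-validation scores clones of the model; the original is not fitted.
scores = cross_val_score(model, X, y, cv=3)
fitted_after_cv = hasattr(model, "estimators_")

# Fit on the whole dataset before pickling so the saved object is usable.
model.fit(X, y)
fitted_after_fit = hasattr(model, "estimators_")
```

Whether this is what happened in the question is not stated there; it is one plausible explanation for the NotFittedError.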