Using a previously saved model

Date: 2017-08-10 08:18:39

Tags: python machine-learning scikit-learn orange

I am writing a Python script with the Orange data mining toolkit to get the classification accuracy on test data using a previously saved model (a pickle file).

import pickle

import Orange

dataFile = "training.csv"
data = Orange.data.Table(dataFile)
learner = Orange.classification.RandomForestLearner()
cf = learner(data)
# save the pickle file
with open("1.pkcls", "wb") as f:
    pickle.dump(cf, f)

# load the pickle file
with open("1.pkcls", "rb") as f:
    loadCF = pickle.load(f)
testFile = "testing.csv"
test = Orange.data.Table(testFile)

learners = [cf]  # the trained model cf is passed here
result = Orange.evaluation.testing.TestOnTestData(data, test, learners)
# get classification accuracy
CAs = Orange.evaluation.CA(result)

I can save and load the model successfully, but I get the following error:

    CAs = Orange.evaluation.CA(result)
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/Orange/evaluation/scoring.py", line 39, in __new__
    return self(results, **kwargs)
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/Orange/evaluation/scoring.py", line 48, in __call__
    return self.compute_score(results, **kwargs)
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/Orange/evaluation/scoring.py", line 84, in compute_score
    return self.from_predicted(results, skl_metrics.accuracy_score)
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/Orange/evaluation/scoring.py", line 75, in from_predicted
    dtype=np.float64, count=len(results.predicted))
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/Orange/evaluation/scoring.py", line 74, in <genexpr>
    for predicted in results.predicted),
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 172, in accuracy_score
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "/Users/anaconda2/envs/py36/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 82, in _check_targets
    "".format(type_true, type_pred))
ValueError: Can't handle mix of multiclass and continuous

I found a workaround: after removing the line

cf = learner(data)

the classification accuracy is computed successfully.

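However, without this line I cannot train the model and save it, because the RandomForestLearner is then never trained on the input file before the code that saves and loads the model: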

with open("1.pkcls", "wb") as f:
    pickle.dump(cf, f)

# load the pickle file
with open("1.pkcls", "rb") as f:
    loadCF = pickle.load(f)

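Does anyone know whether it is possible to train a model first and save it as a pickle file, so that I can later use it to test another file and get the classification accuracy?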

1 Answer

Answer 0 (score: 1)

You cannot pre-train the classifier before passing it to TestOnTestData. Its name should really be TrainOnTrainAndTestOnTestData: it calls the fit/training step itself.
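In Orange a learner such as RandomForestLearner() acts as a factory: calling it on data runs the training and returns a model, and calling that model on data returns predictions. The plain-Python sketch below mimics that protocol with two hypothetical toy classes (MajorityLearner and MajorityModel, not part of Orange) to show what a harness gets back when it is handed an already-trained model where it expects a learner:

```python
from collections import Counter


class MajorityModel:
    """Hypothetical toy model: predicts one fixed label for every row."""

    def __init__(self, label):
        self.label = label

    def __call__(self, rows):
        # A model maps data to predictions, one per row.
        return [self.label for _ in rows]


class MajorityLearner:
    """Hypothetical toy learner: calling it on data trains and returns a model."""

    def __call__(self, rows):
        # rows are (features, label) pairs; "training" = pick the most common label
        label = Counter(lbl for _, lbl in rows).most_common(1)[0][0]
        return MajorityModel(label)


train = [((1.0,), "a"), ((2.0,), "a"), ((3.0,), "b")]

fitted = MajorityLearner()(train)  # learner(data) -> model: the training step
print(type(fitted).__name__)       # MajorityModel

# Using the trained model where a learner is expected: the "training"
# call now produces a list of predictions instead of a model.
wrong = fitted(train)
print(wrong)                       # ['a', 'a', 'a']
```

Because TestOnTestData performs the learner(data) step itself, a trained model passed in place of a learner is simply called on the training data and returns predictions where a model was expected.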

Unfortunately, there is no clean way to create a Results instance from applying a pre-trained classifier to a test dataset.
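If only the accuracy is needed, one option is to skip Results altogether: load the pickled model, call it on the test rows and compare its predictions with the true labels. The sketch below does this in plain Python with a hypothetical ThresholdModel standing in for the unpickled Orange model. With Orange itself the equivalent would presumably be something like (loadCF(test) == test.Y).mean(), but that is an assumption to verify against your Orange version: it relies on the model returning predicted class indices and on a single discrete class variable whose values are ordered the same way in both files.

```python
import pickle


class ThresholdModel:
    """Hypothetical stand-in for a trained classifier."""

    def __init__(self, threshold):
        self.threshold = threshold

    def __call__(self, xs):
        # Predict class 1 when the single feature exceeds the threshold.
        return [1 if x > self.threshold else 0 for x in xs]


def accuracy(y_true, y_pred):
    """Fraction of rows whose prediction matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# "Training" session: serialize the fitted model.
blob = pickle.dumps(ThresholdModel(threshold=0.5))

# Later session: load the model and score it on separate test data.
loaded = pickle.loads(blob)
test_x = [0.1, 0.4, 0.6, 0.9]
test_y = [0, 1, 1, 1]
print(accuracy(test_y, loaded(test_x)))  # 0.75
```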

A quick and dirty workaround is to wrap the "learner" passed to TestOnTestData so that it simply returns the pre-trained model:

# TestOnTestData calls each "learner" on the training data; this one ignores it
results = Orange.evaluation.testing.TestOnTestData(data, test, [lambda train_data: loadCF])
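The workaround relies only on the harness calling each "learner" with the training data and using whatever comes back as the model. The plain-Python sketch below makes that explicit; train_and_test and FixedModel are hypothetical stand-ins, not Orange's implementation, and the lambda plays the same role as the one passed to TestOnTestData above.

```python
class FixedModel:
    """Hypothetical pre-trained model: predicts 'yes' for positive inputs."""

    def __call__(self, xs):
        return ["yes" if x > 0 else "no" for x in xs]


def train_and_test(train, test, learners):
    """Hypothetical harness: fits each learner on `train`, scores it on `test`.

    Like TestOnTestData, it performs the training step itself by calling
    learner(train) and treating the return value as the model.
    """
    test_x, test_y = test
    scores = []
    for learner in learners:
        model = learner(train)  # the "training" step
        predicted = model(test_x)
        correct = sum(p == t for p, t in zip(predicted, test_y))
        scores.append(correct / len(test_y))
    return scores


pretrained = FixedModel()  # e.g. the object returned by pickle.load(...)
train = ([-1.0, 2.0], ["no", "yes"])  # ignored by the wrapper below
test = ([3.0, -2.0, 0.5, -0.1], ["yes", "no", "no", "no"])

# Wrap the pre-trained model in a "learner" that ignores its training data.
scores = train_and_test(train, test, [lambda train_data: pretrained])
print(scores)  # [0.75]
```

Since the wrapper discards the training data, the score reflects only the pre-trained model; the training set passed to the harness just has to be something it accepts.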