sklearn - How do I reload a model with a pipeline and make predictions?

Time: 2019-08-13 19:05:00

Tags: python pandas scikit-learn pipeline

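I have saved a trained model and the test dataset, and I want to reload them just to verify that I get the same results, so that I can use the model in the future (I don't have any new data to test with right now). The CSV I saved contains no labels; it is the same test data used in the original train/test run, which worked fine.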

This is how I created the model:

# copy split data for this model
dtc_test_X = test_X
dtc_test_y = test_y
dtc_train_X = train_X
dtc_train_y = train_y

# initialize the model
dtc = DecisionTreeClassifier(random_state = 1)
# fit the training data and predict on the test set
dtc_yhat = dtc.fit(dtc_train_X, dtc_train_y).predict(dtc_test_X)
# scikit-learn's accuracy scoring
acc = accuracy_score(dtc_test_y, dtc_yhat)
# scikit-learn's Jaccard Index
jacc = jaccard_similarity_score(dtc_test_y, dtc_yhat)
# scikit-learn's classification report
class_report = classification_report(dtc_test_y, dtc_yhat)

I saved the model and the data like this:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.externals import joblib

# setup the pipe line
pipe = make_pipeline(DecisionTreeClassifier)
# save the model
joblib.dump(pipe, 'model.pkl')
dtc_test_X.to_csv('set_to_predict.csv')
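For comparison, here is a minimal sketch of a save step that stores a trained model. It passes an estimator instance to make_pipeline and fits the pipeline before dumping it. It assumes joblib is imported as a standalone package (the sklearn.externals.joblib alias was deprecated in scikit-learn 0.21 and removed in 0.23) and uses synthetic data from make_classification as a stand-in for the original train_X/train_y.

```python
import joblib  # standalone package, not sklearn.externals.joblib
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the original training/test data (assumption)
X, y = make_classification(n_samples=200, n_features=5, random_state=1)
train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=1)

# Pass an *instance* of the estimator, not the DecisionTreeClassifier class
pipe = make_pipeline(DecisionTreeClassifier(random_state=1))

# Fit the pipeline itself so the saved file contains the trained tree
pipe.fit(train_X, train_y)

joblib.dump(pipe, "model.pkl")
```

Wrapping the already-fitted dtc (make_pipeline(dtc)) or dumping dtc directly would also keep the trained tree, because the pipeline stores the object it is given. Fitting the pipeline itself is simply the more conventional pattern once preprocessing steps are added.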

When I reload the model and try to make predictions like this:

#Loading the saved model with joblib
pipe = joblib.load('model.pkl')

# New data to predict
pr = pd.read_csv('set_to_predict.csv')
pred_cols = list(pr.columns.values)
pred_cols
# apply the whole pipeline to data
pred = pd.Series(pipe.predict(pr[pred_cols]))
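A minimal sketch of the full save/reload round trip follows. It assumes a fitted pipeline and uses synthetic data with hypothetical feature names f0 to f3. One detail is worth checking when reloading: DataFrame.to_csv writes the index as an extra column by default, so read_csv returns an additional "Unnamed: 0" column that the model was never trained on. Passing index=False when saving (or index_col=0 when loading) avoids that.

```python
import joblib
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with hypothetical feature names (stand-in for the real set)
X, y = make_classification(n_samples=200, n_features=4, random_state=1)
features = pd.DataFrame(X, columns=["f0", "f1", "f2", "f3"])
train_X, test_X, train_y, test_y = train_test_split(features, y, random_state=1)

# Train and save the pipeline, then save the unlabeled test set without its index
pipe = make_pipeline(DecisionTreeClassifier(random_state=1)).fit(train_X, train_y)
joblib.dump(pipe, "model.pkl")
test_X.to_csv("set_to_predict.csv", index=False)

# Reload both and predict
loaded = joblib.load("model.pkl")
pr = pd.read_csv("set_to_predict.csv")
pred = pd.Series(loaded.predict(pr))

# The reloaded pipeline should reproduce the predictions made before saving
matches = (pred.values == pipe.predict(test_X)).all()
print(matches)
```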

On the last line (the prediction), it raises an exception:

TypeError: predict() missing 1 required positional argument: 'X'

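While searching for an answer, I could only find examples of a similar exception mentioning Y instead of X, and those answers didn't seem to apply. Why does this error occur?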
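Assuming the code ran exactly as posted, the message appears to come from the save step. make_pipeline(DecisionTreeClassifier) stores the class itself rather than the fitted dtc instance. When the pipeline later calls its final step's predict with the data, Python binds the data to self and reports X as missing. The saved pipeline was also never fitted, so it would not contain the trained tree either way. Newer scikit-learn versions may report this case with a different message. The sketch below reproduces the mechanism in plain Python, with a hypothetical Model class standing in for the estimator:

```python
class Model:
    """Hypothetical stand-in for an estimator class such as DecisionTreeClassifier."""

    def predict(self, X):
        # Dummy prediction: one 0 per input row
        return [0 for _ in X]


data = [[1.0], [2.0]]

# Called on an instance: Python binds the instance to `self`, data to `X`
ok = Model().predict(data)

# Called on the class: `data` is bound to `self`, so `X` is left unfilled
try:
    Model.predict(data)
    message = ""
except TypeError as exc:
    message = str(exc)

print(ok)
print(message)  # ends with: predict() missing 1 required positional argument: 'X'
```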

1 answer:

Answer 0 (score: 0)

Try pipe.predict(X=pr[pred_cols]) instead of pipe.predict(pr[pred_cols]) to see whether it works or raises a different error.