"X has a different shape than during fitting" when transforming with a loaded model

Time: 2018-10-08 08:49:56

Tags: python pandas machine-learning scikit-learn

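I am trying to train a model, select features with SelectFromModel, and then dump the model. When I reload the model and call transform on a SelectFromModel built with the same threshold, I get a fitting error.

Feature selection using feature importances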

from numpy import loadtxt
from numpy import sort
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.feature_selection import SelectFromModel
import pickle
# load data
print("X Shape ",X.shape," Test Shape ",encoded.shape)
X_train, X_test, y_train, y_test = train_test_split(X, encoded, test_size=0.33, random_state=7)
print("Train Shape ",X_train.shape," Test Shape ",X_test.shape)
num_trees=100
# fit model on all training data
model = RandomForestClassifier(n_estimators=num_trees, max_features='sqrt',criterion='entropy')
model.fit(X_train, y_train)
# select features using threshold
selection = SelectFromModel(model, threshold=0.014, prefit=True)
select_X_train = selection.transform(X_train)
print(select_X_train.shape)
# train model
selection_model = RandomForestClassifier(n_estimators=num_trees, max_features='sqrt',criterion='entropy')
selection_model.fit(select_X_train, y_train)
# eval model
select_X_test = selection.transform(X_test)
print(select_X_test.shape)
y_pred = selection_model.predict(select_X_test)
predictions = [round(value) for value in y_pred]
accuracy = accuracy_score(y_test, predictions)
print("Thresh=%.3f, n=%d, Accuracy: %.2f%%" % (0.009, select_X_train.shape[1], accuracy*100.0))

pickle.dump(selection_model, open("SelectedModel.sav", 'wb'))
pickle.dump(selection_model, open("ParentModel.sav", 'wb'))
print("Train Shape ",select_X_train.shape," Test Shape ",select_X_test.shape)

Output:

X Shape  (18103, 34)  Test Shape  (18103,)
Train Shape  (12129, 34)  Test Shape  (5974, 34)
(12129, 27)
(5974, 27)
Thresh=0.009, n=27, Accuracy: 98.26%
Train Shape  (12129, 27)  Test Shape  (5974, 27)
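
The drop from 34 to 27 columns in the output is what a fixed threshold would be expected to do: SelectFromModel keeps the columns whose importance is at or above the threshold. A minimal sketch of that rule on synthetic data follows; the dataset, `n_estimators=20`, and `random_state=7` are illustrative assumptions, not the data from the question, so the number of selected columns will differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in with 34 input columns (assumption, not the question's data).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=34, n_informative=8, random_state=7
)
forest = RandomForestClassifier(n_estimators=20, random_state=7).fit(X_demo, y_demo)

threshold = 0.014  # same numeric threshold as in the question

# Manual version of the selection rule: keep importances >= threshold.
mask = forest.feature_importances_ >= threshold

# The same rule applied through SelectFromModel on the already fitted forest.
selector = SelectFromModel(forest, threshold=threshold, prefit=True)
X_selected = selector.transform(X_demo)

print("columns kept:", int(mask.sum()), "transformed shape:", X_selected.shape)
```

The mask always has one entry per column the wrapped estimator was fitted on, which is why the input passed to transform needs the same number of columns.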

Attempting to reload the model

import numpy as np
loaded_Parentmodel = pickle.load(open("ParentModel.sav", 'rb'))
loaded_Selectedmodel = pickle.load(open("ParentModel.sav", 'rb'))
selection = SelectFromModel(loaded_Parentmodel, threshold=0.014, prefit=True)

from sklearn.externals import joblib
X1 = validationDfs.drop(['NEWSTATUS'], axis=1).values
Y1 = validationDfs[['NEWSTATUS']].values


print("Validation Shape Prior Selection",X1.shape)
select_X1 = selection.transform(X1)

Error:

Validation Shape Prior Selection (739, 34)
ValueError: X has a different shape than during fitting.
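
One possible cause, going only by the code shown: both pickle.dump calls write selection_model, so ParentModel.sav would hold the model trained on the 27 selected columns rather than the one trained on all 34. Wrapping that model in SelectFromModel gives a 27-entry mask, which cannot be applied to a 34-column array. The sketch below reproduces this on synthetic data. The dataset, the `threshold="mean"` setting, and the names `parent`, `child`, `wrong_parent`, and `right_parent` are illustrative assumptions, and the exact error text may vary between scikit-learn versions.

```python
import pickle
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in with 34 input columns (assumption, not the question's data).
X, y = make_classification(
    n_samples=300, n_features=34, n_informative=8, random_state=7
)

# Parent model sees all 34 columns; child model sees only the selected ones.
parent = RandomForestClassifier(n_estimators=20, random_state=7).fit(X, y)
X_sel = SelectFromModel(parent, threshold="mean", prefit=True).transform(X)
child = RandomForestClassifier(n_estimators=20, random_state=7).fit(X_sel, y)

# Round-trip through pickle in memory instead of writing .sav files.
wrong_parent = pickle.loads(pickle.dumps(child))   # child saved under the parent's name
right_parent = pickle.loads(pickle.dumps(parent))  # the actual parent

print("wrong parent importances:", len(wrong_parent.feature_importances_))
print("right parent importances:", len(right_parent.feature_importances_))

# The mask built from the child has fewer entries than X has columns.
try:
    SelectFromModel(wrong_parent, threshold="mean", prefit=True).transform(X)
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print("ValueError:", exc)

# Using the real parent restores the original selection.
X_ok = SelectFromModel(right_parent, threshold="mean", prefit=True).transform(X)
print("selected shape with the right parent:", X_ok.shape)
```

If this is what happened, dumping `model` to ParentModel.sav and loading SelectedModel.sav for the selected model would be the change to try. Comparing `len(loaded_Parentmodel.feature_importances_)` with `X1.shape[1]` is a quick way to check.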

0 answers:

No answers yet