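I made the training data (X) and the test data (test_data_process) have the same columns in the same order (see the code in the update below).
But when I run
predictions = my_model.predict(test_data_process)
I get the following error: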
ValueError: feature_names mismatch: ['f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9', 'f10', 'f11', 'f12', 'f13', 'f14', 'f15', 'f16', 'f17', 'f18', 'f19', 'f20', 'f21', 'f22', 'f23', 'f24', 'f25', 'f26', 'f27', 'f28', 'f29', 'f30', 'f31', 'f32', 'f33', 'f34'] ['MSSubClass', 'LotFrontage', 'LotArea', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces', 'GarageYrBlt', 'GarageCars', 'GarageArea', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal', 'YrMoSold']
expected f22, f25, f0, f34, f32, f5, f20, f3, f33, f15, f24, f31, f28, f9, f8, f19, f14, f18, f17, f2, f13, f4, f27, f16, f1, f29, f11, f26, f10, f7, f21, f30, f23, f6, f12 in input data
training data did not have the following fields: OpenPorchSF, BsmtFinSF1, LotFrontage, GrLivArea, YrMoSold, FullBath, TotRmsAbvGrd, GarageCars, YearRemodAdd, BedroomAbvGr, PoolArea, KitchenAbvGr, LotArea, HalfBath, MiscVal, MSSubClass, ScreenPorch, 3SsnPorch, TotalBsmtSF, GarageYrBlt, MasVnrArea, OverallQual, Fireplaces, WoodDeckSF, 2ndFlrSF, BsmtFinSF2, BsmtHalfBath, LowQualFinSF, OverallCond, GarageArea
So it complains that the training data (X) does not have these fields, even though it does.
How can I fix this?
[UPDATE]:
My code:
X = data.select_dtypes(exclude=['object']).drop(columns=['Id'])
X['YrMoSold'] = X['YrSold'] * 12 + X['MoSold']
X = X.drop(columns=['YrSold', 'MoSold', 'SalePrice'])
X = X.fillna(0.0000001)
train_X, val_X, train_y, val_y = train_test_split(X.values, y.values, test_size=0.2)
my_model = XGBRegressor(n_estimators=100, learning_rate=0.05, booster='gbtree')
my_model.fit(train_X, train_y, early_stopping_rounds=5,
             eval_set=[(val_X, val_y)], verbose=False)
test_data_process = test_data.select_dtypes(exclude=['object']).drop(columns=['Id'])
test_data_process['YrMoSold'] = test_data_process['YrSold'] * 12 + test_data_process['MoSold']
test_data_process = test_data_process.drop(columns=['YrSold', 'MoSold'])
test_data_process = test_data_process.fillna(0.0000001)
test_data_process = test_data_process[X.columns]
predictions = my_model.predict(test_data_process)
Answer 0 (score: 3)
It's an honest mistake.
When you fit the model, you pass in NumPy arrays:
train_X, val_X, train_y, val_y = train_test_split(X.values, y.values, test_size=0.2)
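(X.values is a NumPy array.)
It carries no column names, so XGBoost falls back to the placeholder names f0, f1, ..., f34.
When you pass in the dataset for prediction, you are using a DataFrame, whose real column names are then compared with those placeholders; that is the mismatch the error reports.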
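To make the mechanism concrete, here is a minimal pure-Python sketch of that kind of name check. It is only an imitation for illustration, not XGBoost's actual code: the helpers `default_names` and `check_feature_names` are hypothetical, and the three column names are a shortened stand-in for the 35 in the question.

```python
def default_names(n_features):
    """Placeholder names for the columns of a nameless array: f0, f1, ..."""
    return [f"f{i}" for i in range(n_features)]


def check_feature_names(trained_names, input_names):
    """Raise ValueError if the input's names differ from those seen at fit time."""
    if list(trained_names) != list(input_names):
        missing_in_input = sorted(set(trained_names) - set(input_names))
        unknown_to_model = sorted(set(input_names) - set(trained_names))
        raise ValueError(
            "feature_names mismatch: expected "
            f"{missing_in_input} in input data; training data did not have "
            f"{unknown_to_model}"
        )


# Fitting on X.values: the model only ever sees 3 unnamed columns.
trained = default_names(3)

# Predicting with a DataFrame: the real column names come along.
frame_columns = ["LotArea", "OverallQual", "YrMoSold"]

try:
    check_feature_names(trained, frame_columns)
    mismatch = False
except ValueError as err:
    mismatch = True
    print(err)

# Predicting with .values: the input is nameless too, so it gets the
# same placeholders and the check passes without raising.
check_feature_names(trained, default_names(len(frame_columns)))
```

The point of the sketch is that both sides must be of the same kind: either both named (DataFrames) or both nameless (arrays).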
You should use:
predictions = my_model.predict(test_data_process.values)
(note the added .values)
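One caveat with this fix: once both sides are plain arrays, nothing can check the column names any more, so the column order has to be right before calling .values (the question's code already handles this with test_data_process[X.columns]). Below is a small sketch of a guard for that step, assuming pandas is available. The helper `to_model_input` is hypothetical and the data are made-up toy values, not rows from the real dataset.

```python
import pandas as pd


def to_model_input(frame, feature_columns):
    """Select the training columns in training order and return a NumPy array."""
    feature_columns = list(feature_columns)
    missing = [c for c in feature_columns if c not in frame.columns]
    if missing:
        raise KeyError(f"input is missing columns: {missing}")
    return frame[feature_columns].to_numpy()


# Toy training frame: only its column names and order matter here.
X = pd.DataFrame({"LotArea": [8450, 9600], "YrMoSold": [24098, 24089]})

# Toy test frame with a different column order and one extra column.
test_data_process = pd.DataFrame(
    {"YrMoSold": [24126, 24127], "Extra": [1, 2], "LotArea": [11622, 14267]}
)

test_array = to_model_input(test_data_process, X.columns)
print(test_array)
```

With such a helper, the prediction call would become my_model.predict(to_model_input(test_data_process, X.columns)). Another option, not shown here, is to keep DataFrames on both sides by passing X and y (rather than X.values and y.values) to train_test_split.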