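I'm trying to build my first model with XGBoost, but I can't figure out how to actually get predicted values out of it. I can train the model and get an RMSE value, but I don't know where to go from there.
My dataset is about house prices. I'm using variables such as LotFrontage, LotArea, BldgType, OverallQual, OverallCond, FullBath, HalfBath, TotRmsAbvGrd, YearBuilt, TotalBsmtSF, BedroomAbvGr, and GrLivArea. Some of these variables are numeric and some are strings.
Here is my code, which throws an error: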
library(data.table)
library(caret)
library(Metrics)
library(xgboost)

# Read the training and test sets
train <- fread("train_data.csv")
test <- fread("test_data.csv")

# Keep only the columns used for modeling
sub_train <- train[, .(LotFrontage, LotArea, BldgType, OverallQual, OverallCond, FullBath, HalfBath, TotRmsAbvGrd, YearBuilt, TotalBsmtSF, BedroomAbvGr, GrLivArea, SalePrice)]
sub_test <- test[, .(LotFrontage, LotArea, BldgType, OverallQual, OverallCond, FullBath, HalfBath, TotRmsAbvGrd, YearBuilt, TotalBsmtSF, BedroomAbvGr, GrLivArea)]
sub_test$SalePrice <- 0  # placeholder value for the test rows
y.train <- sub_train$SalePrice
y.test <- sub_test$SalePrice

# Encode the string columns as dummy variables
dummies <- dummyVars(SalePrice ~ ., data = sub_train)
x.train <- predict(dummies, newdata = sub_train)
x.test <- predict(dummies, newdata = sub_test)

dtrain <- xgb.DMatrix(x.train, label = y.train, missing = NA)
dtest <- xgb.DMatrix(x.test, label = y.test, missing = NA)
param <- list(
  objective = "reg:linear",
  gamma = 0.02,
  booster = "gbtree",
  eval_metric = "rmse",
  eta = 0.02,
  max_depth = 10,
  subsample = 0.9,
  colsample_bytree = 0.9,
  tree_method = "hist"
)
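# 5-fold cross-validation; this prints the RMSE for each round
XGBm <- xgb.cv(params = param, nfold = 5, nrounds = 2000, missing = NA, data = dtrain, print_every_n = 1)
pred <- predict(XGBm, sub_test$SalePrice)  # XGBm holds the xgb.cv result at this point

# Train a single model, monitoring both sets
watchlist <- list(eval = dtest, train = dtrain)
XGBm <- xgb.train(params = param, nrounds = 200, data = dtrain, watchlist = watchlist, early_stopping_rounds = 20, print_every_n = 1)
sub_train2 <- xgb.DMatrix(x.train, label = y.train, missing = NA)
pred1 <- predict(XGBm, sub_train$SalePrice)  # passes the SalePrice column as newdata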
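So, I'd like to end up with a CSV file containing the predicted house prices. Alternatively, I'd like to update the SalePrice column in the train dataset or the sub_train dataset with something like sub_train$SalePrice<-predict(XGBoost,sub_train$SalePrice). Any ideas?
Also, I have already run the predict line, but it only gives me small decimals such as .823 and .174, which isn't what I want. I'm expecting house prices above 100,000.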
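For reference, here is a minimal sketch of the flow I'm after, run on made-up data rather than my real files. It assumes that predict() should be given the feature matrix (dtest) rather than the SalePrice column, and that the model passed to it should be the xgb.train result rather than the xgb.cv result. The toy data, the Id column, and the predictions.csv file name are placeholders I made up, and I've used "reg:squarederror", which is the current name for the "reg:linear" objective.

```r
library(xgboost)

# Toy stand-in for the real data (hypothetical values, not from train_data.csv)
set.seed(42)
n <- 200
x.train <- cbind(
  GrLivArea = runif(n, 600, 3000),
  OverallQual = sample(1:10, n, replace = TRUE)
)
y.train <- 50000 + 60 * x.train[, "GrLivArea"] +
  10000 * x.train[, "OverallQual"] + rnorm(n, sd = 5000)
x.test <- x.train[1:20, , drop = FALSE]  # pretend these rows are the test set

dtrain <- xgb.DMatrix(x.train, label = y.train)
dtest <- xgb.DMatrix(x.test)  # no label needed for prediction

# Train a single booster (not xgb.cv) so there is a model to predict with
model <- xgb.train(
  params = list(objective = "reg:squarederror", eta = 0.1, max_depth = 3),
  data = dtrain,
  nrounds = 100
)

# Predict from the feature matrix, not from the target column
pred <- predict(model, dtest)

# Write one predicted price per test row to a CSV file (placeholder file name)
submission <- data.frame(Id = seq_along(pred), SalePrice = pred)
out_file <- file.path(tempdir(), "predictions.csv")
write.csv(submission, out_file, row.names = FALSE)
```

If that is the right idea, I assume the same predict(model, dtrain) call could be used to fill in sub_train$SalePrice.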
Thanks!