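I'm using xgboost to predict Airbnb destinations (similar to the Kaggle competition, but for a class project). However, when I run the predict command, I get the following error message:
Error in predict.xgb.Booster(bst, dval) : Feature names stored in `object` and `newdata` are different!
How can I fix this?
Here is my code: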
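library(Matrix)   # sparse.model.matrix()
library(xgboost)  # xgb.DMatrix(), xgboost()
library(car)      # recode() with the "old = new" string syntax
setwd("~/Documents/Big Data/Datasets-20180304")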
airbnb <- read.csv("airbnb_train.csv", header = T, stringsAsFactors = F)
airbnb_test <- read.csv("airbnb_test.csv", header = T, stringsAsFactors = F)
airbnb <- na.omit(airbnb)
airbnb_test <- na.omit(airbnb_test)
airbnb$country_destination <- as.factor(airbnb$country_destination)
airbnb$country_destination[airbnb$country_destination==0] <- NA
airbnb$country_destination <- recode(airbnb$country_destination, "c('1') = '0'; c('2') = '1'")
airbnb <- na.omit(airbnb)
airbnb_test <- na.omit(airbnb_test)
set.seed(1234)
train_index <- sample(1:nrow(airbnb),size = 0.7*nrow(airbnb),replace = F)
train <- airbnb[train_index,]
validation <- airbnb[-train_index,]
options(na.action='na.pass')
new_tr = sparse.model.matrix(country_destination~.-1,data = train, with = F)
train_label <- train$country_destination
train_label <- as.numeric(train_label)-1
dtrain <- xgb.DMatrix(data = new_tr, label=train_label)
new_val = sparse.model.matrix(country_destination~.-1,data = validation, with = F)
val_label <- validation$country_destination
val_label <- as.numeric(val_label)-1
dval <- xgb.DMatrix(data = new_val, label=val_label)
#default parameters
params <- list(
booster = "gbtree",
objective = "binary:logistic",
eta=0.3,
gamma=0,
max_depth=6,
min_child_weight=1,
subsample=1,
colsample_bytree=1
)
bst <- xgboost(data = dtrain, label = train_label, max_depth = 2, eta = 1, nthread = 2, nrounds = 8, objective = "binary:logistic")
xgbpred <- predict(bst,dval)
What am I doing wrong? How can I make sure that 'bst' and 'dval' have the same feature_names?
Answer 0 (score: 0)
Can you share the output of names(bst) and names(dval)?
After fitting the boosted model:
bst <- xgboost(data = dtrain, label = train_label, max_depth = 2, eta = 1, nthread = 2, nrounds = 8, objective = "binary:logistic")
As a workaround, you can simply do the following:
names(bst) <- names(dval)
Then try your prediction:
xgbpred <- predict(bst,dval)
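Answer 1 (score: 0)
I got stuck on a similar problem, and this worked for me.
Try removing the predicted variable (i.e. train$country_destination in your case) from both 'dtrain' and 'dtest', even if it is filled with blank values. After making the change, train the model again.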
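A minimal sketch of that idea, using base R's model.matrix() on toy data so it runs without extra packages. The data frames and the columns age and signup_flow are hypothetical stand-ins, not the asker's data; the point is only that the target column is dropped before either design matrix is built, so both end up with the same column names.

```r
# Hypothetical toy data: the test set has no target column at all.
train_df <- data.frame(age = c(25, 40, 31),
                       signup_flow = c(0, 3, 0),
                       country_destination = c(0, 1, 0))
test_df <- data.frame(age = c(29, 52),
                      signup_flow = c(3, 0))

target <- "country_destination"

# Drop the target (if present) before building each design matrix.
train_x <- model.matrix(~ . - 1,
                        data = train_df[, setdiff(names(train_df), target), drop = FALSE])
test_x <- model.matrix(~ . - 1,
                       data = test_df[, setdiff(names(test_df), target), drop = FALSE])

# Keep the labels separately; they would go in the label argument.
train_y <- train_df[[target]]

print(identical(colnames(train_x), colnames(test_x)))
```

The same pattern should carry over to sparse.model.matrix(), which accepts the same kind of one-sided formula.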
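Answer 2 (score: 0)
If you look at this page (https://rdrr.io/cran/xgboost/src/R/xgb.Booster.R), you will see that R users can get the following error message: "Feature names stored in `object` and `newdata` are different!".
Here is the code from that page that raises the error: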
predict.xgb.Booster <- function(object, newdata, missing = NA, outputmargin = FALSE, ntreelimit = NULL,
                                predleaf = FALSE, predcontrib = FALSE, approxcontrib = FALSE,
                                predinteraction = FALSE, reshape = FALSE, ...) {
  object <- xgb.Booster.complete(object, saveraw = FALSE)
  if (!inherits(newdata, "xgb.DMatrix"))
    newdata <- xgb.DMatrix(newdata, missing = missing)
  if (!is.null(object[["feature_names"]]) &&
      !is.null(colnames(newdata)) &&
      !identical(object[["feature_names"]], colnames(newdata)))
    stop("Feature names stored in `object` and `newdata` are different!")
  # ... (rest of the function omitted in this excerpt)
}
identical(object[["feature_names"]], colnames(newdata))
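=> If the column names stored in object (i.e. the model fitted on the training set) are not identical to the column names of newdata (i.e. the test set), you will get this error message.
In more detail: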
train_matrix <- xgb.DMatrix(as.matrix(training %>% select(-target)), label = training$target, missing = NaN)
object <- xgb.train(data=train_matrix, params=..., nthread=2, nrounds=..., prediction = T)
newdata <- xgb.DMatrix(as.matrix(test %>% select(-target)), missing = NaN)
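With the code above, where you build object and newdata yourself, you can resolve the problem by looking at the differences between object[["feature_names"]] and colnames(newdata). Some columns are probably missing from one of them or appear in a different order.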
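One way to look at those differences is sketched below in base R. The two character vectors are hypothetical stand-ins; in a real session they would be taken from object[["feature_names"]] and colnames(newdata).

```r
# Hypothetical stand-ins for object[["feature_names"]] and colnames(newdata).
model_features   <- c("age", "genderFEMALE", "genderMALE", "signup_flow")
newdata_features <- c("signup_flow", "age", "genderMALE")

# Features the model expects that newdata lacks, and the reverse.
missing_in_newdata <- setdiff(model_features, newdata_features)
extra_in_newdata   <- setdiff(newdata_features, model_features)

# TRUE only when both hold the same names and merely the order differs.
order_only <- setequal(model_features, newdata_features) &&
  !identical(model_features, newdata_features)

print(missing_in_newdata)
print(extra_in_newdata)
print(order_only)
```

If order_only is TRUE, reordering the columns of newdata is enough. If missing_in_newdata is non-empty, a likely cause (an assumption, not something confirmed in the question) is a factor level that occurs in the training split but not in the validation split, so its dummy column is never created.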
Answer 3 (score: 0)
Building on guiotan's answer, and with
library(dplyr)
you should be able to write:
xgbpred <- predict(bst, dval %>% select(bst$feature_names))
If you trained xgboost with caret, one solution is to write the following:
xgbpred <- predict(bst, dval %>% select(bst$finalModel$feature_names))
At least, this worked for me.
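select() operates on data frames. If the validation features are held in a matrix instead, the same alignment can be sketched with base R indexing by column name. Everything below is illustrative: model_features stands in for bst$feature_names, val_mat is toy data, and filling an absent dummy column with zeros is an assumption that only makes sense for one-hot encoded factor levels.

```r
# Hypothetical feature names, standing in for bst$feature_names.
model_features <- c("age", "genderFEMALE", "genderMALE")

# Toy validation matrix: different column order, one dummy column absent.
val_mat <- matrix(c(1, 0, 25, 40), nrow = 2,
                  dimnames = list(NULL, c("genderMALE", "age")))

# Assumption: an absent one-hot column means "level not present", so pad with 0.
absent <- setdiff(model_features, colnames(val_mat))
padding <- matrix(0, nrow = nrow(val_mat), ncol = length(absent),
                  dimnames = list(NULL, absent))

# Reorder (and pad) so the columns match the model's feature names exactly.
aligned <- cbind(val_mat, padding)[, model_features, drop = FALSE]

print(colnames(aligned))
```

The aligned matrix could then be wrapped in xgb.DMatrix() before calling predict().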