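I ran into a known issue: an xgboost model could not be saved and later loaded to get predictions. This was reportedly changed in h2o 3.18 (the problem was in 3.16). I updated the package from h2o's website (the downloadable zip), and now a model that previously ran without problems gives the following error: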
Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = urlSuffix, :
Unexpected CURL error: Failed to connect to localhost port 54321: Connection refused
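This only happens with xgboost (binary classification); the other models I use work fine. Of course, h2o is initialized, and the models estimated before it run without problems. Does anyone know what the problem could be?
Edit: here is a reproducible example (based on Erin's answer) that produces the error: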
library(h2o)
library(caret)
library(dplyr)  # provides %>% and mutate(), used in version 1 below
h2o.init()
# Import a sample binary outcome train set into H2O
train <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
# Identify predictors and response
y <- "response"
x <- setdiff(names(train), y)
# Assigning fold column
set.seed(1)
cv_folds <- createFolds(as.data.frame(train)$response,
                        k = 5,
                        list = FALSE,
                        returnTrain = FALSE)
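# version 1: add the fold column in R, then convert back to an H2OFrame
# (needs %>% and mutate() from dplyr)
train <- train %>%
  as.data.frame() %>%
  mutate(fold_assignment = cv_folds) %>%
  as.h2o()
# version 2: an alternative to version 1 (run one or the other, not both);
# bind the fold column onto the H2OFrame directly
train <- h2o.cbind(train, as.h2o(cv_folds))
names(train)[dim(train)[2]] <- c("fold_assignment")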
# For binary classification, response should be a factor
train[,y] <- as.factor(train[,y])
xgb <- h2o.xgboost(x = x,
                   y = y,
                   seed = 1,
                   training_frame = train,
                   fold_column = "fold_assignment",
                   keep_cross_validation_predictions = TRUE,
                   eta = 0.01,
                   max_depth = 3,
                   sample_rate = 0.8,
                   col_sample_rate = 0.6,
                   ntrees = 500,
                   reg_lambda = 0,
                   reg_alpha = 1000,
                   distribution = 'bernoulli')
Both versions of creating the train data.frame lead to the same error.
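As a third variant that avoids the round trip through caret and an R data.frame, the fold column could also be generated inside H2O. This is only a sketch: `add_fold_column()` is a hypothetical helper, not part of the original example, and the folds it produces are random rather than stratified like those from `createFolds()`.

```r
# Sketch only: h2o:: is used so nothing runs until the function is called.
# Hypothetical helper: append a k-fold assignment column built by H2O itself.
add_fold_column <- function(frame, nfolds = 5, seed = 1,
                            name = "fold_assignment") {
  # h2o.kfold_column() returns a one-column H2OFrame of fold ids
  fold <- h2o::h2o.kfold_column(frame, nfolds = nfolds, seed = seed)
  names(fold) <- name
  h2o::h2o.cbind(frame, fold)
}
```

It would be called as `train <- add_fold_column(train)` after `h2o.importFile()`, with `fold_column = "fold_assignment"` passed to `h2o.xgboost()` as before.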
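Answer 0 (score: 1)
You didn't say whether you re-trained the models with 3.18. In general, H2O only guarantees model compatibility between major versions of H2O. If you did not re-train the models, that is probably why XGBoost is not working properly. If you did re-train the models with 3.18 and XGBoost still does not work, please post a reproducible example and we will look into it further.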
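A minimal sketch of what checking versions and re-saving a freshly trained model might look like. The helpers `versions_match()` and `save_and_reload()` and the use of `tempdir()` are illustrative assumptions, not something from the question.

```r
# Sketch only: h2o:: is used so nothing runs until the functions are called.

# Hypothetical helper: TRUE when two version identifiers are the same string.
versions_match <- function(pkg_version, cluster_version) {
  identical(as.character(pkg_version), as.character(cluster_version))
}

# Hypothetical helper: warn if the installed R package and the running
# cluster disagree, then save `model` and load it back in the same session.
save_and_reload <- function(model, dir = tempdir()) {
  pkg_version     <- as.character(utils::packageVersion("h2o"))
  cluster_version <- h2o::h2o.getVersion()
  if (!versions_match(pkg_version, cluster_version)) {
    warning("h2o R package is ", pkg_version,
            " but the cluster reports ", cluster_version)
  }
  path <- h2o::h2o.saveModel(model, path = dir, force = TRUE)
  h2o::h2o.loadModel(path)
}
```

With a model re-trained in the current session, `reloaded <- save_and_reload(xgb)` followed by `h2o.predict(reloaded, train)` would exercise the save/load path on a single version.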
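Edit:
I'm adding a reproducible example (the only difference from your code is that I am not using fold_column here). This runs fine on 3.18.0.2. Without a reproducible example that produces an error, I can't help you any further.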
library(h2o)
h2o.init()
# Import a sample binary outcome train set into H2O
train <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
# Identify predictors and response
y <- "response"
x <- setdiff(names(train), y)
# For binary classification, response should be a factor
train[,y] <- as.factor(train[,y])
xgb <- h2o.xgboost(x = x,
                   y = y,
                   seed = 1,
                   training_frame = train,
                   keep_cross_validation_predictions = TRUE,
                   eta = 0.01,
                   max_depth = 3,
                   sample_rate = 0.8,
                   col_sample_rate = 0.6,
                   ntrees = 500,
                   reg_lambda = 0,
                   reg_alpha = 1000,
                   distribution = 'bernoulli')
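Since the reported error is a refused connection to localhost:54321, which generally means nothing is listening on that port any more, it may also be worth checking the state of the cluster right after the failing call. The following is a rough sketch; `check_h2o_state()` is a hypothetical helper, and it assumes an h2o version that provides `h2o.xgboost.available()`.

```r
# Sketch only: h2o:: is used so nothing runs until the function is called.
# Hypothetical helper: report whether the cluster is reachable and whether
# XGBoost can be used on it. Returns TRUE or FALSE invisibly.
check_h2o_state <- function() {
  # Any error (no connection, cluster gone) is treated as "not up"
  up <- tryCatch(isTRUE(h2o::h2o.clusterIsUp()), error = function(e) FALSE)
  if (!up) {
    message("H2O cluster is not reachable; it may have stopped.")
    return(invisible(FALSE))
  }
  xgb_ok <- isTRUE(h2o::h2o.xgboost.available())
  if (!xgb_ok) {
    message("Cluster is up, but XGBoost is not available on it.")
  }
  invisible(xgb_ok)
}
```

If `check_h2o_state()` reports that the cluster is not reachable after `h2o.xgboost()` fails, the H2O log files would be the next place to look for the reason.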