Error when estimating xgboost in h2o after updating to 3.18

Time: 2018-02-21 13:18:08

Tags: r io h2o xgboost

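I ran into a known issue: an xgboost model cannot be saved and then loaded later to get predictions. This was reportedly changed in h2o 3.18 (the problem was in 3.16). I updated the package from h2o's website (the downloadable zip), and now models that previously ran without problems give the following error: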

Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = urlSuffix,  : 
  Unexpected CURL error: Failed to connect to localhost port 54321: Connection refused

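This only happens with xgboost (binary classification); the other models I use work fine. Of course, h2o is initialized, and the models estimated before the xgboost one run without problems. Does anyone know what the issue might be?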
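A "Connection refused" on port 54321 means the R client can no longer reach the H2O instance, which often indicates that the H2O process stopped while the model was being built. As a rough diagnostic sketch (the helper name `cluster_ok` is made up for illustration, and the default local setup started by `h2o.init()` is assumed), one can check whether the cluster is still up, whether the R package and the cluster report the same version, and whether XGBoost is available on this platform:

```r
library(h2o)

# Hypothetical helper: TRUE only if the R client can reach a running H2O cluster.
cluster_ok <- function() {
  isTRUE(tryCatch(h2o.clusterIsUp(), error = function(e) FALSE))
}

if (!cluster_ok()) {
  # The H2O process may have exited; start (or reconnect to) a local instance.
  h2o.init()
}

# Compare the R package version with the version the cluster reports.
cat("R package version:", as.character(packageVersion("h2o")), "\n")
cat("Cluster version:  ", h2o.getVersion(), "\n")

# Check whether the XGBoost extension can be used on this platform.
cat("XGBoost available:", h2o.xgboost.available(), "\n")
```

If `cluster_ok()` returns FALSE right after the failing `h2o.xgboost()` call, the H2O logs are the next place to look.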

Edit: here is a reproducible example (based on Erin's answer) that produces the error:

library(h2o)
library(caret)
library(dplyr)  # provides %>% and mutate(), used in version 1 below
h2o.init()

# Import a sample binary outcome train set into H2O
train <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")

# Identify predictors and response
y <- "response"
x <- setdiff(names(train), y)

# Assigning fold column
set.seed(1)
cv_folds <- createFolds(as.data.frame(train)$response,
                        k = 5,
                        list = FALSE,
                        returnTrain = FALSE)

# Version 1: add the fold column with dplyr, then convert back to an H2OFrame
train <- train %>%
    as.data.frame() %>% 
    mutate(fold_assignment = cv_folds) %>%
    as.h2o()

# Version 2 (alternative to version 1, run only one of them): bind the fold column inside H2O
train <- h2o.cbind(train, as.h2o(cv_folds))
names(train)[dim(train)[2]] <- c("fold_assignment")


# For binary classification, response should be a factor
train[,y] <- as.factor(train[,y])

xgb <- h2o.xgboost(x = x,
                   y = y, 
                   seed = 1,
                   training_frame = train,
                   fold_column = "fold_assignment",
                   keep_cross_validation_predictions = TRUE,
                   eta = 0.01,
                   max_depth = 3,
                   sample_rate = 0.8,
                   col_sample_rate = 0.6,
                   ntrees = 500,
                   reg_lambda = 0,
                   reg_alpha = 1000,
                   distribution = 'bernoulli') 

Both versions of creating the train data.frame lead to the same error.
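To rule out caret and dplyr as factors, the fold ids can also be built in base R. The following is only a sketch: `make_stratified_folds` is a hypothetical helper written for this post, not part of caret or h2o, and the toy labels stand in for `as.data.frame(train)$response`.

```r
# Hypothetical helper: give each row a fold id in 1..k, balanced within each class.
make_stratified_folds <- function(labels, k = 5, seed = 1) {
  set.seed(seed)
  folds <- integer(length(labels))
  for (cls in unique(labels)) {
    idx <- which(labels == cls)
    # Shuffle the rows of this class, then deal out fold ids round-robin.
    shuffled <- idx[sample.int(length(idx))]
    folds[shuffled] <- rep_len(seq_len(k), length(idx))
  }
  folds
}

# Toy labels standing in for the real response column.
labels <- rep(c(0, 1), times = c(60, 40))
folds <- make_stratified_folds(labels, k = 5)

# Cross-tabulate fold id against class to inspect the balance.
table(folds, labels)
```

With the real data, the resulting vector could be attached in the same way as in version 2, for example with `h2o.cbind(train, as.h2o(data.frame(fold_assignment = folds)))`.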

1 Answer:

Answer 0: (score: 1)

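You didn't say whether you retrained the models using 3.18. In general, H2O only guarantees model compatibility between major versions of H2O. If you did not retrain the models, that is probably why XGBoost is not working properly. If you did retrain the models with 3.18 and XGBoost still does not work, please post a reproducible example and we will look into it further.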
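One way to guard against version mismatches, sketched here under assumptions: record the H2O version next to the saved binary model and only load it when the running cluster reports the same version. The file name `h2o_version.txt`, the temporary directory, and the use of a small GBM on `iris` are illustrative choices, not something H2O prescribes.

```r
library(h2o)
h2o.init()

# Small built-in data set, used only to have something to train on.
iris_hf <- as.h2o(iris)

# Train a small model (GBM here, purely for illustration).
model <- h2o.gbm(x = 1:4, y = "Species", training_frame = iris_hf,
                 ntrees = 5, seed = 1)

# Save the binary model together with the H2O version that built it.
model_dir <- file.path(tempdir(), "h2o_models")
dir.create(model_dir, showWarnings = FALSE, recursive = TRUE)
model_path <- h2o.saveModel(model, path = model_dir, force = TRUE)
writeLines(h2o.getVersion(), file.path(model_dir, "h2o_version.txt"))

# Later: load the binary model only if the running cluster matches.
saved_version <- readLines(file.path(model_dir, "h2o_version.txt"))
if (identical(saved_version, h2o.getVersion())) {
  reloaded <- h2o.loadModel(model_path)
  preds <- h2o.predict(reloaded, iris_hf)
} else {
  message("Model was built with H2O ", saved_version,
          "; retrain it on ", h2o.getVersion(), ".")
}
```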

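EDIT: I am adding a reproducible example (the only difference from your code is that I am not using fold_column here). This runs fine on 3.18.0.2. Without a reproducible example that produces the error, I can't help you any further.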

library(h2o)
h2o.init()

# Import a sample binary outcome train set into H2O
train <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")

# Identify predictors and response
y <- "response"
x <- setdiff(names(train), y)

# For binary classification, response should be a factor
train[,y] <- as.factor(train[,y])

xgb <- h2o.xgboost(x = x,
                   y = y, 
                   seed = 1,
                   training_frame = train,
                   keep_cross_validation_predictions = TRUE,
                   eta = 0.01,
                   max_depth = 3,
                   sample_rate = 0.8,
                   col_sample_rate = 0.6,
                   ntrees = 500,
                   reg_lambda = 0,
                   reg_alpha = 1000,
                   distribution = 'bernoulli')