I am new to H2O and am trying to use XGBoost in a grid search. I run everything on an edgenode with 40 cores and 26 GB of memory, using version 3.20.0.2 of h2o in both R and the H2O cluster, with CPU only as the backend.
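For context, the H2O cluster is started from R along these lines; this is only a sketch, and the nthreads / max_mem_size values below are illustrative assumptions rather than my exact settings:
library(h2o)
# Start a local H2O cluster with explicit caps on threads and JVM memory
# (illustrative values; by default H2O uses all cores and a fraction of RAM)
h2o.init(nthreads = 40, max_mem_size = "20G")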
I have already run GBM and random forest grids without any problems (some of the GBM grid searches take about two hours to finish, and they all run fine). However, whenever I try to run XGBoost I get an error.
If I run a simple example without a grid search, it runs. But when I run XGBoost with a grid search, I always get the error "Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = urlSuffix, : Unexpected CURL error: Recv failure: Connection reset by peer".
I searched online and tried to figure out what was going on. I found two examples given by LeDell; one of them works for me, but the other does not.
The example that fails for me with that error in R is the following code, from https://gist.github.com/ledell/71e0b8861d4fa35b59dde2af282815a5
library(h2o)
h2o.init()
# Load the HIGGS dataset
train <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
test <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_test_5k.csv")
y <- "response"
x <- setdiff(names(train), y)
family <- "binomial"
#For binary classification, response should be a factor
train[,y] <- as.factor(train[,y])
test[,y] <- as.factor(test[,y])
# Some XGboost/GBM hyperparameters
hyper_params <- list(ntrees = seq(10, 1000, 1),
                     learn_rate = seq(0.0001, 0.2, 0.0001),
                     max_depth = seq(1, 20, 1),
                     sample_rate = seq(0.5, 1.0, 0.0001),
                     col_sample_rate = seq(0.2, 1.0, 0.0001))
search_criteria <- list(strategy = "RandomDiscrete",
                        max_models = 10,
                        seed = 1)
# Train the grid
xgb_grid <- h2o.grid(algorithm = "xgboost",
                     x = x, y = y,
                     training_frame = train,
                     nfolds = 5,
                     seed = 1,
                     hyper_params = hyper_params,
                     search_criteria = search_criteria)
# Sort the grid by CV AUC
grid <- h2o.getGrid(grid_id = xgb_grid@grid_id, sort_by = "AUC", decreasing = TRUE)
grid_top_model <- grid@summary_table[1, "model_ids"]
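Once the grid finishes, the idea (following the gist) is to pull the top model and score it on the held-out test frame; a minimal sketch of that last step, where the choice of AUC as the metric is simply carried over from the sort above:
# Retrieve the best model from the grid and evaluate it on the test set
best_xgb <- h2o.getModel(grid_top_model)
perf <- h2o.performance(best_xgb, newdata = test)
h2o.auc(perf)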
In addition, my edgenode also shows these errors:
libgomp: Thread creation failed: Resource temporarily unavailable
# [thread 140207508600576 also had an error]
# A fatal error has been detected by the Java Runtime Environment:
#  SIGSEGV (0xb) at pc=xxxxxxxxxxx
# [thread 140207503337216 also had an error] [thread 140207504389888 also had an error]
# pid=40095, tid=0x00007f849aaea700
# JRE version: Java(TM) SE Runtime Environment (8.0_162-b12) (build 1.8.0_162-b12)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.162-b12 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libc.so.6+0x358e5]  exit+0x35
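The libgomp message makes me suspect the process is hitting a per-user thread/process limit on the edgenode; purely as a diagnostic sketch (this is my assumption, not a confirmed cause), the relevant limits can be checked from R:
# "Resource temporarily unavailable" from thread creation often means the
# per-user process limit was reached (on Linux, threads count toward it)
system("ulimit -u")   # max user processes
system("ulimit -n")   # max open file descriptors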
However, when I run the code below I have no problems at all (this is also an example given by LeDell in another post):
train <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
y <- "response"
x <- setdiff(names(train), y)
train[,y] <- as.factor(train[,y])
hyperparameters_xgboost <- list(ntrees = seq(10, 20, 10),
                                learn_rate = seq(0.1, 0.2, 0.1),
                                sample_rate = seq(0.9, 1.0, 0.1),
                                col_sample_rate = seq(0.5, 0.6, 0.1))
xgb <- h2o.grid("xgboost",
                x = x,
                y = y,
                seed = 1,
                training_frame = train,
                max_depth = 3,
                hyper_params = hyperparameters_xgboost)
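When this grid finishes, I can inspect it the same way as in the first example; a short sketch (sorting by AUC mirrors the gist, and here it is training AUC since this grid uses no cross-validation):
# Sort the working grid by AUC and look at the top model id
xgb_sorted <- h2o.getGrid(grid_id = xgb@grid_id, sort_by = "AUC", decreasing = TRUE)
xgb_sorted@summary_table[1, "model_ids"]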
So I do not know what is going wrong. Initially I thought XGBoost itself did not work, but then I successfully ran XGBoost without a grid. Then I thought it must be the grid-search part, but I did get a successful run with the latter grid example. I am out of ideas and wondering whether anyone might have some insight into this error?
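For reference, the plain (non-grid) XGBoost run that did work for me looked roughly like the sketch below; the exact hyperparameter values are illustrative assumptions, not the ones I actually used:
# A single XGBoost model without any grid search trains without problems
xgb_single <- h2o.xgboost(x = x, y = y,
                          training_frame = train,
                          ntrees = 50,
                          max_depth = 3,
                          learn_rate = 0.1,
                          seed = 1)
h2o.auc(xgb_single)  # training AUC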
Answer 0 (score: 0)
I am not able to reproduce this error on H2O 3.20.0.2:
> library(h2o)
> h2o.init()
Connection successful!
R is connected to the H2O cluster:
H2O cluster uptime: 18 hours 58 minutes
H2O cluster timezone: America/Los_Angeles
H2O data parsing timezone: UTC
H2O cluster version: 3.20.0.2
H2O cluster version age: 6 days
H2O cluster name: H2O_started_from_R_me_ves048
H2O cluster total nodes: 1
H2O cluster total memory: 3.28 GB
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
H2O Internal Security: FALSE
H2O API Extensions: XGBoost, Algos, AutoML, Core V3, Core V4
R Version: R version 3.5.0 (2018-04-23)
> # Load the HIGGS dataset
> train <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
|=================================================================================================| 100%
> test <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_test_5k.csv")
|=================================================================================================| 100%
> y <- "response"
> x <- setdiff(names(train), y)
> family <- "binomial"
> #For binary classification, response should be a factor
> train[,y] <- as.factor(train[,y])
> test[,y] <- as.factor(test[,y])
> # Some XGboost/GBM hyperparameters
> hyper_params <- list(ntrees = seq(10, 1000, 1),
+ learn_rate = seq(0.0001, 0.2, 0.0001),
+ max_depth = seq(1, 20, 1),
+ sample_rate = seq(0.5, 1.0, 0.0001),
+ col_sample_rate = seq(0.2, 1.0, 0.0001))
> search_criteria <- list(strategy = "RandomDiscrete",
+ max_models = 10,
+ seed = 1)
> # Train the grid
> xgb_grid <- h2o.grid(algorithm = "xgboost",
+ x = x, y = y,
+ training_frame = train,
+ nfolds = 5,
+ seed = 1,
+ hyper_params = hyper_params,
+ search_criteria = search_criteria)
|=================================================================================================| 100%
>