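I ran the h2o.automl() example from http://h2o-release.s3.amazonaws.com/h2o/master/3888/docs-website/h2o-docs/automl.html. Everything works fine except that the values in the leaderboard are NaN. The predictions are fine too. Is this a bug, or am I doing something wrong?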
library(h2o)
localH2O <- h2o.init(ip = "localhost",
                     port = 54321,
                     nthreads = -1,
                     min_mem_size = "20g")
train <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
test <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_test_5k.csv")
y <- "response"
x <- setdiff(names(train), y)
train[,y] <- as.factor(train[,y])
test[,y] <- as.factor(test[,y])
aml <- h2o.automl(x = x, y = y,
                  training_frame = train,
                  leaderboard_frame = test,
                  max_runtime_secs = 30)
lb <- aml@leaderboard
lb
model_id auc logloss
1 StackedEnsemble_0_AutoML_20170908_094736 NaN NaN
2 StackedEnsemble_0_AutoML_20170908_094407 NaN NaN
3 GBM_grid_0_AutoML_20170908_094736_model_1 NaN NaN
4 GBM_grid_0_AutoML_20170908_094407_model_0 NaN NaN
5 GBM_grid_0_AutoML_20170908_094407_model_1 NaN NaN
6 GBM_grid_0_AutoML_20170908_094736_model_0 NaN NaN
I checked that H2O Flow at localhost:54321 shows normal values, and I also get normal values with h2o.getFrame():
h2o.getFrame("leaderboard")
model_id auc logloss
1 StackedEnsemble_0_AutoML_20170908_094736 0,787145 0,554983
2 StackedEnsemble_0_AutoML_20170908_094407 0,785154 0,556897
3 GBM_grid_0_AutoML_20170908_094736_model_1 0,778587 0,563741
4 GBM_grid_0_AutoML_20170908_094407_model_0 0,776755 0,564247
5 GBM_grid_0_AutoML_20170908_094407_model_1 0,776640 0,564436
6 GBM_grid_0_AutoML_20170908_094736_model_0 0,774611 0,566920
I am using h2o v.3.15.0.4018:
h2o.clusterInfo()
R is connected to the H2O cluster:
H2O cluster uptime: 2 hours 8 minutes
H2O cluster version: 3.15.0.4018
H2O cluster version age: 15 hours and 47 minutes
H2O cluster name: H2O_started_from_R_maju116_ozj558
H2O cluster total nodes: 1
H2O cluster total memory: 19.03 GB
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
H2O Internal Security: FALSE
H2O API Extensions: XGBoost, Algos, AutoML, Core V3, Core V4
R Version: R version 3.4.1 (2017-06-30)
Session info:
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so
locale:
[1] LC_CTYPE=pl_PL.UTF-8 LC_NUMERIC=C
[3] LC_TIME=pl_PL.UTF-8 LC_COLLATE=pl_PL.UTF-8
[5] LC_MONETARY=pl_PL.UTF-8 LC_MESSAGES=pl_PL.UTF-8
[7] LC_PAPER=pl_PL.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.7.2 purrr_0.2.3 readr_1.1.1 tidyr_0.7.1
[5] tibble_1.3.4 ggplot2_2.2.1 tidyverse_1.1.1 h2oEnsemble_0.2.1
[9] h2o_3.15.0.4018
loaded via a namespace (and not attached):
[1] Rcpp_0.12.12 cellranger_1.1.0 compiler_3.4.1 plyr_1.8.4
[5] bindr_0.1 forcats_0.2.0 bitops_1.0-6 tools_3.4.1
[9] lubridate_1.6.0 jsonlite_1.5 nlme_3.1-131 gtable_0.2.0
[13] lattice_0.20-35 pkgconfig_2.0.1 rlang_0.1.2 psych_1.7.5
[17] parallel_3.4.1 haven_1.1.0 bindrcpp_0.2 xml2_1.1.1
[21] httr_1.3.1 stringr_1.2.0 hms_0.3 grid_3.4.1
[25] glue_1.1.1 R6_2.2.2 readxl_1.0.0 foreign_0.8-69
[29] modelr_0.1.1 reshape2_1.4.2 magrittr_1.5 scales_0.5.0
[33] rvest_0.3.2 assertthat_0.2.0 mnormt_1.5-5 colorspace_1.3-2
[37] stringi_1.1.5 lazyeval_0.2.0 munsell_0.4.3 RCurl_1.95-4.8
[41] broom_0.4.2
Answer 0 (score: 5)
Just a hunch, but try running R in the en_US locale.
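As a rough sketch of how one might try that from inside R: the snippet below switches the running session to en_US and also sets the locale environment variables before H2O is started. It assumes the en_US.UTF-8 locale is installed on the machine, and it assumes (not verified here) that the Java process launched by h2o.init() inherits these environment variables.

```r
# Show the current locale settings for comparison.
print(Sys.getlocale())

# Try to switch the running R session to en_US. Sys.setlocale() returns ""
# (with a warning) when the requested locale is not installed.
new_locale <- suppressWarnings(Sys.setlocale("LC_ALL", "en_US.UTF-8"))
if (identical(new_locale, "")) {
  message("en_US.UTF-8 is not available on this system")
}

# Assumption: processes started from this session afterwards (such as the
# JVM started by h2o.init()) inherit these environment variables.
Sys.setenv(LC_ALL = "en_US.UTF-8", LANG = "en_US.UTF-8")
```

Any H2O cluster that is already running would need to be shut down and restarted for the new environment to have a chance of taking effect.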
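If that fixes it, I imagine what is happening is that either aml@leaderboard or h2o.getFrame("leaderboard") is choking on the commas in the floating-point numbers, and that is where the NaN values come from. In other words, it is a display bug, not a data bug.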
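To illustrate the kind of failure I have in mind, here is a minimal base R sketch. It is not H2O's actual parsing code, and base R yields NA rather than NaN, but it shows how a number written with a decimal comma fails to parse when a period is expected. The sample strings are copied from the h2o.getFrame("leaderboard") output in the question.

```r
# Metric strings in the form shown in the h2o.getFrame("leaderboard") output.
raw <- c("0,787145", "0,554983")

# as.numeric() expects "." as the decimal mark, so these become NA.
naive <- suppressWarnings(as.numeric(raw))

# Replacing the comma with a period first gives the expected numbers.
fixed <- as.numeric(sub(",", ".", raw, fixed = TRUE))

print(naive)
print(fixed)
```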
(If that does fix it, it would also be useful to know what happens when both H2O and R are run in the same pl_PL.UTF-8 locale.)