Error when using h2o.glm in R

Asked: 2016-04-23 14:09:15

Tags: r h2o

I am new to the h2o implementation in R. I have a data frame like this (df):

df<-structure(list(v1 = c(5.24823, 0.839, 3.57348, 1.47869, 2.75093, 
1.69665, 0.46366, 1.53827, 2.0149, 2.32103, 1.87223, 2.3392, 
2.10579, 1.7236, 1.13056, 1.09144, 3.52515, 1.16248, 1.77885, 
0.9991, 0.47375, 2.91148, 1.237, 1.18971, 1.23953, 1.07049, 1.46971, 
1.65649, 3.3021, 1.04816), v100 = c(19.60784, 9.27047, 0.5523, 
15.05735, 0.93231, 11.73979, 19.53795, 6.22754, 4.54464, 17.0922, 
3.60958, 18.23052, 0.06395, 17.17605, 5.52724, 17.85276, 15.57143, 
0.05825, 19.85401, 14.51163, 6.64372, 19.60284, 16.40279, 16.89205, 
19.6748, 14.64446, 19.34747, 9.04215, 11.37993, 16.81159), v101 = c(10.71683, 
7.13707, 3.61956, 9.75558, 4.21413, 8.49785, 6.79572, 5.19486, 
7.39523, 6.05496, 2.91676, 9.82552, 5.5107, 5.40719, 10.82138, 
12.37154, 5.56351, 3.8549, 9.87455, 5.37746, 3.57747, 8.11406, 
6.61883, 7.3667, 7.74248, 12.44785, 12.38174, 5.99648, 7.10452, 
8.27756)), .Names = c("v1", "v100", "v101"), row.names = c(85671L, 
92268L, 44249L, 68218L, 3250L, 105583L, 4874L, 94393L, 83502L, 
61414L, 42987L, 50200L, 80887L, 9321L, 39565L, 79644L, 26265L, 
75272L, 104819L, 72782L, 57101L, 59037L, 78810L, 88619L, 21564L, 
39198L, 55030L, 44193L, 6116L, 101448L), class = "data.frame")

I want to fit a GLM with the h2o package, so I have the following code:

  library(h2o)
  library(h2oEnsemble)

  modellm<-h2o.glm(y="v1", x="v100",training_frame=df ,family="gaussian",
                   nfolds = 0, alpha = 0.1, lambda_search = FALSE)

However, after running the code I get the following error:

Error in value[[3L]](cond) : 
  argument "training_frame" must be a valid H2OFrame or ID

I tried the following thread:

h2oensemble Error in value[[3L]](cond) : argument "training_frame" must be a valid H2O H2OFrame or id

However, it did not solve my problem. After running the solution suggested in the thread above, I get the following:

> library(devtools)
> install_github("h2oai/h2o-3/h2o-r/ensemble/h2oEnsemble-package")
Downloading github repo h2oai/h2o-3@master
Installing h2oEnsemble
"C:/PROGRA~1/R/R-32~1.4R~/bin/x64/R" --no-site-file --no-environ  \
  --no-save --no-restore CMD INSTALL  \
  "C:/Users/ozgur/AppData/Local/Temp/RtmpAfGU5K/devtools8f064866e23/h2oai-h2o-3-30ef929/h2o-r/ensemble/h2oEnsemble-package"  \
  --library="C:/Users/ozgur/Documents/R/win-library/3.2"  \
  --install-tests 

* installing *source* package 'h2oEnsemble' ...
** R
** tests
** preparing package for lazy loading
Warning: package 'h2o' was built under R version 3.2.5
Warning: package 'statmod' was built under R version 3.2.5
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
Warning: package 'h2o' was built under R version 3.2.5
Warning: package 'statmod' was built under R version 3.2.5
*** arch - x64
Warning: package 'h2o' was built under R version 3.2.5
Warning: package 'statmod' was built under R version 3.2.5
* DONE (h2oEnsemble)
Reloading installed h2oEnsemble
h2oEnsemble (beta) for H2O >=3.0
Version: 0.1.8
Package created on 2016-03-29  

I would be glad of any help. Thanks a lot.

1 Answer:

Answer 0: (score: 2)

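If you only want to train an H2O GLM, you don't need the h2oEnsemble package, so you can remove library(h2oEnsemble) from your code. After library(h2o), you must also add the line h2o.init(nthreads = -1) to your code. This starts the H2O cluster in the background; the "H2O cluster" is where the optimized Java code executes in parallel.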
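As a minimal sketch of that startup step, assuming the h2o package and a compatible Java runtime are already installed locally, the session could begin like this. The h2o.clusterInfo() call is an optional extra for confirming that the cluster is up and is not part of the fix itself.

```r
library(h2o)

# Start (or connect to) a local H2O cluster in the background.
# nthreads = -1 asks H2O to use all available CPU cores.
h2o.init(nthreads = -1)

# Optional: print details of the running cluster to confirm it is up.
h2o.clusterInfo()
```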

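The problem you are running into is related to training_frame. In H2O, the training_frame argument must be an "H2OFrame", not a regular R data.frame. For scalability reasons, H2O uses distributed data frames called "H2OFrames" instead of standard in-R-memory data.frame objects.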

To convert df to an H2OFrame and train the GLM, do the following:

hdf <- as.h2o(df)  #convert data.frame to H2OFrame
modellm <- h2o.glm(y = "v1", x = "v100",training_frame = hdf, family = "gaussian",
               nfolds = 0, alpha = 0.1, lambda_search = FALSE)
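Putting the pieces together, an end-to-end run might look like the sketch below. It is an illustration, not a definitive script: df_small is a hypothetical six-row stand-in built from the first rows of v1 and v100 in the question, and the h2o.coef() and h2o.predict() calls are additions for inspecting the result.

```r
library(h2o)
h2o.init(nthreads = -1)            # start the local H2O cluster

# Hypothetical small data frame (first six rows of v1 and v100 from the question)
df_small <- data.frame(
  v1   = c(5.24823, 0.839, 3.57348, 1.47869, 2.75093, 1.69665),
  v100 = c(19.60784, 9.27047, 0.5523, 15.05735, 0.93231, 11.73979)
)

hdf <- as.h2o(df_small)            # convert data.frame to H2OFrame

modellm <- h2o.glm(y = "v1", x = "v100", training_frame = hdf,
                   family = "gaussian", nfolds = 0,
                   alpha = 0.1, lambda_search = FALSE)

h2o.coef(modellm)                  # fitted intercept and slope

# Predictions come back as an H2OFrame; pull them into R if needed
preds <- as.data.frame(h2o.predict(modellm, newdata = hdf))
head(preds)
```

The only change needed relative to the code in the question is that training_frame receives hdf, the H2OFrame, rather than the plain R data.frame.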

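Alternatively, if you have your data in a CSV file, for example, you can use the h2o.importFile() function to import the data directly into the H2O cluster, and then you don't need to convert it from an R data.frame to an H2OFrame.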
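A rough sketch of the CSV route follows. To keep it self-contained, it first writes a small hypothetical data frame to a temporary file; in practice you would pass the path of your own CSV instead. It assumes the H2O cluster runs on the same machine, so that the file path is visible to it.

```r
library(h2o)
h2o.init(nthreads = -1)

# Hypothetical example file: write a tiny data frame to a temporary CSV
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(v1   = c(5.24823, 0.839, 3.57348),
                     v100 = c(19.60784, 9.27047, 0.5523)),
          file = csv_path, row.names = FALSE)

# Forward slashes avoid path problems on Windows
csv_path <- normalizePath(csv_path, winslash = "/")

# Import straight into the H2O cluster; the result is already an H2OFrame
hdf_csv <- h2o.importFile(path = csv_path)
class(hdf_csv)
```

The resulting hdf_csv can be passed to training_frame in h2o.glm() without calling as.h2o().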

Since you are new to H2O, I recommend taking a look at the Jupyter R notebook I created to teach people how to use H2O. Welcome to H2O!