I'm new to using h2o in R. I have a data frame like this (df):
df<-structure(list(v1 = c(5.24823, 0.839, 3.57348, 1.47869, 2.75093,
1.69665, 0.46366, 1.53827, 2.0149, 2.32103, 1.87223, 2.3392,
2.10579, 1.7236, 1.13056, 1.09144, 3.52515, 1.16248, 1.77885,
0.9991, 0.47375, 2.91148, 1.237, 1.18971, 1.23953, 1.07049, 1.46971,
1.65649, 3.3021, 1.04816), v100 = c(19.60784, 9.27047, 0.5523,
15.05735, 0.93231, 11.73979, 19.53795, 6.22754, 4.54464, 17.0922,
3.60958, 18.23052, 0.06395, 17.17605, 5.52724, 17.85276, 15.57143,
0.05825, 19.85401, 14.51163, 6.64372, 19.60284, 16.40279, 16.89205,
19.6748, 14.64446, 19.34747, 9.04215, 11.37993, 16.81159), v101 = c(10.71683,
7.13707, 3.61956, 9.75558, 4.21413, 8.49785, 6.79572, 5.19486,
7.39523, 6.05496, 2.91676, 9.82552, 5.5107, 5.40719, 10.82138,
12.37154, 5.56351, 3.8549, 9.87455, 5.37746, 3.57747, 8.11406,
6.61883, 7.3667, 7.74248, 12.44785, 12.38174, 5.99648, 7.10452,
8.27756)), .Names = c("v1", "v100", "v101"), row.names = c(85671L,
92268L, 44249L, 68218L, 3250L, 105583L, 4874L, 94393L, 83502L,
61414L, 42987L, 50200L, 80887L, 9321L, 39565L, 79644L, 26265L,
75272L, 104819L, 72782L, 57101L, 59037L, 78810L, 88619L, 21564L,
39198L, 55030L, 44193L, 6116L, 101448L), class = "data.frame")
I want to fit a GLM with the h2o package, so I have the following code:
library(h2o)
library(h2oEnsemble)
modellm<-h2o.glm(y="v1", x="v100",training_frame=df ,family="gaussian",
nfolds = 0, alpha = 0.1, lambda_search = FALSE)
However, running the code produces the following error:
Error in value[[3L]](cond) :
argument "training_frame" must be a valid H2OFrame or ID
I tried the following thread:
However, it did not solve my problem. After applying the solution suggested at that link, I got the following:
> library(devtools)
> install_github("h2oai/h2o-3/h2o-r/ensemble/h2oEnsemble-package")
Downloading github repo h2oai/h2o-3@master
Installing h2oEnsemble
"C:/PROGRA~1/R/R-32~1.4R~/bin/x64/R" --no-site-file --no-environ \
--no-save --no-restore CMD INSTALL \
"C:/Users/ozgur/AppData/Local/Temp/RtmpAfGU5K/devtools8f064866e23/h2oai-h2o-3-30ef929/h2o-r/ensemble/h2oEnsemble-package" \
--library="C:/Users/ozgur/Documents/R/win-library/3.2" \
--install-tests
* installing *source* package 'h2oEnsemble' ...
** R
** tests
** preparing package for lazy loading
Warning: package 'h2o' was built under R version 3.2.5
Warning: package 'statmod' was built under R version 3.2.5
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
Warning: package 'h2o' was built under R version 3.2.5
Warning: package 'statmod' was built under R version 3.2.5
*** arch - x64
Warning: package 'h2o' was built under R version 3.2.5
Warning: package 'statmod' was built under R version 3.2.5
* DONE (h2oEnsemble)
Reloading installed h2oEnsemble
h2oEnsemble (beta) for H2O >=3.0
Version: 0.1.8
Package created on 2016-03-29
I would be glad of any help. Many thanks.
Answer 0 (score: 2)
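If you just want to train an H2O GLM, you don't need the h2oEnsemble package, so you can remove library(h2oEnsemble) from your code. After library(h2o), you also need to add the line h2o.init(nthreads = -1), which starts an H2O cluster in the background. The "H2O cluster" is where H2O's optimized Java code executes in parallel.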
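As a minimal sketch of that start-up step, assuming the h2o package and a compatible Java runtime are already installed on your machine (the h2o.clusterInfo() call is only there to confirm the connection and is not required):

```r
library(h2o)

# Start (or connect to) a local H2O cluster.
# nthreads = -1 asks H2O to use all available cores.
h2o.init(nthreads = -1)

# Print cluster details to confirm that the R session is connected.
h2o.clusterInfo()
```

If a cluster is already running locally, h2o.init() connects to it rather than starting a second one.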
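The problem you are hitting is with training_frame. In H2O, the training_frame argument must be an "H2OFrame", not an ordinary R data.frame. For scalability reasons, H2O uses distributed data frames called "H2OFrames" instead of standard in-memory R data.frame objects.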
To convert df to an H2OFrame and train the GLM, do the following:
hdf <- as.h2o(df) #convert data.frame to H2OFrame
modellm <- h2o.glm(y = "v1", x = "v100",training_frame = hdf, family = "gaussian",
nfolds = 0, alpha = 0.1, lambda_search = FALSE)
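For reference, here is a self-contained sketch of the whole flow, from starting the cluster to inspecting the fitted model. It assumes a local H2O installation and, to keep it short, uses only the first ten rows of v1 and v100 from the question. The h2o.coef() and h2o.performance() calls at the end are my own additions for checking the result, not something the fix requires.

```r
library(h2o)
h2o.init(nthreads = -1)

# First ten rows of v1 and v100, copied from the question's data frame.
df <- data.frame(
  v1   = c(5.24823, 0.839, 3.57348, 1.47869, 2.75093,
           1.69665, 0.46366, 1.53827, 2.0149, 2.32103),
  v100 = c(19.60784, 9.27047, 0.5523, 15.05735, 0.93231,
           11.73979, 19.53795, 6.22754, 4.54464, 17.0922)
)

# Copy the R data.frame into the H2O cluster as an H2OFrame.
hdf <- as.h2o(df)

# Compare the classes of the two objects; h2o.glm() needs the second kind.
class(df)
class(hdf)

# Same call as above, now with the H2OFrame as training_frame.
modellm <- h2o.glm(y = "v1", x = "v100", training_frame = hdf,
                   family = "gaussian", nfolds = 0, alpha = 0.1,
                   lambda_search = FALSE)

# Inspect the fitted coefficients and the training metrics.
h2o.coef(modellm)
h2o.performance(modellm)
```

When you are finished, h2o.shutdown(prompt = FALSE) stops the local cluster.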
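Alternatively, if your data is in a CSV file, for example, you can use the h2o.importFile() function to import it directly into the H2O cluster, so you don't need to convert an R data.frame to an H2OFrame at all.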
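A rough sketch of the CSV route, under a few assumptions: the cluster is running locally, so it can read files on your machine, and the CSV here is a hypothetical temporary file that the example writes itself (five rows of v1 and v100 from the question) purely so the snippet can be run as-is. With real data you would pass the path to your own file.

```r
library(h2o)
h2o.init(nthreads = -1)

# Hypothetical input file: write five rows from the question to a temp CSV.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(v1   = c(5.24823, 0.839, 3.57348, 1.47869, 2.75093),
             v100 = c(19.60784, 9.27047, 0.5523, 15.05735, 0.93231)),
  csv_path, row.names = FALSE
)

# Import the file straight into the H2O cluster; the result is an H2OFrame.
hdf <- h2o.importFile(path = csv_path)

# Check the dimensions and the first rows of the imported frame.
dim(hdf)
head(hdf)
```

Note that h2o.importFile() reads the path from the cluster's side, so the file has to be visible to the machine running H2O. With a local cluster, as here, that is simply your own file system.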
Since you're new to H2O, I'd recommend taking a look at the Jupyter R notebook I created to teach people how to use H2O. Welcome to H2O!