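I am trying to convert large data.frames, with about 70 numeric columns and one character column, into h2o objects. There is no error message, but only a small part of the data frame is converted and the rest is skipped (i.e. the resulting h2o object has far fewer rows than the original data frame). Does anyone have a suggestion?
Here is an example (I built a sample data frame similar to the one I am trying to analyze in my research):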
nameDF <- c(paste("O",letters, sep="_"), paste("T",letters, sep="_"),
paste("TR",letters, sep="_"))
DF <- matrix( data=numeric(length(nameDF)*1000000), nrow=1000000)
colnames(DF) <- nameDF
DF <- as.data.frame(DF)
DF$char <- rep("bla", 1000000)
DFh2o <- as.h2o(DF)
dim(DFh2o)
dim(DF)
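Answer 0 (score: 0)
I tried the relatively recent H2O 3.10.4.8 and ran your code example. Both dim() calls return the same result, which looks like the correct behavior to me.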
> library(h2o)
> h2o.init()
H2O is not running yet, starting it now...
Note: In case of errors look at the following log files:
/var/folders/tt/g5d7cr8d3fg84jmb5jr9dlrc0000gn/T//RtmpCXcmAu/h2o_tomk_started_from_r.out
/var/folders/tt/g5d7cr8d3fg84jmb5jr9dlrc0000gn/T//RtmpCXcmAu/h2o_tomk_started_from_r.err
java version "1.8.0_112"
Java(TM) SE Runtime Environment (build 1.8.0_112-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b16, mixed mode)
Starting H2O JVM and connecting: .. Connection successful!
R is connected to the H2O cluster:
H2O cluster uptime: 2 seconds 605 milliseconds
H2O cluster version: 3.10.4.8
H2O cluster version age: 1 month and 15 days
H2O cluster name: H2O_started_from_R_tomk_jgy651
H2O cluster total nodes: 1
H2O cluster total memory: 3.56 GB
H2O cluster total cores: 8
H2O cluster allowed cores: 2
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
H2O Internal Security: FALSE
R Version: R version 3.3.2 (2016-10-31)
Note: As started, H2O is limited to the CRAN default of 2 CPUs.
Shut down and restart H2O as shown below to use all your CPUs.
> h2o.shutdown()
> h2o.init(nthreads = -1)
> nameDF <- c(paste("O",letters, sep="_"), paste("T",letters, sep="_"),
+ paste("TR",letters, sep="_"))
> DF <- matrix( data=numeric(length(nameDF)*1000000), nrow=1000000)
> colnames(DF) <- nameDF
> DF <- as.data.frame(DF)
> DF$char <- rep("bla", 1000000)
> View(DF)
> DFh2o <- as.h2o(DF)
|=====================================================================================================================| 100%
> dim(DFh2o)
[1] 1000000 79
> dim(DF)
[1] 1000000 79
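Rather than comparing the two dim() outputs by eye, the check can be scripted so that a silently truncated conversion stops the run. This is a minimal sketch, assuming the h2o package is installed and a cluster is already running; as_h2o_checked is a hypothetical helper name and is not part of h2o.

```r
# Hypothetical helper (not part of h2o): convert a data frame and fail loudly
# if the result has a different number of rows than the input.
# `convert` defaults to h2o::as.h2o; the default is only evaluated when used.
as_h2o_checked <- function(df, convert = h2o::as.h2o) {
  converted <- convert(df)
  n_in  <- as.numeric(nrow(df))
  n_out <- as.numeric(nrow(converted))
  if (!isTRUE(n_in == n_out)) {
    stop(sprintf("Row count mismatch: input has %.0f rows, result has %.0f",
                 n_in, n_out))
  }
  converted
}
```

With the example above this would be called as DFh2o <- as_h2o_checked(DF). The convert argument exists only so the check can be tried without a running cluster.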
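Answer 1 (score: 0)
In case someone runs into the same problem, here is the simple solution to my question: the machine I ran the code on had run out of space on the disk that the h2o object is written to. The fix was simple, but h2o does not raise an error in its log files, so the problem is not immediately visible. Best regards!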
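Since nothing shows up in the logs, a check of free disk space before the conversion may save some searching. The sketch below rests on several assumptions: a Unix-like system where df -Pk is available, that the relevant disk is the one holding R's tempdir() (as far as I know, as.h2o writes a data frame to a temporary file there before uploading it; your setup may write elsewhere), and that object.size() is only a rough proxy for the size on disk. free_bytes and enough_space are hypothetical helper names.

```r
# Free space in bytes on the filesystem that holds `path`.
# Assumption: Unix-like system with POSIX `df`; filesystem names that
# contain spaces would break this simple parsing.
free_bytes <- function(path = tempdir()) {
  out <- system2("df", c("-Pk", shQuote(path)), stdout = TRUE)
  fields <- strsplit(trimws(out[length(out)]), "[[:space:]]+")[[1]]
  # POSIX columns: Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on
  as.numeric(fields[4]) * 1024
}

# Rough pre-flight check: is there at least `safety` times the in-memory
# size of `df` free on the disk? The factor 2 is an arbitrary margin.
enough_space <- function(df, path = tempdir(), safety = 2) {
  free_bytes(path) > safety * as.numeric(object.size(df))
}
```

For example, calling stopifnot(enough_space(DF)) before as.h2o(DF) would stop early instead of producing a truncated frame. If the H2O JVM is configured to use a different directory, pass that directory as path.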