as.h2o skips part of my original (large) R data frame without an error

Time: 2017-07-05 15:51:18

Tags: r h2o

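I am trying to convert fairly large data.frames, containing about 70 numeric columns and one character column, to h2o objects. There is no error message, but only a small part of the data frame is converted and the rest is skipped (i.e. the resulting h2o object has far fewer rows than the original data frame). Does anyone have a suggestion?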

Here is an example (I constructed a sample data frame similar to the one I am trying to analyse in my research):

nameDF <- c(paste("O",letters, sep="_"),  paste("T",letters, sep="_"),
   paste("TR",letters, sep="_"))
DF <- matrix( data=numeric(length(nameDF)*1000000), nrow=1000000)
colnames(DF) <- nameDF
DF <- as.data.frame(DF)
DF$char <- rep("bla", 1000000)
DFh2o <- as.h2o(DF)
dim(DFh2o)
dim(DF)

2 answers:

Answer 0 (score: 0)

I tried the relatively recent H2O 3.10.4.8 and ran your code example. Both dims match, which looks like the correct behaviour to me.

> library(h2o)
> h2o.init()

H2O is not running yet, starting it now...

Note:  In case of errors look at the following log files:
    /var/folders/tt/g5d7cr8d3fg84jmb5jr9dlrc0000gn/T//RtmpCXcmAu/h2o_tomk_started_from_r.out
    /var/folders/tt/g5d7cr8d3fg84jmb5jr9dlrc0000gn/T//RtmpCXcmAu/h2o_tomk_started_from_r.err

java version "1.8.0_112"
Java(TM) SE Runtime Environment (build 1.8.0_112-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b16, mixed mode)

Starting H2O JVM and connecting: .. Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         2 seconds 605 milliseconds 
    H2O cluster version:        3.10.4.8 
    H2O cluster version age:    1 month and 15 days  
    H2O cluster name:           H2O_started_from_R_tomk_jgy651 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   3.56 GB 
    H2O cluster total cores:    8 
    H2O cluster allowed cores:  2 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    H2O Internal Security:      FALSE 
    R Version:                  R version 3.3.2 (2016-10-31) 

Note:  As started, H2O is limited to the CRAN default of 2 CPUs.
       Shut down and restart H2O as shown below to use all your CPUs.
           > h2o.shutdown()
           > h2o.init(nthreads = -1)

> nameDF <- c(paste("O",letters, sep="_"),  paste("T",letters, sep="_"),
+ paste("TR",letters, sep="_"))
> DF <- matrix( data=numeric(length(nameDF)*1000000), nrow=1000000)
> colnames(DF) <- nameDF
> DF <- as.data.frame(DF)
> DF$char <- rep("bla", 1000000)
> View(DF)
> DFh2o <- as.h2o(DF)
  |=====================================================================================================================| 100%
> dim(DFh2o)
[1] 1000000      79
> dim(DF)
[1] 1000000      79

Answer 1 (score: 0)

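In case someone runs into the same problem, I am posting the simple solution to my own question: the machine I ran the code on had too little free space left on the disk that the h2o object is saved to. The fix is simple, but h2o does not raise an error in its log files, so the problem is not immediately visible. Best regards!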
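
Since the conversion fails silently, one way to catch it early is to compare row counts right after the call. This is only a minimal sketch: `as_h2o_checked` is a hypothetical helper name, not part of the h2o package, and it assumes that `nrow()` works on the returned object (the question already calls `dim()` on it).

```r
# Hypothetical helper (not part of h2o): convert a data frame and
# stop with an error if the result has a different number of rows.
# `convert` defaults to h2o::as.h2o; the default is only evaluated
# when no converter is supplied, so the check can be tried without h2o.
as_h2o_checked <- function(df, convert = h2o::as.h2o) {
  hf <- convert(df)
  n_in  <- as.integer(nrow(df))
  n_out <- as.integer(nrow(hf))
  if (n_out != n_in) {
    stop(sprintf(
      "Conversion returned %d rows, expected %d; check free disk space.",
      n_out, n_in
    ))
  }
  hf
}

# Usage with a running H2O cluster (assumes library(h2o); h2o.init()):
# DFh2o <- as_h2o_checked(DF)
```

If the row counts differ, the call stops instead of handing back a truncated frame, which makes the silent failure described above visible at the point where it happens.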