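I have a data frame with 50 rows (subjects) and 572288 columns (variables).
When I parse the data.frame into an H2O object, I lose variables and end up with
51 rows and 419431 variables (51 rows and 538177 variables in the run shown below).
Reducing or increasing the number of rows does not change this.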
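library("data.table")
library("h2o")
# Let h2o use data.table for fast data conversion
options("h2o.use.data.table"=TRUE)
h2o.init()
# 50 x 572288 data frame filled with 1
trainset=as.data.frame(matrix(ncol=572288,nrow=50,1))
# Write it to disk, then import the CSV into H2O
fwrite(trainset, "train.csv", sep=",")
train=h2o.importFile("train.csv", sep=",")
# Compare the dimensions before and after the import
dim(trainset)
dim(train)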
My output is:
> h2o.init()
Connection successful!
R is connected to the H2O cluster:
H2O cluster uptime: 1 hours 2 minutes
H2O cluster timezone: Europe/Berlin
H2O data parsing timezone: UTC
H2O cluster version: 3.18.0.11
H2O cluster version age: 3 months
H2O cluster name: H2O_started_from_R_chiocchetti_lub856
H2O cluster total nodes: 1
H2O cluster total memory: 9.84 GB
H2O cluster total cores: 24
H2O cluster allowed cores: 20
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
H2O Internal Security: FALSE
H2O API Extensions: XGBoost, Algos, AutoML, Core V3, Core V4
R Version: R version 3.4.3 (2017-11-30)
> trainset=as.data.frame(matrix(ncol=572288,nrow=50,1))
> fwrite(trainset, "train.csv", sep=",")
>
> train=h2o.importFile("train.csv", sep=",")
|======================================================================|100%
> dim(train)
[1] 51 538177
> dim(trainset)
[1] 50 572288
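It looks to me as if I am running into some kind of memory problem when the rows are read from the file, but I don't know how to solve it.
The final goal is to fit a randomForest.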
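One way to narrow this down is to check whether the CSV on disk is already malformed, or whether the columns go missing only during the H2O parse. The following is a minimal sketch using base R only; `csv_shape` is a hypothetical helper name, and a small 5 x 1000 stand-in file is used so that it runs quickly. For the real case one would point it at "train.csv".

```r
# Hypothetical helper: summarise the shape of a CSV file as it exists on disk.
csv_shape <- function(path, sep = ",") {
  # utils::count.fields() returns the number of fields found on each line.
  fields <- utils::count.fields(path, sep = sep, quote = "\"", comment.char = "")
  list(
    lines = length(fields),           # header line plus data rows
    fields_per_line = unique(fields)  # a single value if the file is rectangular
  )
}

# Small stand-in for the real data (assumption: same layout, fewer columns).
small <- as.data.frame(matrix(1, nrow = 5, ncol = 1000))
tmp <- tempfile(fileext = ".csv")
write.csv(small, tmp, row.names = FALSE)

shape <- csv_shape(tmp)
print(shape)
```

If the real file reports 51 lines (header plus 50 rows) with 572288 fields on every line, the file itself is intact, which would suggest the columns are dropped while H2O parses it rather than when the file is written.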
Answer 0 (score: 2)
This might be a bug; I have created a Jira ticket for it.
If you have a Jira account, feel free to update the ticket.