How to ... in H2O

Time: 2017-06-20 16:53:11

Tags: r regression glm h2o

My dataset looks like this:

rownum | a | b | y | x
1      | A | a | 1 | a
2      | B | a | 1 | b
3      | C | a | 1 | b
4      | D | a | 0 | a
5      | E | a | 0 | b
6      | F | a | 0 | a

I want to create many H2O frames based on tissue identity (column x), one frame per tissue. Like this:

a:

rownum | a | b | y | x
1      | A | a | 1 | a
2      | D | a | 0 | a
3      | F | a | 0 | a

b:

rownum | a | b | y | x
1      | B | a | 1 | b
2      | C | a | 1 | b
3      | E | a | 0 | b

I am currently doing this by hand, but that becomes hard to manage as I add more tissues to the dataset.

I would also like to pass each of those H2O frames to h2o.glm and save the resulting models in a loop.

"INSERT x NAME HERE" = h2o.glm(y = "y", x = 
c("a","b"), 
training_frame = ITERATE H2O FRAMES HERE, family = 'poisson')

Then save the model:

INSERT x NAME HERE <- h2o.saveModel(object= INSERT x NAME 
HERE, force=TRUE)

Any help or suggestions would be greatly appreciated. I do know about interaction terms in GLM, but for now I want to do it this way.

1 answer:

Answer 0 (score: 2)

Since you did not provide the data directly, I copied your example from above into an R data.frame.

library(h2o)
h2o.init()

# Example data as an R data.frame
df <- data.frame(genes = c("A","B","C","D","E","F"),
                 samples = c("a","a","a","a","a","a"),
                 y = c(1,1,1,0,0,0),
                 tissue = c("Muscle","Brain","Brain","Muscle","Brain","Muscle"))

# Convert R data.frame to H2OFrame
hf <- as.h2o(df)

However, I assume you have this data stored on your computer as a CSV file, so in practice you would do this instead:

# Load data from disk directly into H2O cluster
hf <- h2o.importFile("tissue_samples.csv")

Now that the data is in an H2OFrame, only a few steps remain:

# List of unique tissue types
tissue_types <- as.list(h2o.unique(hf$tissue))

# Create list of frames (one for each tissue type)
# (lapply always returns a list, so no simplification is attempted)
frames <- lapply(tissue_types, function(t) hf[(hf[,"tissue"] == t),])

# Set up h2o.glm arguments
x <- c("genes", "samples")
y <- "y"

# List of glms (one for each tissue type)
# (lapply keeps the fitted models in a plain list)
glms <- lapply(frames, function(fr) h2o.glm(x = x, y = y,
                       family = "poisson", training_frame = fr))

# Save the models
model_names <- sapply(glms, function(m) h2o.saveModel(m, path = "/Users/me/", force = TRUE))

# Look at model names
print(model_names)
# [1] "/Users/me/GLM_model_R_1497937770060_222"
# [2] "/Users/me/GLM_model_R_1497937770060_223"
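The question also asked for each model to carry the name of its tissue, whereas the saved paths above use auto-generated IDs. One possible way to get that, sketched below, is to pass a `model_id` to `h2o.glm`. The helper names `make_model_id` and `fit_glm_per_group` are hypothetical and not part of h2o, and the sketch assumes the h2o package is installed and `h2o.init()` has already been called.

```r
# Hypothetical helper: build a model ID from a group label.
# Anything other than letters, digits and underscores becomes "_",
# and the prefix ensures the ID starts with a letter.
make_model_id <- function(group, prefix = "glm_") {
  paste0(prefix, gsub("[^A-Za-z0-9_]", "_", as.character(group)))
}

# Hypothetical helper: fit one Poisson GLM per level of `group_col`,
# save each model, and return the saved paths named by group.
# Assumes a running H2O cluster (h2o.init() already called).
fit_glm_per_group <- function(hf, group_col, x, y, path) {
  # Pull the unique group labels back into R as a character vector
  groups <- as.character(as.data.frame(h2o::h2o.unique(hf[, group_col]))[[1]])

  paths <- lapply(groups, function(g) {
    fr <- hf[hf[, group_col] == g, ]              # rows for this group only
    m <- h2o::h2o.glm(x = x, y = y, training_frame = fr,
                      family = "poisson",
                      model_id = make_model_id(g)) # name the model after the group
    h2o::h2o.saveModel(m, path = path, force = TRUE)
  })

  names(paths) <- groups
  paths
}
```

With the frame from above, a call might look like `saved <- fit_glm_per_group(hf, "tissue", c("genes", "samples"), "y", "/Users/me/")`. `saved[["Muscle"]]` should then hold the path of the Muscle model, which can be reloaded later with `h2o.loadModel(saved[["Muscle"]])`.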