My dataset looks like this:
rownum | a | b | y | x
1 | A | a | 1 | a
2 | B | a | 1 | b
3 | C | a | 1 | b
4 | D | a | 0 | a
5 | E | a | 0 | b
6 | F | a | 0 | a
I want to create a number of h2o frames based on tissue identity (column x), like this:
a:
rownum | a | b | y | x
1 | A | a | 1 | a
2 | D | a | 0 | a
3 | F | a | 0 | a
b:
rownum | a | b | y | x
1 | B | a | 1 | b
2 | C | a | 1 | b
3 | E | a | 0 | b
I am currently doing this by hand, but that gets harder as I add more tissues to the dataset.
I would also like to pass each of these h2o frames to h2o.glm and save the models iteratively:
"INSERT x NAME HERE" = h2o.glm(y = "y", x = c("a", "b"),
    training_frame = ITERATE H2O FRAMES HERE, family = 'poisson')
Then save the model:
INSERT x NAME HERE <- h2o.saveModel(object = INSERT x NAME HERE, force = TRUE)
I would appreciate any help or suggestions. I do know about interaction terms in GLMs, but for now I want to do it this way.
Answer 0 (score: 2)
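Since you did not provide the data directly, I copied your example from above into an R data.frame, using descriptive column names (genes, samples and tissue in place of a, b and x).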
library(h2o)
h2o.init()
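# Example data as an R data.frame
# stringsAsFactors = TRUE so the text columns become categorical (enum) columns in H2O
df <- data.frame(genes = c("A","B","C","D","E","F"),
                 samples = c("a","a","a","a","a","a"),
                 y = c(1,1,1,0,0,0),
                 tissue = c("Muscle","Brain","Brain","Muscle","Brain","Muscle"),
                 stringsAsFactors = TRUE)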
# Convert R data.frame to H2OFrame
hf <- as.h2o(df)
However, I assume you have this data stored as a CSV file on your computer, so in practice you would do this instead:
# Load data from disk directly into H2O cluster
hf <- h2o.importFile("tissue_samples.csv")
Now that the data is in an H2OFrame, only a few steps remain:
# List of unique tissue types
tissue_types <- as.list(h2o.unique(hf$tissue))
# Create list of frames (one for each tissue type); lapply always returns a list
frames <- lapply(tissue_types, function(t) hf[hf[, "tissue"] == t, ])
# Set up h2o.glm arguments
x <- c("genes", "samples")
y <- "y"
# List of GLMs (one for each tissue type)
glms <- lapply(frames, function(fr) h2o.glm(x = x, y = y,
                                            family = "poisson", training_frame = fr))
# Save the models (h2o.saveModel returns the full path of each saved model)
model_names <- sapply(glms, function(m) h2o.saveModel(m, path = "/Users/me/", force = TRUE))
# Look at model names
print(model_names)
# [1] "/Users/me/GLM_model_R_1497937770060_222"
# [2] "/Users/me/GLM_model_R_1497937770060_223"
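The auto-generated model names above do not say which tissue each model belongs to, which is what the "INSERT x NAME HERE" placeholder in the question was after. One way to get there is to pass an explicit model_id to h2o.glm. The following is only a minimal sketch: make_model_id and fit_glms_by_group are hypothetical helper names (not part of h2o), it assumes h2o.init() has already been run, and it assumes H2O writes each saved model under its model ID, which is worth checking on your version.

```r
# Hypothetical helper: turn a group label into an ID made of letters, digits and underscores.
make_model_id <- function(group, prefix = "glm") {
  paste(prefix, gsub("[^A-Za-z0-9_]+", "_", group), sep = "_")
}

# Hypothetical helper: fit one Poisson GLM per level of `group_col` and save each model.
# Assumes the h2o package is installed and a cluster is already running (h2o.init()).
fit_glms_by_group <- function(hf, group_col, x, y, out_dir) {
  # Pull the unique group labels back into R as a plain character vector
  groups <- as.character(as.data.frame(h2o::h2o.unique(hf[, group_col]))[[1]])
  paths <- vapply(groups, function(g) {
    fr <- hf[hf[, group_col] == g, ]              # rows belonging to this group
    m <- h2o::h2o.glm(x = x, y = y, training_frame = fr,
                      family = "poisson",
                      model_id = make_model_id(g)) # e.g. "glm_Muscle"
    h2o::h2o.saveModel(m, path = out_dir, force = TRUE)
  }, character(1))
  paths  # named character vector: names are the group labels, values the saved paths
}

# Usage with the objects defined above (needs a running H2O cluster):
# paths <- fit_glms_by_group(hf, "tissue", x = c("genes", "samples"),
#                            y = "y", out_dir = "/Users/me/")
# muscle_glm <- h2o.loadModel(paths[["Muscle"]])
```

Because the result is named by tissue, adding more tissues to the dataset needs no code changes, and a given model can be reloaded by its tissue label.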