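I have searched various sites, including the H2O ensemble documentation, but every ensemble example I found is for binary classification. None of them shows how to implement general stacking or an H2O ensemble for a simple regression problem in R.
Could anyone please share working code showing how to implement an H2O ensemble, or just stacking, for a regression problem in R
OR
a simple ensemble for regression only, in R.
I just want to know how to implement an ensemble/stacking for regression with different weights.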
Answer 0 (score: 1)
Here is an example of building a stacked ensemble for a regression problem (predicting age) in R:
library('h2o')
h2o.init()
files3 = "http://h2o-public-test-data.s3.amazonaws.com/smalldata/prostate/prostate.csv"
col_types <- c("Numeric","Numeric","Numeric","Enum","Enum","Numeric","Numeric","Numeric","Numeric")
dat <- h2o.importFile(files3,destination_frame = "prostate.hex",col.types = col_types)
ss <- h2o.splitFrame(dat, ratios = 0.8, seed = 1)
train <- ss[[1]]
test <- ss[[2]]
x <- c("CAPSULE","GLEASON","RACE","DPROS","DCAPS","PSA","VOL")
y <- "AGE"
nfolds <- 5
# Train & cross-validate a GBM
my_gbm <- h2o.gbm(x = x,
                  y = y,
                  training_frame = train,
                  distribution = "gaussian",
                  max_depth = 3,
                  learn_rate = 0.2,
                  nfolds = nfolds,
                  fold_assignment = "Modulo",
                  keep_cross_validation_predictions = TRUE,
                  seed = 1)
# Train & cross-validate a RF
my_rf <- h2o.randomForest(x = x,
                          y = y,
                          training_frame = train,
                          ntrees = 30,
                          nfolds = nfolds,
                          fold_assignment = "Modulo",
                          keep_cross_validation_predictions = TRUE,
                          seed = 1)
# Train & cross-validate an extremely-randomized RF
my_xrf <- h2o.randomForest(x = x,
                           y = y,
                           training_frame = train,
                           ntrees = 50,
                           histogram_type = "Random",
                           nfolds = nfolds,
                           fold_assignment = "Modulo",
                           keep_cross_validation_predictions = TRUE,
                           seed = 1)
# Train a stacked ensemble using the models above.
# All base models must share the same nfolds and fold_assignment and
# must keep their cross-validation predictions, as they do here.
stack <- h2o.stackedEnsemble(x = x,
                             y = y,
                             training_frame = train,
                             validation_frame = test,  # optional: also score the ensemble on the held-out frame
                             model_id = "my_ensemble_gaussian",
                             base_models = list(my_gbm@model_id, my_rf@model_id, my_xrf@model_id))
# Predict on the held-out frame
pred <- h2o.predict(stack, newdata = test)
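On the "different weights" part of the question: the stacked ensemble above learns how to combine the base models through its metalearner, so you do not set the weights yourself. If you would rather choose the weights by hand, a plain weighted average of the base-model predictions is one option. Below is a minimal sketch in base R. `weighted_ensemble` is a hypothetical helper, not part of h2o, and the prediction values and weights are made-up toy numbers.

```r
# Hypothetical helper (not an h2o function): blend base-model predictions
# with user-chosen, non-negative weights.
weighted_ensemble <- function(preds, weights) {
  stopifnot(ncol(preds) == length(weights),
            all(weights >= 0),
            sum(weights) > 0)
  w <- weights / sum(weights)          # normalise so the weights sum to 1
  as.vector(as.matrix(preds) %*% w)    # one blended prediction per row
}

# Toy predictions from three models for two observations (made-up values)
preds <- data.frame(gbm = c(60, 70),
                    rf  = c(62, 68),
                    xrf = c(64, 66))

blend <- weighted_ensemble(preds, weights = c(0.5, 0.3, 0.2))
print(blend)
```

With the h2o models above, one way to get each column of `preds` would be something like `as.data.frame(h2o.predict(my_gbm, test))$predict`. It is best to choose the weights on a separate validation set rather than on the test set.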
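Answer 1 (score: 0)
The stacked ensemble example in my book (Practical Machine Learning with H2O) is a regression (on the building energy data set). :-)
However, if you think you have exhausted all of the H2O documentation, try searching the source code on GitHub. Here is their unit test for stacked ensemble regression: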