Question

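I have just been testing h2o, in particular its deep learning capabilities, since I have heard great things about it. So far I have been using the following code: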

library(h2o)
library(caret)
data("iris")

# Initiate H2O --------------------
h2o.removeAll() # Clean up. Just in case H2O was already running
h2o.init(nthreads = -1, max_mem_size="22G")  # Start an H2O cluster with all threads available

# Get training and tournament data -------------------
a <- createDataPartition(iris$Species, list=FALSE)
training <- iris[a,]
test <- iris[-a,]

# Convert target to factor -------------------
target <- as.factor(iris$Species)

feature_names <- names(train)[1:(ncol(train)-1)]

train_h2o <- as.h2o(train)
test_h2o <- as.h2o(test)

prob <- test[, "id", drop = FALSE]

model_dl <- h2o.deeplearning(x = feature_names, y = "target", training_frame = train_h2o, stopping_metric = "logloss")
h2o.logloss(model_dl)

pred_dl <- predict(model_dl, newdata = tourn_h2o)
prob <- cbind(prob, as.data.frame(pred_dl$p1, col.names = "dl"))
write.table(prob[, c("id", "dl")], paste0(model_dl@model_id, ".csv"), sep = ",", row.names = FALSE, col.names = c("id", "probability"))

The relevant part is really the last few lines, where I get the following error:

Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page,  : 


ERROR MESSAGE:

Object 'DeepLearning_model_R_1494350691427_70' not found in function: predict for argument: model

Has anyone come across this before? Is there a simple solution that I may be missing? Thanks in advance.

EDIT: With the updated code I receive this error:

Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page,  : 


ERROR MESSAGE:

Illegal argument(s) for DeepLearning model: DeepLearning_model_R_1494428751150_1.  Details: ERRR on field: _train: Training data must have at least 2 features (incl. response).
ERRR on field: _stopping_metric: Stopping metric cannot be logloss for regression.

I presume this has to do with the way the Iris dataset is being read in.

Answer 1

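To answer the first question: your original error message sounds like the kind you get when things have gone out of sync. E.g. maybe you had two sessions running at once and removed the model in one of them; the other session would not know that its variables are now out of date. H2O allows multiple connections, but they have to be co-operative. (Flow, described in the next paragraph, counts as a second session.)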

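Unless you can make a reproducible example, shrug, put it down to gremlins, and start a new session. Or go and look at the data and models in Flow (a web server that is always running on 127.0.0.1:54321) and see whether something is no longer there.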
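The same check can be done from R by comparing a model's id against the keys the cluster reports. This is a minimal sketch, assuming a local cluster; the helper name key_exists and the tiny one-epoch model are illustrative additions, not part of h2o or of the question's code:

```r
library(h2o)
h2o.init()

# Hypothetical helper (not part of h2o): TRUE if the cluster still holds this key.
key_exists <- function(key) {
  key %in% as.character(h2o.ls()$key)  # h2o.ls() lists every key on the cluster
}

# A small throwaway model, only so there is something to look up.
iris_h2o <- as.h2o(iris)
model <- h2o.deeplearning(x = 1:4, y = "Species",
                          training_frame = iris_h2o, epochs = 1)

print(key_exists(model@model_id))  # the model was just built on this cluster
h2o.rm(model@model_id)             # simulate another session deleting it
print(key_exists(model@model_id))  # the R variable `model` is now stale
```

If the second check reports that the key is missing, calling predict() with that R variable is what would produce an "Object ... not found" error like the one in the question.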

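For your EDIT question: H2O is building a regression model, but you are asking for logloss, which means you thought you were doing classification. This happens because the target variable has not been set to be a factor. Your current as.factor() line is applied to the wrong data, in the wrong place. It should go after your as.h2o() lines: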

train_h2o <- as.h2o(training)  #Typo fix
test_h2o <- as.h2o(test)

feature_names <- names(training)[1:(ncol(training)-1)]  #typo fix
y = "Species" #The column we want to predict

train_h2o[,y] <- as.factor(train_h2o[,y])
test_h2o[,y] <- as.factor(test_h2o[,y])

Then make the model with:

model_dl <- h2o.deeplearning(x = feature_names, y = y, training_frame = train_h2o, stopping_metric = "logloss")

Get predictions with:

pred_dl <- predict(model_dl, newdata = test_h2o)  #Typo fix

Compare the predictions with the correct answers using:

cbind(test[, y], as.data.frame(pred_dl$predict))

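(By the way, H2O always detects the Iris data set columns correctly as numeric vs. factor, so the as.factor() lines above are not needed; your error message must have come from your original data.)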
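Putting the pieces of this answer together, a complete run might look like the following sketch. It assumes a local cluster and an 80/20 split made with sample() in place of the question's createDataPartition() call; the seed value and the comparison variable are illustrative additions:

```r
library(h2o)
h2o.init(nthreads = -1)

set.seed(42)  # arbitrary seed, only to make the split repeatable
a <- sample(nrow(iris), 0.8 * nrow(iris))  # assumed 80/20 split
training <- iris[a, ]
test <- iris[-a, ]

train_h2o <- as.h2o(training)
test_h2o <- as.h2o(test)

y <- "Species"                                # the column we want to predict
feature_names <- setdiff(names(training), y)  # every other column is a feature

# Make sure the response is categorical so that H2O does classification.
train_h2o[, y] <- as.factor(train_h2o[, y])
test_h2o[, y] <- as.factor(test_h2o[, y])

model_dl <- h2o.deeplearning(x = feature_names, y = y,
                             training_frame = train_h2o,
                             stopping_metric = "logloss")
print(h2o.logloss(model_dl))  # logloss on the training data

pred_dl <- predict(model_dl, newdata = test_h2o)

# Actual species next to the predicted class, one row per test sample.
comparison <- cbind(actual = test[, y], as.data.frame(pred_dl$predict))
print(head(comparison))
```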

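StackOverflow advice: test your reproducible example in full, then copy and paste that exact code, together with the exact error message that code gives you. Your code has numerous small typos, e.g. train in some places and training in others. createDataPartition() was not given; I assumed a = sample(nrow(iris), 0.8*nrow(iris)). test does not have an "id" column.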

Other H2O advice:

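Run h2o.removeAll() after h2o.init(), not before. If it is run before, it will give you an error message. (Personally, I avoid that function: it is the kind of thing that gets left in a production script by mistake...)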

Consider importing the data into h2o first and then using h2o.splitFrame() to split it. I.e. avoid doing things in R that H2O can handle easily.

Avoid having your data in R at all, if you can. Prefer h2o.importFile() over as.h2o().

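The thinking behind the last couple of points is that H2O will scale beyond the memory of one machine, while R will not. It is also less confusing than trying to track the same thing in two places.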
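As a rough illustration of the last two points, the sketch below lets H2O parse and split the data. It assumes a local cluster; writing iris to a temporary CSV is only there to give the example a file to import, and the variable names and the seed are illustrative choices:

```r
library(h2o)
h2o.init()

# Write iris to a temporary CSV so the example has a file to import;
# in practice the data would already be in a file.
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path, row.names = FALSE)

iris_h2o <- h2o.importFile(csv_path)  # parsed by H2O rather than by R

# The split is probabilistic, so the 80/20 proportions are approximate.
splits <- h2o.splitFrame(iris_h2o, ratios = 0.8, seed = 1)
train_h2o <- splits[[1]]
test_h2o <- splits[[2]]

y <- "Species"
x <- setdiff(names(iris_h2o), y)

print(dim(train_h2o))
print(dim(test_h2o))
```

From here, train_h2o and test_h2o can be passed to h2o.deeplearning() and predict() without the data frame ever being split in R.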

Answer 2

I ran into the same problem, but it was easy to solve.

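My error occurred because I read in the h2o object before initializing the h2o cluster. That is, I trained an h2o model, saved it, shut down the cluster, loaded the model, and only then initialized the cluster again.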

You should already have initialized the cluster (h2o.init()) before reading in the h2o object.
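The ordering can be sketched as follows. This assumes a local cluster and a writable temporary directory; instead of shutting the cluster down, the example deletes the model with h2o.rm() to stand in for a fresh cluster, and the one-epoch model is only a placeholder:

```r
library(h2o)

h2o.init()  # a cluster must be running before any model is loaded

iris_h2o <- as.h2o(iris)
model <- h2o.deeplearning(x = 1:4, y = "Species",
                          training_frame = iris_h2o, epochs = 1)

# h2o.saveModel() returns the full path of the file it wrote.
model_path <- h2o.saveModel(model, path = tempdir(), force = TRUE)

# Stand-in for "cluster was restarted": the model no longer exists on the cluster.
h2o.rm(model@model_id)

# With the cluster already initialized, the saved model can be read back in.
reloaded <- h2o.loadModel(model_path)
pred <- predict(reloaded, newdata = iris_h2o)
print(dim(pred))
```

In a genuinely new R session the same rule applies: call h2o.init() first, then h2o.loadModel().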

H2O: Deep learning object not found in function 'predict' for argument 'model'
