Question

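I have a random forest model built with h2o.randomForest().

Now I need to score a large amount of data with h2o.predict(). Because of some constraints, I cannot get all of the data at once, so I basically want to score different datasets in a loop. To speed this up, I want to score multiple datasets at the same time by running the same script in 2 different R instances. When I do that, one instance runs fine but the other gives me the following error. Sometimes both instances hit this error.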

Error in .h2o.__checkConnectionHealth(conn) : 
 H2O connection has been severed. Cannot connect to instance at http://127.0.0.1:54321/
Failed to connect to 127.0.0.1 port 54321: Address already in use

The error is not even consistent: sometimes I get it and sometimes I don't.

I initialize h2o and predict in every R instance as follows.

h2oServer = h2o.init(nthreads = -1, max_mem_size = '8g')
h2o.predict(model, test_data)

How can I do this? How can I use the h2o cloud from 2 different R instances?

Thanks,

Answer 1

In general, the approach you are trying will not speed things up: scoring a single dataset already occupies the CPUs, so multiple concurrent calls only create unnecessary contention.

Also, you can only boot a single H2O instance from within R. If you want to boot more than one instance, you can do so from the command line (java -jar h2o.jar).
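As a rough sketch of how that could look in practice, the script below attaches to an H2O instance that is already running, instead of booting a new one, and scores the batches one after another in a single R session. It assumes the instance was started from the command line on port 54321; the model path and the batch file names are hypothetical placeholders, not taken from the question.

```r
library(h2o)

# Assumption: an H2O instance is already running, started outside R with e.g.
#   java -Xmx8g -jar h2o.jar -port 54321
# startH2O = FALSE makes h2o.init() attach to it rather than boot a new one.
h2o.init(ip = "127.0.0.1", port = 54321, startH2O = FALSE)

# Hypothetical path to a model previously saved with h2o.saveModel().
model <- h2o.loadModel("/path/to/saved_random_forest")

# Hypothetical batch files; replace with however the data actually arrives.
batch_files <- c("batch_1.csv", "batch_2.csv")

for (f in batch_files) {
  test_data <- h2o.importFile(f)          # load one batch into the cluster
  preds <- h2o.predict(model, test_data)  # each call already uses all threads
  h2o.exportFile(preds, paste0("preds_", basename(f)), force = TRUE)

  # Free cluster memory before loading the next batch.
  h2o.rm(test_data)
  h2o.rm(preds)
}
```

If a second, independent instance is really needed, it would have to be started on a different port (for example with -port 54322) and each R session pointed at its own port in h2o.init(). On a single machine the two instances still compete for the same cores.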
