Question

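I am trying to use max_runtime_secs, but either I am having a hard time understanding how it is supposed to work or, more likely I think, there is some sort of bug.

I have been testing with random forest, and the setting never seems to cut the run time short.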

import h2o
from h2o.estimators import H2ORandomForestEstimator

h2o.init()

# Data: https://archive.ics.uci.edu/ml/machine-learning-databases/covtype/
df = h2o.import_file('covtype.csv')
for i in df.names:
    df[i] = df[i].asfactor()
df.types  # just showing everything is categorical

train, test = df.split_frame(ratios=[0.75], seed=2017)

response = 'C55'
xvars = train.drop(["C55"]).col_names

mymodel = H2ORandomForestEstimator(
    nfolds=10,
    max_runtime_secs=30,
    stopping_rounds=5,
    ntrees=500,
)

mymodel.train(
    x=xvars,
    y=response,
    validation_frame=test,
    training_frame=train,
)
## does not finish remotely close to <30 seconds
mymodel.actual_params  # property, not a method

Note that the max run time parameter does not seem to be saved and stays at 0. I am currently using the bleeding-edge version of h2o (~3.13) with Python.

Answer 1

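I have confirmed that this is a bug in the Python API (the max_runtime_secs code works in the backend and also in the R client). I opened a ticket here, and I hope this will be fixed in the next release.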
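
To check whether a given build still has the problem, something like the following may help. It is only a minimal sketch: the helper names `param_was_applied` and `timed` are hypothetical (not part of h2o), and it assumes the model exposes its effective parameters as a dict-like mapping of name to value, as `mymodel.actual_params` appears to. The dictionary used here is a stand-in that mimics what the question reports, so the snippet runs without a cluster.

```python
import time


def param_was_applied(actual_params, name, requested):
    """Return True if the model reports the requested value for `name`.

    `actual_params` is assumed to be a dict-like mapping of
    parameter name -> effective value.
    """
    return actual_params.get(name) == requested


def timed(train_fn):
    """Call train_fn() and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    train_fn()
    return time.perf_counter() - start


# Stand-in for the behaviour described in the question:
# the limit was requested as 30 but is reported as 0.
reported = {"ntrees": 500, "max_runtime_secs": 0}
print(param_was_applied(reported, "max_runtime_secs", 30))  # False
```

With a real model, the same check would be `param_was_applied(mymodel.actual_params, "max_runtime_secs", 30)`, and wrapping the `mymodel.train(...)` call in a lambda passed to `timed` gives the elapsed time to compare against the 30 second budget.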

h2o max_runtime_seconds - doesn't seem to have any effect?
