H2OGeneralizedLinearEstimator() - prediction error

Time: 2017-07-07 10:16:07

Tags: java python glm h2o

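I am trying to predict test times in a Kaggle competition using the H2OGeneralizedLinearEstimator function. The model trains normally on line 3 and the metrics are all reasonable. However, when I get to the predict step, I get an error even though the test data frame matches the train data frame.

Has anyone seen this error before?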

 h2o_glm = H2OGeneralizedLinearEstimator()

 h2o_glm.train(training_frame=train_h2o,y='y')

 h2o_glm_predictions = h2o_glm.predict(test_data=test_h2o).as_data_frame()

 test_pred = pd.read_csv('test.csv')[['ID']]
 test_pred['y'] = h2o_glm_predictions
 test_pred.to_csv('h2o_glm_predictions.csv',index=False)

glm Model Build progress: |███████████████████████████████████████████████| 100%

glm prediction progress: | (failed)

OSError Traceback (most recent call last) in ()
      3 h2o_glm.train(training_frame=train_h2o,y='y')
      4
----> 5 h2o_glm_predictions = h2o_glm.predict(test_data=test_h2o).as_data_frame()
      6
      7 test_pred = pd.read_csv('test.csv')[['ID']]

/Applications/anaconda/lib/python3.6/site-packages/h2o/model/model_base.py in predict(self, test_data)
    130 j = H2OJob(h2o.api("POST /4/Predictions/models/%s/frames/%s" % (self.model_id, test_data.frame_id)),
    131            self._model_json["algo"] + " prediction")
--> 132 j.poll()
    133 return h2o.get_frame(j.dest_key)
    134

/Applications/anaconda/lib/python3.6/site-packages/h2o/job.py in poll(self)
     71 if (isinstance(self.job, dict)) and ("stacktrace" in list(self.job)):
     72     raise EnvironmentError("Job with key {} failed with an exception: {}\nstacktrace: "
---> 73                            "\n{}".format(self.job_key, self.exception, self.job["stacktrace"]))
     74 else:
     75     raise EnvironmentError("Job with key %s failed with an exception: %s" % (self.job_key, self.exception))
  

OSError: Job with key $03017f00000132d4ffffffff$_868312f4c32f683871930a1145c1476a failed with an exception: DistributedException from /127.0.0.1:54321: 'null', caused by java.lang.ArrayIndexOutOfBoundsException
stacktrace:
DistributedException from /127.0.0.1:54321: 'null', caused by java.lang.ArrayIndexOutOfBoundsException
    at water.MRTask.getResult(MRTask.java:478)
    at water.MRTask.getResult(MRTask.java:486)
    at water.MRTask.doAll(MRTask.java:390)
    at water.MRTask.doAll(MRTask.java:396)
    at hex.glm.GLMModel.predictScoreImpl(GLMModel.java:1215)
    at hex.Model.score(Model.java:1077)
    at water.api.ModelMetricsHandler$1.compute2(ModelMetricsHandler.java:351)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1349)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Caused by: java.lang.ArrayIndexOutOfBoundsException

1 answer:

Answer 0 (score: 2)

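To summarize the comments above: the current workaround is to add a response column to the test_data frame (assuming it does not already have one). However, this is a bug that should be fixed. The JIRA is here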
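A minimal sketch of that workaround, assuming the response column is named 'y' as in the question and that a constant placeholder value is acceptable, since the column is only there to keep the scoring step from failing. The helper name add_placeholder_response, the fill value, and the toy columns are hypothetical and not part of H2O; the H2O calls are left as comments because they need a running cluster.

```python
import pandas as pd


def add_placeholder_response(test_df, response="y", fill_value=0.0):
    """Return a copy of test_df with a dummy response column if it lacks one.

    Hypothetical helper: the values are placeholders, not real targets.
    """
    out = test_df.copy()
    if response not in out.columns:
        out[response] = fill_value
    return out


# Tiny stand-in for the Kaggle test set (made-up values, for illustration only).
test_df = pd.DataFrame({"ID": [1, 2, 3], "X0": [0.5, 1.5, 2.5]})
patched = add_placeholder_response(test_df)
print(list(patched.columns))

# With a running H2O cluster, the patched frame would replace the original one:
#   test_h2o = h2o.H2OFrame(patched)
#   h2o_glm_predictions = h2o_glm.predict(test_data=test_h2o).as_data_frame()
```

The helper works on a copy, so the original test data stays untouched and the placeholder column never ends up in the submission file by accident.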