How do I resolve NonSPDMatrixException in the h2o GLM classifier on a large dataset?

Asked: 2019-02-18 11:28:52

Tags: machine-learning h2o glm

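I am trying to run several different classifiers using H2O estimators. However, when I run the GLM classifier I get an error message. The relevant code is pasted below.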

from h2o.estimators import (H2ORandomForestEstimator, H2OGeneralizedLinearEstimator,
                            H2OGradientBoostingEstimator, H2ONaiveBayesEstimator)

CLASSIFIERS = {
'RandomForest': H2ORandomForestEstimator(ntrees=200, keep_cross_validation_predictions=True, stopping_rounds=2, score_each_iteration=True, model_id="rf_cv_all_folds_"+CLASSIFIER_DATE_STR, seed=1000000),
'RandomForest_depth6': H2ORandomForestEstimator(ntrees=200, max_depth=6, keep_cross_validation_predictions=True, stopping_rounds=2, score_each_iteration=True, model_id="rf_cv_all_folds_"+CLASSIFIER_DATE_STR, seed=1000000),
'GLM': H2OGeneralizedLinearEstimator(family="binomial", lambda_=0, compute_p_values=True, remove_collinear_columns=True, keep_cross_validation_predictions=True, model_id="glm_cv_all_folds_"+CLASSIFIER_DATE_STR, seed=1000000), # todo: regularization? http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/glm.html#regularization-parameters-in-glm
# Same GLM with a higher max_iterations; it needs its own key, since a repeated 'GLM' key would replace the entry above.
'GLM_max_iter': H2OGeneralizedLinearEstimator(family="binomial", lambda_=0, compute_p_values=True, remove_collinear_columns=True, keep_cross_validation_predictions=True, model_id="glm_cv_all_folds_"+CLASSIFIER_DATE_STR, seed=1000000, max_iterations=10000000), # todo: regularization? http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/glm.html#regularization-parameters-in-glm
'GBM': H2OGradientBoostingEstimator(ntrees=200, learn_rate=0.2, max_depth=20, stopping_tolerance=0.01, stopping_rounds=2, score_each_iteration=True, keep_cross_validation_predictions=True, model_id="gbm_cv_all_folds_"+CLASSIFIER_DATE_STR, seed=1000000),
'NaiveBayes': H2ONaiveBayesEstimator(keep_cross_validation_predictions=True, model_id="naive_bayes_cv_all_folds_"+CLASSIFIER_DATE_STR, seed=1000000)
}

When I ran the code on a sample dataset, I started getting the following UserWarning for the GLM classifier.

C:\Program Files\Anaconda2\lib\site-packages\h2o\job.py:69: UserWarning: Reached maximum number of iterations 50!
  warnings.warn(w)

However, when I try to run the code on the full dataset (3 GB), I get the following error message for the GLM classifier.

   Job with key $03017f00000132d4ffffffff$_94bc493aa6606867c224fe00dac44410 failed with an exception: hex.gram.Gram$NonSPDMatrixException
    stacktrace: 
    hex.gram.Gram$NonSPDMatrixException
            at hex.gram.Gram$Cholesky.solve(Gram.java:664)
            at hex.gram.Gram$Cholesky$1.compute(Gram.java:607)
            at jsr166y.RecursiveAction.exec(RecursiveAction.java:160)
            at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
            at jsr166y.ForkJoinTask.doInvoke(ForkJoinTask.java:360)
            at jsr166y.ForkJoinTask.invokeAll(ForkJoinTask.java:741)
            at hex.gram.Gram$Cholesky.solve(Gram.java:611)
            at hex.gram.Gram$Cholesky.getInv(Gram.java:617)
            at hex.glm.GLM$GLMDriver.fitModel(GLM.java:1119)
            at hex.glm.GLM$GLMDriver.computeSubmodel(GLM.java:1169)
            at hex.glm.GLM$GLMDriver.computeImpl(GLM.java:1254)
            at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:218)
            at hex.glm.GLM$GLMDriver.compute2(GLM.java:571)
            at water.H2O$H2OCountedCompleter.compute(H2O.java:1395)
            at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
            at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
            at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
            at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
            at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

Even after increasing max_iterations to 1000000 for the full dataset, I still get the same error.
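For context, the stack trace ends in a Cholesky solve on the Gram matrix, and the exception name suggests that matrix was found not to be symmetric positive definite (SPD). The pure-Python sketch below is only an illustration of how that can happen in general, not H2O's implementation: the `gram`, `cholesky`, and `is_spd` helpers and the toy data are hypothetical, and it assumes (without evidence from the full dataset) that an exactly collinear column is present.

```python
# Illustration only: a hand-written Cholesky factorization, not H2O's code.

def gram(rows):
    """Return X^T X for a matrix X given as a list of rows."""
    n_cols = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(n_cols)]
            for i in range(n_cols)]


def cholesky(a, rel_tol=1e-10):
    """Return lower-triangular L with L * L^T = a.

    Raises ValueError when a pivot is not clearly positive,
    which means the matrix is not (numerically) SPD.
    """
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = a[j][j] - sum(low[j][k] ** 2 for k in range(j))
        if pivot <= rel_tol * abs(a[j][j]):
            raise ValueError("matrix is not SPD at column %d" % j)
        low[j][j] = pivot ** 0.5
        for i in range(j + 1, n):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = s / low[j][j]
    return low


def is_spd(a):
    """True if the Cholesky factorization of a succeeds."""
    try:
        cholesky(a)
        return True
    except ValueError:
        return False


# Hypothetical toy data: the third column is exactly 2 * the first column.
collinear = [[1.0, 2.0, 2.0],
             [2.0, 1.0, 4.0],
             [3.0, 5.0, 6.0],
             [4.0, 3.0, 8.0]]
# The same data with the redundant column dropped.
independent = [row[:2] for row in collinear]

print(is_spd(gram(independent)))  # factorization succeeds
print(is_spd(gram(collinear)))    # factorization hits a near-zero pivot
```

Whether collinearity is what actually triggers the exception on the full dataset is not established by this sketch; it only shows why a Cholesky-based solve can fail on a rank-deficient Gram matrix.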

Any help with this would be appreciated. Thanks.

0 answers:

No answers yet.