I am trying to run k-means with the h2o package. Here is the information about my H2O cluster:
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
Starting H2O JVM and connecting: . Connection successful!
R is connected to the H2O cluster:
H2O cluster uptime: 5 seconds 382 milliseconds
H2O cluster version: 3.10.5.3
H2O cluster version age: 5 days
H2O cluster name: H2O_started_from_R_rgb505
H2O cluster total nodes: 1
H2O cluster total memory: 14.22 GB
H2O cluster total cores: 4
H2O cluster allowed cores: 4
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
H2O Internal Security: FALSE
R Version: R version 3.4.1 (2017-06-30)
My data is [32000, 14], so it is very small. I get the following error when trying to run h2o.kmeans:
h2o_kmeans <- h2o.kmeans(training_frame = spmx_train.h2o,
                         nfolds = 10,
                         k = 20,
                         estimate_k = TRUE,
                         max_iterations = 10,
                         standardize = FALSE
                         )
Error:
java.lang.ArrayIndexOutOfBoundsException: 6

java.lang.ArrayIndexOutOfBoundsException: 6
    at water.util.ArrayUtils.add(ArrayUtils.java:163)
    at hex.ModelMetricsClustering$MetricBuilderClustering.reduce(ModelMetricsClustering.java:131)
    at hex.ModelMetricsClustering$MetricBuilderClustering.reduce(ModelMetricsClustering.java:80)
    at hex.ModelBuilder.cv_mainModelScores(ModelBuilder.java:512)
    at hex.ModelBuilder.computeCrossValidation(ModelBuilder.java:292)
    at hex.ModelBuilder$1.compute2(ModelBuilder.java:207)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1256)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

Error: java.lang.ArrayIndexOutOfBoundsException: 6
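When I change nfolds to 5, it runs fine, so there seems to be some kind of memory problem. It is hard to believe that h2o cannot handle 10-fold k-means on data this small. Sometimes the code does run, seemingly at random. I closed all other applications and ran only R. Is there anything I can do to fix this?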