Question

To detect visitor demographics based on their behavior, I used the SVM algorithm from Spark MLlib:

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.classification.SVMModel;
import org.apache.spark.mllib.classification.SVMWithSGD;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.util.MLUtils;
import scala.Tuple2;

// sc is a JavaSparkContext; SVMTestMapper is my own mapper class.
JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(sc.sc(), "labels.txt").toJavaRDD();
JavaRDD<LabeledPoint> training = data.sample(false, 0.6, 11L);
training.cache();
JavaRDD<LabeledPoint> test = data.subtract(training);

// Run training algorithm to build the model.
int numIterations = 100;
final SVMModel model = SVMWithSGD.train(training.rdd(), numIterations);

// Clear the default threshold.
model.clearThreshold();

JavaRDD<Tuple2<Object, Object>> scoreAndLabels = test.map(new SVMTestMapper(model));
```

Unfortunately, `final SVMModel model = SVMWithSGD.train(training.rdd(), numIterations);` throws an `ArrayIndexOutOfBoundsException`:

```
Caused by: java.lang.ArrayIndexOutOfBoundsException: 4857
```

labels.txt is a text file in LIBSVM format, the format that `MLUtils.loadLibSVMFile` reads.
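In LIBSVM format each line reads `label index1:value1 index2:value2 ...`, and Spark's documentation for `loadLibSVMFile` states that the indices are one-based and in ascending order (they are converted to zero-based after loading). If the loader infers the number of features from the indices it sees, a line with a zero index or with site IDs out of order is one plausible way to end up with an index outside the inferred dimension; that is an assumption, not something the question confirms. The sketch below is a hypothetical, stdlib-only pre-check (the class `LibSvmCheck` and method `checkLine` are made-up names, not part of Spark) that flags such lines.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical pre-check for a LIBSVM-format file such as labels.txt.
 * Not part of Spark: it only tests the conventions that
 * MLUtils.loadLibSVMFile documents (one-based, ascending indices).
 */
public class LibSvmCheck {

    /** Returns the problems found on one line; an empty list means the line looks valid. */
    static List<String> checkLine(String line) {
        List<String> problems = new ArrayList<>();
        String trimmed = line.trim();
        // Skip blank lines and '#' comment lines.
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return problems;
        }
        String[] tokens = trimmed.split("\\s+");
        int previous = 0; // largest index seen so far; valid indices start at 1
        // tokens[0] is the label; every later token should be index:value.
        for (int i = 1; i < tokens.length; i++) {
            String[] pair = tokens[i].split(":");
            if (pair.length != 2) {
                problems.add("malformed pair '" + tokens[i] + "'");
                continue;
            }
            int index;
            try {
                index = Integer.parseInt(pair[0]);
            } catch (NumberFormatException e) {
                problems.add("non-numeric index '" + pair[0] + "'");
                continue;
            }
            if (index < 1) {
                problems.add("index " + index + " is not one-based");
            } else if (index <= previous) {
                problems.add("index " + index + " is not greater than previous index " + previous);
            }
            previous = Math.max(previous, index);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(checkLine("1 12:1 305:1 4857:1")); // []
        System.out.println(checkLine("0 4857:1 12:1"));
        // [index 12 is not greater than previous index 4857]
    }
}
```

Running `checkLine` over every line of labels.txt before calling `loadLibSVMFile` would show whether any visitor's site IDs are zero, duplicated, or out of order.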

I have tried a lot of data and several algorithms, and I see that the error occurs for site IDs greater than 5000.
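That the failure depends on how large the site IDs are is consistent with a mismatch between the feature dimension a model is sized for and the indices present in the data; this is one plausible reading, not a confirmed diagnosis. The minimal sketch below is plain Java, not Spark's implementation (`SparseDotDemo` and `dot` are hypothetical names): it shows how a sparse dot product against a weight array that is one element too short throws exactly this exception type.

```java
/**
 * Hypothetical illustration, not Spark's implementation: a sparse
 * vector whose largest index does not fit the weight array.
 */
public class SparseDotDemo {

    /** Dot product of a sparse vector (indices, values) with a dense weight array. */
    static double dot(int[] indices, double[] values, double[] weights) {
        double sum = 0.0;
        for (int k = 0; k < indices.length; k++) {
            // No explicit bounds check: an index >= weights.length throws here.
            sum += values[k] * weights[indices[k]];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Zero-based indices, as used after loading a one-based LIBSVM file.
        int[] indices = {11, 304, 4857};
        double[] values = {1.0, 1.0, 1.0};

        double[] fits = new double[4858];     // dimension 4858 covers index 4857
        System.out.println(dot(indices, values, fits)); // 0.0 (all weights are zero)

        double[] tooSmall = new double[4857]; // valid indices are 0..4856
        try {
            dot(indices, values, tooSmall);
        } catch (ArrayIndexOutOfBoundsException e) {
            // Message text varies by JVM version; it includes the offending index 4857.
            System.out.println("ArrayIndexOutOfBoundsException: " + e.getMessage());
        }
    }
}
```

Under this reading, the remedy is a feature dimension larger than the largest zero-based index in the data, rather than a smaller or denser data set.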

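Is there any way to overcome this, or is there another library that handles this problem? Or, since the data matrix is very sparse, should I use SVD instead?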

How to overcome SVMWithSGD throwing ArrayIndexOutOfBoundsException for indices greater than 5000?
