To detect visitor demographics from their behavior, I am using the SVM algorithm from Spark MLlib:
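import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.classification.SVMModel;
import org.apache.spark.mllib.classification.SVMWithSGD;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.util.MLUtils;
import scala.Tuple2;

// Load the LIBSVM-formatted data.
JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(sc.sc(), "labels.txt").toJavaRDD();
// Sample roughly 60% for training (seed 11) and keep the rest for testing.
JavaRDD<LabeledPoint> training = data.sample(false, 0.6, 11L);
training.cache();
JavaRDD<LabeledPoint> test = data.subtract(training);
// Run training algorithm to build the model.
int numIterations = 100;
final SVMModel model = SVMWithSGD.train(training.rdd(), numIterations);
// Clear the default threshold.
model.clearThreshold();
JavaRDD<Tuple2<Object, Object>> scoreAndLabels = test.map(new SVMTestMapper(model));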
Unfortunately, the line final SVMModel model = SVMWithSGD.train(training.rdd(), numIterations); throws an ArrayIndexOutOfBoundsException:
Caused by: java.lang.ArrayIndexOutOfBoundsException: 4857
labels.txt is the text file that I load with MLUtils.loadLibSVMFile.
I have tried many data sets and algorithms, and the error appears whenever a site ID is greater than 5000.
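One check that might help narrow this down is to validate labels.txt before handing it to Spark. The sketch below is plain Java with no Spark dependency; LibSvmCheck and maxIndex are hypothetical names, not part of MLlib. It assumes each line has the form "<label> <index>:<value> ..." and enforces what the MLUtils.loadLibSVMFile documentation asks for: one-based indices in ascending order. It also assumes that site IDs are used directly as feature indices, which is a guess on my part.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of Spark MLlib.
final class LibSvmCheck {

    // Returns the largest feature index on one line ("<label> <index>:<value> ...").
    // Throws IllegalArgumentException if indices are not one-based and strictly ascending.
    static int maxIndex(String line) {
        String[] tokens = line.trim().split("\\s+");
        int previous = 0;
        for (int i = 1; i < tokens.length; i++) { // tokens[0] is the label
            String[] pair = tokens[i].split(":");
            if (pair.length != 2) {
                throw new IllegalArgumentException("Malformed feature: " + tokens[i]);
            }
            int index = Integer.parseInt(pair[0]);
            if (index < 1) {
                throw new IllegalArgumentException("Index is not one-based: " + index);
            }
            if (index <= previous) {
                throw new IllegalArgumentException("Indices not ascending at: " + index);
            }
            previous = index;
        }
        return previous;
    }

    // Returns the largest feature index over all lines, skipping blank ones.
    static int maxIndex(List<String> lines) {
        int max = 0;
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                max = Math.max(max, maxIndex(line));
            }
        }
        return max;
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: java LibSvmCheck labels.txt
        if (args.length != 1) {
            System.err.println("usage: java LibSvmCheck <libsvm-file>");
            return;
        }
        System.out.println("max feature index: "
                + maxIndex(Files.readAllLines(Paths.get(args[0]))));
    }
}
```

If every line passes, the largest index found could be passed as the third argument of the overload MLUtils.loadLibSVMFile(sc.sc(), "labels.txt", numFeatures), so that the feature count does not have to be inferred from the data. Whether that removes the exception in this case is untested.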
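Is there a way to work around this, or is there another library that handles this problem? Or, since the data matrix is very sparse, should I use SVD instead?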