Question

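I have simple Java code that predicts labels with the MLlib SVM predict method on AWS EMR. I broadcast the model and then use it inside a map to score each record. The prediction just hangs on a large cluster, even with only 1000 observations. Cluster details: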

10-node cluster of r3.8xlarge instances
driver-memory 64g
executor-memory 64g
num-executors 15
executor-cores 5
master type yarn-cluster
deploy mode cluster

The code is as follows:

final Broadcast<SVMModel> svmModel = sc.broadcast(model);

// Compute raw scores on the test set.
JavaRDD<Tuple2<Double, VectorAndLabeledPoint>> scoreAndLabels = test.map(
    new Function<VectorAndLabeledPoint, Tuple2<Double, VectorAndLabeledPoint>>() {
        public Tuple2<Double, VectorAndLabeledPoint> call(VectorAndLabeledPoint p) {
            SVMModel modelin = svmModel.value();
            System.out.println("model file loaded in map:"+modelin.toString());
            Double score = modelin.predict(p.getVector());
            return new Tuple2<Double, VectorAndLabeledPoint>(score, p);
        }
    }
);

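Why is prediction so slow on the cluster, to the point of hanging, when it runs quickly on a single node? It takes 2 minutes on a single node, while on the cluster it is still running after more than 1 hour.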
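
For context on how little arithmetic is involved: scoring with a linear SVM amounts to a dot product between the weight vector and the feature vector plus an intercept, optionally compared against a threshold. Below is a minimal plain-Java sketch of that per-record computation. It is not MLlib's code; the class name, method names, and sample numbers are hypothetical and chosen only for illustration. If this is roughly all that `predict` does per record, then 1000 observations should be cheap, which suggests the time may be going somewhere other than the scoring itself.

```java
public class LinearSvmScoreSketch {

    // Raw score (margin) of a linear model: intercept + sum(weights[i] * features[i]).
    // Hypothetical helper, not part of any Spark API.
    static double rawScore(double[] weights, double intercept, double[] features) {
        if (weights.length != features.length) {
            throw new IllegalArgumentException("weights and features differ in length");
        }
        double margin = intercept;
        for (int i = 0; i < weights.length; i++) {
            margin += weights[i] * features[i];
        }
        return margin;
    }

    // Thresholded label: 1.0 if the raw score exceeds the threshold, else 0.0.
    static double predictLabel(double[] weights, double intercept,
                               double[] features, double threshold) {
        return rawScore(weights, intercept, features) > threshold ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        // Made-up model and observation, for illustration only.
        double[] weights = {0.5, -1.0, 2.0};
        double intercept = 0.25;
        double[] features = {2.0, 1.0, 0.5};

        System.out.println("raw score: " + rawScore(weights, intercept, features));
        System.out.println("label:     " + predictLabel(weights, intercept, features, 0.0));
    }
}
```

The loop is linear in the number of features, so the cost per observation depends only on the feature dimension, not on the cluster size.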

Spark cluster SVM MLlib prediction slow and hanging

0 answers: