Unable to map predictions to a JavaRDD

Asked: 2015-03-23 23:42:42

Tags: java apache-spark rdd apache-spark-mllib

I am trying to map the predictions from a LinearRegression model so that I can pass them to a BinaryClassificationMetrics object:

// Make predictions on test documents. cvModel uses the best model found (lrModel).
DataFrame predictions = cvModel.transform(testingFrame);
JavaRDD<Tuple2<Object, Object>> scoreAndLabels = predictions.map(
        new Function<Row, Tuple2<Object, Object>>() {
            @Override
            public Tuple2<Object, Object> call(Row r) {
                Double score = r.getDouble(1);
                return new Tuple2<Object, Object>(score, r.getDouble(0));
            }
        }
);
BinaryClassificationMetrics metrics
        = new BinaryClassificationMetrics(JavaRDD.toRDD(scoreAndLabels));

However, when I call predictions.map(...), I get the following compile error:

method map in class DataFrame cannot be applied to given types;
  required: Function1<Row,R>,ClassTag<R>
  found: <anonymous Function<Row,Tuple2<Object,Object>>>
  reason: cannot infer type-variable(s) R
    (actual and formal argument lists differ in length)
  where R is a type-variable:
    R extends Object declared in method <R>map(Function1<Row,R>,ClassTag<R>)

Any suggestions on how to map the data in the predictions DataFrame?

1 answer:

Answer 0 (score: 1):

Figured it out! I had to convert the DataFrame to a JavaRDD first and map from there:

DataFrame predictions = cvModel.transform(testingFrame);
JavaRDD<Tuple2<Object, Object>> scoreAndLabels = predictions.toJavaRDD().map(
        new Function<Row, Tuple2<Object, Object>>() {
            @Override
            public Tuple2<Object, Object> call(Row r) {
                Double score = r.getDouble(4);
                Double label = r.getDouble(1);
                return new Tuple2<Object, Object>(score, label);
            }
        });

BinaryClassificationMetrics metrics
        = new BinaryClassificationMetrics(JavaRDD.toRDD(scoreAndLabels));
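
As the compile error shows, DataFrame.map expects a Scala Function1<Row,R> plus a ClassTag<R>, so an org.apache.spark.api.java.function.Function cannot be passed to it directly. JavaRDD.map does accept that Java Function, which is why going through toJavaRDD() compiles. A possible variation, sketched below, selects the two columns by name before mapping so the Row indexes do not depend on the full column layout. The column names "prediction" and "label", the class name ScoreAndLabels, and the helper evaluate are assumptions for illustration; the sketch also assumes both columns hold doubles and targets the Spark 1.3-era Java API used in the question, where DataFrame is a class.

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import scala.Tuple2;

// Hypothetical helper class; not part of the original post.
public final class ScoreAndLabels {

    // Builds the metrics object from a predictions DataFrame.
    // Assumes columns named "prediction" and "label", both of type double.
    static BinaryClassificationMetrics evaluate(DataFrame predictions) {
        JavaRDD<Tuple2<Object, Object>> scoreAndLabels = predictions
                .select("prediction", "label") // after select: index 0 = score, 1 = label
                .toJavaRDD()                   // JavaRDD<Row>, whose map takes a Java Function
                .map(new Function<Row, Tuple2<Object, Object>>() {
                    @Override
                    public Tuple2<Object, Object> call(Row r) {
                        return new Tuple2<Object, Object>(r.getDouble(0), r.getDouble(1));
                    }
                });
        // BinaryClassificationMetrics takes a Scala RDD, hence JavaRDD.toRDD.
        return new BinaryClassificationMetrics(JavaRDD.toRDD(scoreAndLabels));
    }
}
```

With this in place, the call site would reduce to something like ScoreAndLabels.evaluate(cvModel.transform(testingFrame)). If the pipeline writes its output to differently named columns, the select call would need to change accordingly.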