Spark ML streaming predictOnValues: how do I save the results?

Time: 2017-09-08 13:35:38

Tags: java apache-spark spark-streaming apache-spark-ml

I have the following code:

StreamingLinearRegressionWithSGD regressionWithSGD =
        new StreamingLinearRegressionWithSGD()
                .setInitialWeights(Vectors.zeros(featuresNumber));

JavaDStream<LabeledPoint> trainingData = streamingContext.textFileStream(model.getTrainPath()).map(LabeledPoint::parse).cache();
JavaDStream<LabeledPoint> testData = streamingContext.textFileStream(model.getPredictPath()).map(LabeledPoint::parse);
regressionWithSGD.trainOn(trainingData);
regressionWithSGD.predictOnValues(testData.mapToPair(lp -> new Tuple2<>(lp.label(), lp.features()))).print();

Instead of calling print(), I would like to write the results to a file, a database, a queue, or something similar. Is that possible?

1 answer:

Answer 0 (score: 0)

I figured it out:

StreamingLinearRegressionWithSGD regressionWithSGD =
        new StreamingLinearRegressionWithSGD()
                .setInitialWeights(Vectors.zeros(featuresNumber));

JavaDStream<LabeledPoint> trainingData = streamingContext.textFileStream(model.getTrainPath()).map(LabeledPoint::parse).cache();
JavaDStream<LabeledPoint> testData = streamingContext.textFileStream(model.getPredictPath()).map(LabeledPoint::parse);
regressionWithSGD.trainOn(trainingData);

// predictOn returns the predictions as a stream instead of printing them.
JavaDStream<Double> doubleJavaDStream = regressionWithSGD.predictOn(testData.map(labeledPoint -> labeledPoint.features()));
// Write every batch of predictions as text files, using prefix "result" and suffix "out".
doubleJavaDStream.dstream().saveAsTextFiles("result", "out");

As a result, each batch of predictions is written to its own folder named result-{timestamp}.out.
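
To pick the saved predictions up again, for example before forwarding them to a database or a queue, something along these lines might work. This is only a sketch in plain Java with no Spark dependency. It assumes the folder name follows the "prefix-TIME_IN_MS.suffix" pattern described for saveAsTextFiles, and that each part-* file holds one prediction per line. The class name BatchOutput, its helper methods, and the batch time in main are hypothetical and used only for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Spark.
final class BatchOutput {

    // Builds the folder name for one batch, assuming the
    // "prefix-TIME_IN_MS.suffix" convention of saveAsTextFiles.
    static String batchDirName(String prefix, long batchTimeMs, String suffix) {
        String name = Long.toString(batchTimeMs);
        if (prefix != null && !prefix.isEmpty()) {
            name = prefix + "-" + name;
        }
        if (suffix != null && !suffix.isEmpty()) {
            name = name + "." + suffix;
        }
        return name;
    }

    // Reads all predictions from the part-* files of one batch folder,
    // assuming one numeric value per line. Marker files such as _SUCCESS
    // do not match the "part-*" glob and are skipped.
    static List<Double> readPredictions(Path batchDir) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(batchDir, "part-*")) {
            for (Path part : stream) {
                parts.add(part);
            }
        }
        Collections.sort(parts); // part-00000, part-00001, ... in order

        List<Double> predictions = new ArrayList<>();
        for (Path part : parts) {
            for (String line : Files.readAllLines(part)) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    predictions.add(Double.parseDouble(trimmed));
                }
            }
        }
        return predictions;
    }

    public static void main(String[] args) {
        // Hypothetical batch time in milliseconds.
        System.out.println(batchDirName("result", 1504877738000L, "out"));
        // prints "result-1504877738000.out"
    }
}
```

Note that predictOn drops the label of each test point. If the predictions need to be matched back to their inputs, predictOnValues from the question keeps the key next to each predicted value.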