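I need to train a linear regression model on streaming data. I read the stream with textFileStream. The problem is that RegressionMetrics accepts an RDD[(Double, Double)], whereas output is a DStream[(Double, Double)].
How can I convert output to an RDD[(Double, Double)] so that I can use RegressionMetrics?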
val model = new StreamingLinearRegressionWithSGD()
  .setInitialWeights(Vectors.dense(0.0, 0.0))
  .setStepSize(0.2)
  .setNumIterations(25)
val trainingData = ssc.textFileStream("/training/data/dir").map(LabeledPoint.parse)
val testData = ssc.textFileStream("/training/data/dir").map(LabeledPoint.parse)
model.trainOn(trainingData)
// output is a DStream of (label, prediction) pairs
val output = model.predictOnValues(testData.map(lp => (lp.label, lp.features)))
val metrics = new RegressionMetrics(output)
val rmse = metrics.rootMeanSquaredError
Answer 0 (score: 0)
Every DStream is backed by a sequence of RDDs, one per batch of data, which you can access with the foreachRDD method:
model.predictOnValues(testData.map(lp => (lp.label, lp.features))).foreachRDD { rdd =>
  val metrics = new RegressionMetrics(rdd)
  val rmse = metrics.rootMeanSquaredError
  // do something with `rmse` here
}
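For completeness, here is a fuller sketch of the same idea as a standalone program. It is only one possible arrangement, not a definitive implementation. The object name StreamingRmse, the helper batchRmse, the 10-second batch interval, the local[2] master and the "/testing/data/dir" path are all assumptions made for illustration. Two details are worth noting: RegressionMetrics expects (prediction, observation) pairs while predictOnValues yields (label, prediction), so the sketch swaps each pair (this does not change the RMSE, but it matters for asymmetric metrics such as r2), and it skips empty batches, since a batch interval may pass without any new files arriving.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.mllib.evaluation.RegressionMetrics
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, StreamingLinearRegressionWithSGD}
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical object name, chosen for this sketch only.
object StreamingRmse {

  // Hypothetical helper: RMSE for one batch of (label, prediction) pairs,
  // or None when the batch contains no records.
  def batchRmse(labelAndPrediction: RDD[(Double, Double)]): Option[Double] =
    if (labelAndPrediction.isEmpty()) None
    else {
      // RegressionMetrics expects (prediction, observation), so swap each pair.
      val metrics = new RegressionMetrics(labelAndPrediction.map(_.swap))
      Some(metrics.rootMeanSquaredError)
    }

  def main(args: Array[String]): Unit = {
    // Assumed settings: local master with 2 threads, 10-second batches.
    val conf = new SparkConf().setAppName("StreamingRmse").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    val model = new StreamingLinearRegressionWithSGD()
      .setInitialWeights(Vectors.dense(0.0, 0.0))
      .setStepSize(0.2)
      .setNumIterations(25)

    val trainingData = ssc.textFileStream("/training/data/dir").map(LabeledPoint.parse)
    // Assumed path: a separate directory for test files.
    val testData = ssc.textFileStream("/testing/data/dir").map(LabeledPoint.parse)

    model.trainOn(trainingData)

    // Each batch of the prediction stream is an ordinary RDD[(Double, Double)].
    model.predictOnValues(testData.map(lp => (lp.label, lp.features)))
      .foreachRDD { rdd =>
        batchRmse(rdd).foreach(rmse => println(s"Batch RMSE: $rmse"))
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Keeping the metric computation in a small function that takes a plain RDD also makes it possible to check it against a hand-built RDD without starting a streaming job.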