I have an org.apache.spark.util.CollectionAccumulator[(Double, Double)] to which I add rows while streaming.
Now I want to convert it to a DataFrame for further processing, but I am not sure how to do that.
Here is a snippet showing how I populate the accumulator:
val strmquery = dataFramedummy.writeStream.foreach(new ForeachWriter[Row]() {
  override def open(partitionId: Long, version: Long): Boolean = true

  override def process(row: Row): Unit = {
    println(s">> Processing ${row}")
    accumulator.add((row.getAs[Double]("Field1"), row.getAs[Double]("Field2")))
  }

  override def close(errorOrNull: Throwable): Unit = {
    // do nothing
  }
}).outputMode("append").start()
Answer 0 (score: 2)
Take the accumulator's value (a java.util.List) and pass it to createDataset. SparkSession.createDataset has an overload that accepts a java.util.List directly; you only need an implicit Encoder in scope, which import spark.implicits._ provides for (Double, Double) tuples.

val accumulator: org.apache.spark.util.CollectionAccumulator[(Double, Double)] = ???
import spark.implicits._
spark.createDataset(accumulator.value)
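Putting the two halves together, here is a minimal sketch of the full round trip. The names spark, dataFramedummy, and the columns Field1/Field2 are assumptions carried over from the question, and the streaming part is elided; this shows only the accumulator-to-DataFrame conversion.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.util.CollectionAccumulator

val spark = SparkSession.builder().appName("accumulator-demo").getOrCreate()
import spark.implicits._

// Register a collection accumulator on the SparkContext.
val accumulator: CollectionAccumulator[(Double, Double)] =
  spark.sparkContext.collectionAccumulator[(Double, Double)]("pairs")

// ... the streaming query from the question runs here and calls accumulator.add(...) ...

// After (or between) micro-batches, turn the accumulated pairs into a DataFrame.
// accumulator.value returns a java.util.List, which createDataset accepts directly;
// toDF renames the tuple columns _1/_2 to the original field names.
val df: DataFrame = spark
  .createDataset(accumulator.value)
  .toDF("Field1", "Field2")

df.show()
```

Note that the accumulator keeps growing across micro-batches, so if you convert it repeatedly you will see previously processed rows again unless you reset or drain it between reads.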