How can I convert a CollectionAccumulator[(Double, Double)] to a Spark DataFrame?

Asked: 2018-06-13 06:13:14

Tags: scala apache-spark apache-spark-sql

I have an org.apache.spark.util.CollectionAccumulator[(Double, Double)] to which I add rows while a streaming query is running.

Now I want to convert it into a DataFrame for further processing, but I am not sure how to do that.

Edit

Adding a code snippet showing how I populate the accumulator:

val strmquery = dataFramedummy.writeStream.foreach(new ForeachWriter[Row]() {

  override def open(partitionId: Long, version: Long): Boolean = true

  override def process(row: Row): Unit = {
    println(s">> Processing ${row}")
    // use the typed getAs[Double] instead of casting with asInstanceOf
    accumulator.add((row.getAs[Double]("Field1"), row.getAs[Double]("Field2")))
  }

  override def close(errorOrNull: Throwable): Unit = {
    // do nothing
  }
}).outputMode("append").start()
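
For reference, a minimal sketch of how such an accumulator could be registered before starting the streaming query; the accumulator name "pairs" and the spark session variable are assumptions, not taken from the question:

// Hypothetical setup: register a CollectionAccumulator on the SparkContext
// so the ForeachWriter above can add (Double, Double) pairs to it.
val accumulator = spark.sparkContext.collectionAccumulator[(Double, Double)]("pairs")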

1 Answer:

Answer 0 (score: 2)

Take the accumulator's value (a list) and create a Dataset from it.

val accumulator: org.apache.spark.util.CollectionAccumulator[(Double, Double)] = ???
import spark.implicits._  // provides the implicit Encoder[(Double, Double)]
spark.createDataset(accumulator.value)
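
If a DataFrame with named columns is preferred, one minimal sketch (assuming a spark session is in scope and reusing the Field1/Field2 column names from the question) is to convert the java.util.List returned by accumulator.value into a Scala collection first:

import scala.collection.JavaConverters._
import spark.implicits._

// accumulator.value returns a java.util.List[(Double, Double)];
// convert it to a Scala Seq and name the columns for further processing.
val df = accumulator.value.asScala.toSeq.toDF("Field1", "Field2")
df.show()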