Question

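I have a requirement where I perform operations inside rdd.foreachPartition on a Spark RDD. Now I want to save the new data generated inside the foreachPartition loop, but I believe the save options are only available on an RDD (or DataFrame). Is there a way to save the new data generated inside the foreachPartition loop? My code is below: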

lines.foreachRDD { rdd =>
  val newRDD = rdd.map(...)

  newRDD.foreachPartition { iter =>
    val newValues = iter.map(...)

    // I want to save newValues
  }
}

Thanks

Answer 1

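Just use mapPartitions and save the result afterwards. foreachPartition is an action that returns Unit, so nothing you build inside it survives, whereas mapPartitions is a transformation that returns a new RDD you can save: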

newRDD.mapPartitions(iter =>
  iter.map(...)
).saveAsTextFile(...)
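
For context, here is a minimal sketch of how this could fit into the foreachRDD loop from the question. It is only an illustration under a few assumptions: the socket source on localhost:9999, the 10-second batch interval, the /tmp/output path, and the transformPartition helper (which just upper-cases each line) are all hypothetical stand-ins for whatever your real job does. Each batch is written to its own directory because saveAsTextFile fails if the target directory already exists.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SavePartitionOutput {

  // Hypothetical per-partition logic: stands in for the iter.map(...) in the question.
  def transformPartition(iter: Iterator[String]): Iterator[String] =
    iter.map(_.toUpperCase)

  def main(args: Array[String]): Unit = {
    // Assumed local setup; replace master, source and batch interval with your own.
    val conf = new SparkConf().setAppName("SavePartitionOutput").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))
    val lines = ssc.socketTextStream("localhost", 9999)

    lines.foreachRDD { (rdd, time) =>
      // Skip empty batches so no empty output directories are created.
      if (!rdd.isEmpty()) {
        // mapPartitions returns a new RDD, unlike foreachPartition which returns Unit.
        val newValues = rdd.mapPartitions(transformPartition)

        // Assumed output location; one directory per batch, keyed by batch time.
        newValues.saveAsTextFile(s"/tmp/output/batch-${time.milliseconds}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Keeping the per-partition logic in a plain Iterator => Iterator function also makes it easy to check without a cluster, e.g. transformPartition(Iterator("a", "b")).toList.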

Saving data modified inside Spark rdd.foreachPartition
