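Does anyone have a working example of the mapPartitions function for DataFrames?
Please note: I am not looking for RDD examples.
Update:
The example posted by MasterBuilder is fine in theory, but in practice it has some problems. Try it on structured data such as JSON: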
    val df = spark.read.json("/user/cloudera/json")
    val newDF = df.mapPartitions(
      iterator => {
        val result = iterator.map(data => { /* do some work with data */ }).toList
        // return transformed data
        result.iterator
      }
    ).toDF() // now convert back to df
It fails with this error:
<console>:28: error: Unable to find encoder for type stored in a Dataset.
Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._
Support for serializing other types will be added in future releases.
Is there any way to make this work? What is wrong with the code above?
Answer 0 (score: -2)
    import sqlContext.implicits._
    val newDF = df.mapPartitions(
      iterator => {
        val result = iterator.map(data => { /* do some work with data */ }).toList
        // return transformed data
        result.iterator
      }
    ).toDF() // now convert back to df
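
For comparison, here is a minimal sketch of a variant that should compile. It rests on an assumption about the cause: the encoder error most likely appears because the placeholder lambda returns Unit, and no implicit Encoder exists for that type. Mapping to a generic Row would similarly need an explicitly supplied encoder. Returning a case class (or a tuple or primitive) lets the implicits import supply one. The Person class, the addOneYear helper, and the name/age columns are hypothetical and not part of the original post.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

// Hypothetical record type for the transformed rows (an assumption for this sketch).
// Declared at top level so Spark can derive an Encoder for it.
case class Person(name: String, age: Long)

object MapPartitionsExample {
  // Transforms every partition of `df` and returns a DataFrame again.
  // Assumes `df` has a string column "name" and a long column "age".
  def addOneYear(spark: SparkSession, df: DataFrame): DataFrame = {
    import spark.implicits._ // brings Encoder[Person] into scope

    df.mapPartitions { (rows: Iterator[Row]) =>
      // Map lazily over the iterator instead of materializing a List,
      // so a partition never has to fit in memory all at once.
      rows.map { row =>
        Person(row.getAs[String]("name"), row.getAs[Long]("age") + 1L)
      }
    }.toDF() // Dataset[Person] back to a DataFrame
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("mapPartitions-sketch")
      .getOrCreate()
    import spark.implicits._

    // Small in-memory stand-in for the JSON input.
    val df = Seq(("Ann", 30L), ("Bob", 41L)).toDF("name", "age")
    addOneYear(spark, df).show()

    spark.stop()
  }
}
```

The important point is that the function passed to mapPartitions returns an Iterator of a type Spark can encode. Dropping the `.toList` step is optional, but it avoids buffering a whole partition in memory.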