Question

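Does anyone have a working example of the mapPartitions function for DataFrames?

Please note: I am not looking for RDD examples.

Update:

The example posted by MasterBuilder is fine in theory, but in practice it has some problems. Try it on structured data such as JSON: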

val df = spark.read.json("/user/cloudera/json")
val newDF = df.mapPartitions(
  iterator => {
    val result = iterator.map(data => {/* do some work with data */}).toList
    // return transformed data
    result.iterator
  }
  // now convert back to df
).toDF()

It fails with this error:

<console>:28: error: Unable to find encoder for type stored in a Dataset.  
Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._  
Support for serializing other types will be added in future releases.

Is there a way to make this work? What is wrong with the code above?

Answer 1

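    // Brings the implicit encoders for primitive and Product types into scope
    // (with a Spark 2.x SparkSession named spark, use: import spark.implicits._)
    import sqlContext.implicits._

    val newDF = df.mapPartitions(
      iterator => {
        // the function passed to map must return a type that has an encoder,
        // e.g. a primitive, a tuple, or a case class
        val result = iterator.map(data => {/* do some work with data */}).toList
        // return transformed data
        result.iterator
      }
      // now convert back to df
    ).toDF()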
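
To make the snippet above concrete, here is a minimal sketch of one way to satisfy the encoder requirement: have each row map to a case class, which is a Product type that spark.implicits._ can encode. It assumes Spark 2.x or later with a SparkSession; the NameWithLength case class, the withNameLength helper, and the "name" column are hypothetical and only there for illustration, and the small in-memory DataFrame stands in for the JSON input from the question.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

// Hypothetical output type. It is defined at the top level (not inside a
// method) so that Spark can derive an Encoder for it.
case class NameWithLength(name: String, nameLength: Int)

object MapPartitionsSketch {

  // Hypothetical helper: maps every Row of each partition to a case class
  // instance and converts the resulting Dataset back to a DataFrame.
  def withNameLength(spark: SparkSession, df: DataFrame): DataFrame = {
    import spark.implicits._ // supplies the Encoder[NameWithLength]

    df.mapPartitions { rows: Iterator[Row] =>
      // Per-partition setup (e.g. opening a connection) would go here.
      rows.map { row =>
        val name = row.getAs[String]("name") // assumed column name
        NameWithLength(name, name.length)
      }
    }.toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mapPartitions-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for spark.read.json(...); only a "name" column is assumed.
    val df = Seq("alice", "bob").toDF("name")

    withNameLength(spark, df).show()
    spark.stop()
  }
}
```

Returning the mapped iterator directly keeps the partition lazy; the .toList in the original is only needed if the whole partition has to be materialized before anything is emitted.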
