Spark 2.0: How to convert an RDD of tuples to a DataFrame

Date: 2017-06-01 03:12:17

Tags: scala apache-spark apache-spark-sql spark-dataframe rdd

I upgraded one of my projects from Spark 1.6 to Spark 2.0.1. The following code works in Spark 1.6, but it does not work in 2.0.1:

    def count(df: DataFrame): DataFrame = {
      val sqlContext = df.sqlContext
      import sqlContext.implicits._

      df.map { case Row(userId: String, itemId: String, count: Double) =>
        (userId, itemId, count)
      }.toDF("userId", "itemId", "count")
    }

Here is the error message:

Error:(53, 12) Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._  Support for serializing other types will be added in future releases.
    df.map { case Row(userId: String, itemId: String, count: Double) =>
           ^
Error:(53, 12) not enough arguments for method map: (implicit evidence$7: org.apache.spark.sql.Encoder[(String, String, Double)])org.apache.spark.sql.Dataset[(String, String, Double)].
Unspecified value parameter evidence$7.
    df.map { case Row(userId: String, itemId: String, count: Double) =>
       ^
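For context: in Spark 2.0, DataFrame is an alias for Dataset[Row], so df.map now returns a Dataset and requires an implicit Encoder for the result type. Tuple encoders come from the session implicits, so one common remedy is to import them from the DataFrame's own SparkSession rather than from the deprecated SQLContext. A minimal sketch of that variant (not from the original post; Dataset.sparkSession is available from Spark 2.0 onwards):

    import org.apache.spark.sql.{DataFrame, Row}

    def count(df: DataFrame): DataFrame = {
      // Import the implicits from the DataFrame's own session so that
      // Encoder[(String, String, Double)] is in implicit scope for map.
      val spark = df.sparkSession
      import spark.implicits._

      df.map { case Row(userId: String, itemId: String, count: Double) =>
        (userId, itemId, count)
      }.toDF("userId", "itemId", "count")
    }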

I tried using df.rdd.map instead of df.map, and then I got the following error:

Error:(55, 7) value toDF is not a member of org.apache.spark.rdd.RDD[(String, String, Double)]
possible cause: maybe a semicolon is missing before `value toDF'?
    }.toDF("userId", "itemId", "count")
      ^
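This second error appears because toDF on an RDD is itself supplied by the implicit conversions in spark.implicits._; without that import in scope, RDD[(String, String, Double)] has no such member. An alternative that avoids the implicits entirely is SparkSession.createDataFrame, which accepts an RDD of tuples and infers the schema by reflection. A sketch (the name countViaRdd is illustrative, not from the original code):

    import org.apache.spark.sql.{DataFrame, Row}

    def countViaRdd(df: DataFrame): DataFrame = {
      val tuples = df.rdd.map {
        case Row(userId: String, itemId: String, count: Double) =>
          (userId, itemId, count)
      }
      // createDataFrame infers the schema from the tuple type; toDF then
      // replaces the default column names _1, _2, _3.
      df.sparkSession.createDataFrame(tuples).toDF("userId", "itemId", "count")
    }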

How do I convert an RDD of tuples to a DataFrame in Spark 2.0?

1 Answer:

Answer 0 (score: 0)

Most likely there is a syntax error somewhere else in your code, since your map function appears to be written correctly, given that you are getting:

    Error:(53, 12) not enough arguments for method map: (implicit evidence$7: org.apache.spark.sql.Encoder[(String, String, Double)])org.apache.spark.sql.Dataset[(String, String, Double)].
    Unspecified value parameter evidence$7.

Your code works in my Spark shell; I have tested it.
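One plausible reason the code compiles in the shell is that spark-shell automatically runs import spark.implicits._, which puts the required tuple Encoder in scope even if the corresponding import is missing or shadowed in the project build. A quick shell check with made-up sample data (assuming the count function from the question has been pasted in first):

    // In spark-shell, `spark` and its implicits are already in scope,
    // so Seq(...).toDF works without any extra imports.
    val df = Seq(("u1", "i1", 2.0), ("u1", "i2", 1.0))
      .toDF("userId", "itemId", "count")

    count(df).show()  // should print the same three columns back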