I upgraded one of my projects from Spark 1.6 to Spark 2.0.1. The following code worked in Spark 1.6, but it does not work in 2.0.1:
def count(df: DataFrame): DataFrame = {
  val sqlContext = df.sqlContext
  import sqlContext.implicits._
  df.map { case Row(userId: String, itemId: String, count: Double) =>
    (userId, itemId, count)
  }.toDF("userId", "itemId", "count")
}
Here is the error message:
Error:(53, 12) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
df.map { case Row(userId: String, itemId: String, count: Double) =>
^
Error:(53, 12) not enough arguments for method map: (implicit evidence$7: org.apache.spark.sql.Encoder[(String, String, Double)])org.apache.spark.sql.Dataset[(String, String, Double)].
Unspecified value parameter evidence$7.
df.map { case Row(userId: String, itemId: String, count: Double) =>
^
I tried using df.rdd.map instead of df.map, and then got the following error:
Error:(55, 7) value toDF is not a member of org.apache.spark.rdd.RDD[(String, String, Double)]
possible cause: maybe a semicolon is missing before `value toDF'?
}.toDF("userId", "itemId", "count")
^
How do I convert an RDD of tuples to a DataFrame in Spark 2.0?
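In other words, I am after something like this (a minimal sketch, assuming a SparkSession named spark; the app name and sample data are mine, purely for illustration):

import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder()
  .appName("tuple-rdd-to-df")   // hypothetical app name
  .master("local[*]")           // assumption: local run for illustration
  .getOrCreate()
// The session's implicits provide both the tuple Encoder and the rdd.toDF extension.
import spark.implicits._

val rdd = spark.sparkContext.parallelize(
  Seq(("u1", "i1", 2.0), ("u2", "i2", 5.0)))
val df: DataFrame = rdd.toDF("userId", "itemId", "count")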
Answer 0 (score: 0):
Most likely there is a syntax error somewhere else in your code, because your map function appears to be written correctly given that you are seeing:

Error:(53, 12) not enough arguments for method map: (implicit evidence$7: org.apache.spark.sql.Encoder[(String, String, Double)])org.apache.spark.sql.Dataset[(String, String, Double)].
Unspecified value parameter evidence$7.
Your code works in my Spark shell; I have tested it.
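For reference, this is the shape that compiles for me on Spark 2.x. It is a sketch under one assumption of mine: it fetches the session via df.sparkSession rather than df.sqlContext (either import should bring the needed tuple Encoder into scope):

import org.apache.spark.sql.{DataFrame, Row}

def count(df: DataFrame): DataFrame = {
  // In Spark 2.x, df.map requires an Encoder[(String, String, Double)];
  // the SparkSession implicits supply it, plus .toDF on the resulting Dataset.
  val spark = df.sparkSession
  import spark.implicits._
  df.map { case Row(userId: String, itemId: String, count: Double) =>
    (userId, itemId, count)
  }.toDF("userId", "itemId", "count")
}

Note that the Row pattern match will throw a scala.MatchError at runtime if the DataFrame's columns are not exactly (String, String, Double).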