"Unable to find encoder for type stored in a Dataset" even though spark.implicits._ is imported?

Time: 2017-07-17 17:57:33

Tags: scala apache-spark

Error:(39, 12) Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._  Support for serializing other types will be added in future releases.
    dbo.map((r) => ods.map((s) => {

Error:(39, 12) not enough arguments for method map: (implicit evidence$6: org.apache.spark.sql.Encoder[org.apache.spark.sql.Dataset[Int]])org.apache.spark.sql.Dataset[org.apache.spark.sql.Dataset[Int]].
Unspecified value parameter evidence$6.
    dbo.map((r) => ods.map((s) => {

object Main extends App {
  ....

  def compare(sqlContext: org.apache.spark.sql.SQLContext,
              dbo: Dataset[Cols], ods: Dataset[Cols]) = {
    import sqlContext.implicits._ // Tried import dbo.sparkSession.implicits._ too
    dbo.map((r) => ods.map((s) => { // Errors occur here
      0
    }))
  }
}

case class Cols(A: Int,
                B: String,
                C: String,
                D: String,
                E: Double,
                F: Date,
                G: String,
                H: String,
                I: Double,
                J: String)
  1. After importing sqlContext.implicits._, why are there still errors?
  2. I created the sqlContext parameter solely for the import. Is there a better way that avoids passing it as an argument?
      

    This should be solved by import dbo.sparkSession.implicits._.
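
A minimal sketch of that approach, assuming the method only needs the implicits for encoders (Cols and the method name come from the question; the r.A projection is just an illustrative transformation):

import org.apache.spark.sql.Dataset

def compare(dbo: Dataset[Cols], ods: Dataset[Cols]) = {
  import dbo.sparkSession.implicits._ // encoders come from the Dataset's own session
  dbo.map(r => r.A)                   // any transformation needing an Encoder now compiles
}

Because every Dataset carries its sparkSession, no separate SQLContext parameter is required.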

1 Answer:

Answer 0 (score: 5):

Your code is trying to create a Dataset[Dataset[Int]], which is wrong for several reasons:

If you want to cross the data of two Datasets, you cannot nest one Dataset inside another.

An Encoder[Dataset[Int]] cannot be created. An Encoder[Int] exists, but encoding a nested Dataset simply makes no sense.
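
To illustrate, a quick check in shell style (the local SparkSession setup is an assumption for this sketch):

import org.apache.spark.sql.{Dataset, Encoder, SparkSession}

val spark = SparkSession.builder().master("local[*]").appName("encoders").getOrCreate()
import spark.implicits._

implicitly[Encoder[Int]]             // compiles: primitive encoders are provided
// implicitly[Encoder[Dataset[Int]]] // does not compile: no such encoder exists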

Something like this makes more sense:

import org.apache.spark.sql.{functions => func}

dbo.joinWith(ods, func.expr("true")).map { // always-true condition: a cross join pairing every row of dbo with every row of ods
  case (r, s) =>
    0 // compare r and s here and return the result
}
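
Putting it all together, a self-contained sketch of compare built on joinWith. The SparkSession setup, the crossJoin.enabled setting, and the equality check on A are assumptions for illustration; substitute your real comparison logic:

import java.sql.Date
import org.apache.spark.sql.{Dataset, SparkSession, functions => func}

// Cols repeated from the question so the sketch is self-contained.
case class Cols(A: Int, B: String, C: String, D: String, E: Double,
                F: Date, G: String, H: String, I: Double, J: String)

object Main extends App {
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("compare")
    // Spark 2.x may reject the always-true join condition as an implicit
    // cartesian product unless cross joins are enabled explicitly.
    .config("spark.sql.crossJoin.enabled", "true")
    .getOrCreate()

  // Cross-join the two Datasets and score each pair of rows.
  def compare(dbo: Dataset[Cols], ods: Dataset[Cols]): Dataset[Int] = {
    import dbo.sparkSession.implicits._ // no SQLContext parameter needed
    dbo.joinWith(ods, func.expr("true")).map {
      case (r, s) => if (r.A == s.A) 1 else 0 // placeholder comparison
    }
  }
}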