"Unable to find encoder for type stored in a Dataset" and "not enough arguments for method map"?

Asked: 2017-07-17 16:46:52

Tags: scala apache-spark

The following code gets two errors on the final map(...). What argument is the map() function missing, and how do I fix the "encoder" error?

Errors:

Error:(60, 11) Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._  Support for serializing other types will be added in future releases.
      .map(r => Cols(r.getInt(0), r.getString(1), r.getString(2), r.getString(3), r.getDouble(4), r.getDate(5), r.getString(6), r.getString(7), r.getDouble(8), r.getString(9)))

Error:(60, 11) not enough arguments for method map: (implicit evidence$6: org.apache.spark.sql.Encoder[Cols])org.apache.spark.sql.Dataset[Cols].
Unspecified value parameter evidence$6.
      .map(r => Cols(r.getInt(0), r.getString(1), r.getString(2), r.getString(3), r.getDouble(4), r.getDate(5), r.getString(6), r.getString(7), r.getDouble(8), r.getString(9)))

Code:
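import java.sql.Date // needed for the F: Date field; Row.getDate returns a java.sql.Date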

  case class Cols (A: Int,
                   B: String,
                   C: String,
                   D: String,
                   E: Double,
                   F: Date,
                   G: String,
                   H: String,
                   I: Double,
                   J: String
                  )

class SqlData(sqlContext: org.apache.spark.sql.SQLContext, jdbcSqlConn: String) {
  def getAll(source: String) = {
    sqlContext.read.format("jdbc").options(Map(
      "driver" -> "com.microsoft.sqlserver.jdbc.SQLServerDriver",
      "url" -> jdbcSqlConn,
      "dbtable" -> s"MyFunction('$source')"
    )).load()
      .select("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
      // The following line (line 60) produces the errors above.
      .map((r) => Cols(r.getInt(0), r.getString(1), r.getString(2), r.getString(3), r.getDouble(4), r.getDate(5), r.getString(6), r.getString(7), r.getDouble(8), r.getString(9)))
  }
}

Update

I have the following function:

def compare(sqlContext: org.apache.spark.sql.SQLContext, dbo: Dataset[Cols], ods: Dataset[Cols]) = {
    import sqlContext.implicits._
    dbo.map((r) => ods.map((s) => { // Errors occur here
      0
    }))
}

It gets the same errors.

  1. Why are there still errors after importing sqlContext.implicits._?
  2. I created a new sqlContext parameter just for the import. Is there a better way? (See the sketch after this list.)
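On the second question, a hedged sketch (assuming Spark 2.x): a Dataset already carries a reference to its own SparkSession, so the implicits can be imported from the dataset itself, with no extra sqlContext parameter. Note also that the nested map in compare would fail even with the encoders in scope, because a Dataset cannot be used inside another Dataset's transformation closure; a joinWith on some key column (here assumed to be A) is the usual shape for comparing two datasets:

import org.apache.spark.sql.Dataset

def compare(dbo: Dataset[Cols], ods: Dataset[Cols]) = {
  // Import the encoders from the Dataset's own SparkSession;
  // no separate sqlContext parameter is needed.
  import dbo.sparkSession.implicits._

  // The nested dbo.map(r => ods.map(...)) cannot work regardless of
  // encoders: the inner closure runs on executors, where ods is not
  // usable. Comparing two Datasets is usually expressed as a join.
  dbo.joinWith(ods, dbo("A") === ods("A")) // Dataset[(Cols, Cols)]
}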

1 Answer:

Answer 0 (score: 3):

Combining all the comments into one answer:

def getAll(source: String): Dataset[Cols] = {
  import sqlContext.implicits._ // this imports the necessary implicit Encoders

  sqlContext.read.format("jdbc").options(Map(
    "driver" -> "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "url" -> jdbcSqlConn,
    "dbtable" -> s"MyFunction('$source')"
  )).load().as[Cols] // shorter way to convert into Cols, thanks @T.Gaweda
}