I have two datasets:
implicit val spark: SparkSession = SparkSession
  .builder()
  .appName("app").master("local[1]")
  .config("spark.executor.memory", "1g")
  .getOrCreate()
import spark.implicits._

val ds1 = /*read csv file*/.as[caseClass1]
val ds2 = /*read csv file*/.as[caseClass2]
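For example, the elided reads could look roughly like this (the path and options here are placeholders, not the real ones):

val ds1 = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/path/to/file1.csv")
  .as[caseClass1]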
Then I join and map like this:
val ds3 = ds1
  .joinWith(ds2, ds1("id") === ds2("id"))
  .map { case (left, right) => (left, Option(right)) }
and I get the expected result.
The problem is that I am trying to implement a RichDataset with this (and some other functionality):
object Extentions {
  implicit class RichDataset[T <: Product](leftDs: Dataset[T]) {
    def leftJoinWith[V <: Product](rightDs: Dataset[V], condition: Column)
                                  (implicit spark: SparkSession): Dataset[(T, Option[V])] = {
      import spark.implicits._
      leftDs.joinWith(rightDs, condition, "left")
        .map { case (left, right) => (left, Option(right)) }
    }
  }
}
In main, with import Extentions._ in scope, the call to leftJoinWith fails to compile:
Error:(15, 13) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
.map{case(left, right) => (left, Option(right))}
Error:(15, 13) not enough arguments for method map: (implicit evidence$6: org.apache.spark.sql.Encoder[(T, Option[V])])org.apache.spark.sql.Dataset[(T, Option[V])].
Unspecified value parameter evidence$6.
.map{case(left, right) => (left, Option(right))}
...but spark.implicits._ is imported inside the function!
If I return just the join, without the map, it works both in main and inside the function.
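For example, a join-only variant like this sketch compiles, because joinWith builds the tuple encoder from the encoders the two input Datasets already carry, so no implicit Encoder[(T, V)] has to be summoned at the call site:

def leftJoin[V <: Product](rightDs: Dataset[V], condition: Column): Dataset[(T, V)] =
  leftDs.joinWith(rightDs, condition, "left") // no implicit Encoder needed here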
scalaVersion := "2.11.8", sparkVersion := "2.2.0"
Thanks in advance!
Answer 0 (score: 2)
It works if you add a TypeTag context bound to the generic type parameters (I spotted this in Spark's own source code). The Product-type encoders in spark.implicits._ are derived by an implicit that itself requires a TypeTag, so inside a generic method the compiler can only build Encoder[(T, Option[V])] if TypeTag evidence for T and V is passed in from the call site:
import scala.reflect.runtime.universe.TypeTag

import org.apache.spark.sql.{Column, Dataset, SparkSession}

object Extentions {
  implicit class RichDataset[T <: Product : TypeTag](leftDs: Dataset[T]) {
    def leftJoinWith[V <: Product : TypeTag](rightDs: Dataset[V], condition: Column)
                                            (implicit spark: SparkSession): Dataset[(T, Option[V])] = {
      import spark.implicits._
      // With the TypeTags in scope, spark.implicits._ can now derive
      // the Encoder[(T, Option[V])] that map requires.
      leftDs.joinWith(rightDs, condition, "left")
        .map { case (left, right) => (left, Option(right)) }
    }
  }
}
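A minimal sketch of calling the fixed extension from main; the case classes and data here are hypothetical, and they must be defined at top level (not inside a method) so their TypeTags can be resolved:

import org.apache.spark.sql.{Dataset, SparkSession}
import Extentions._

case class Person(id: Long, name: String)        // hypothetical
case class Order(personId: Long, amount: Double) // hypothetical

implicit val spark: SparkSession = SparkSession.builder()
  .appName("app").master("local[1]").getOrCreate()
import spark.implicits._

val people = Seq(Person(1, "ann"), Person(2, "bob")).toDS()
val orders = Seq(Order(1, 9.99)).toDS()

// bob has no matching order, so the right side comes back as None
val joined: Dataset[(Person, Option[Order])] =
  people.leftJoinWith(orders, people("id") === orders("personId"))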