I have the following case classes and sealed trait:
package com.mypackage.spark.event
case class TypedEvent(id: String, timestamp: Long, `type`: String)
sealed trait Event {
  def id: String
  def timestamp: Long
}
case class CreationEvent(id: String, timestamp: Long) extends Event
case class DeleteEvent(id: String, timestamp: Long) extends Event
I need to transform a Dataset of type TypedEvent into another Dataset whose element type extends the Event trait, using the transform method on Dataset together with the pattern-matching mechanism shown below (I'm using Spark 2.3.1):
import spark.implicits._
val jsonDF = spark.read.json(pathToJsonFile)
val typedEventsDS = jsonDF.select("id", "timestamp", "type").as[TypedEvent]
val eventTypes = Array("CreateEvent", "DeleteEvent", ...)
eventTypes.foreach(eventType => {
  val result = typedEventsDS.filter($"type" <=> eventType)
    .transform(featurize(spark, eventType)) // line 61
  /**
   * ...
   */
})
def featurize(spark: SparkSession, eventType: String): Dataset[TypedEvent] => Dataset[_ <: Event] = dataset => {
  import spark.implicits._
  eventType match {
    case "CreateEvent" => dataset.as[CreationEvent]
    case "DeleteEvent" => dataset.as[DeleteEvent]
    ...
  }
}
The featurize method is supposed to return a Dataset of whatever type extends the Event trait.
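As the error message below shows, Dataset.transform in Spark 2.x is declared as `def transform[U](t: Dataset[T] => Dataset[U]): Dataset[U]`, so the compiler has to infer a single concrete `U`, and a wildcard `Dataset[_ <: Event]` gives it nothing to pin `U` to. The same inference failure can be reproduced without Spark at all; in this minimal sketch (my own stand-in names), `Box` plays the role of `Dataset` and `pipe` plays the role of `transform`:

```scala
// Spark-free reduction of the problem. Box[T] stands in for Dataset[T].
object TransformRepro {
  sealed trait Event { def id: String }
  case class CreationEvent(id: String) extends Event

  case class Box[T](value: T) {
    // Same shape as Dataset.transform: the compiler must pick ONE concrete U.
    def pipe[U](f: Box[T] => Box[U]): Box[U] = f(this)
  }

  // A function returning an existential, like Dataset[_ <: Event]:
  val wildcard: Box[String] => Box[_ <: Event] = b => Box(CreationEvent(b.value))
  // Box("a").pipe(wildcard)  // rejected: no U such that Box[U] = Box[_ <: Event]

  // Widening the return type to the supertype gives a concrete U = Event:
  val widened: Box[String] => Box[Event] = b => Box(CreationEvent(b.value))

  def run(): String = Box("a").pipe(widened).value.id
}
```

The `widened` version compiles because `U` is the ordinary type `Event`, not a wildcard.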
But this fails to compile with the following error:
Error:(61, 12) no type parameters for method transform: (t:
org.apache.spark.sql.Dataset[com.mypackage.spark.event.TypedEvent] =>
org.apache.spark.sql.Dataset[U])org.apache.spark.sql.Dataset[U] exist so
that it can be applied to arguments
org.apache.spark.sql.Dataset[com.mypackage.spark.event.TypedEvent] =>
org.apache.spark.sql.Dataset[_ <: com.mypackage.spark.event.Event])
--- because ---
argument expression's type is not compatible with formal parameter type;
found:
org.apache.spark.sql.Dataset[com.mypackage.spark.event.TypedEvent] =>
org.apache.spark.sql.Dataset[_ <: com.mypackage.spark.event.Event]
required:
org.apache.spark.sql.Dataset[com.mypackage.spark.event.TypedEvent] =>
org.apache.spark.sql.Dataset[?U]
.transform(featurize(spark, eventType))
So I tried adding a type parameter to the transform call itself, like this:
.transform[_ <: Event](featurize(eventType))
But that resulted in another compilation error:
Error:(61, 22) unbound wildcard type .transform[_ <: Event](featurize(spark, eventType))
I also tried making the featurize method generic:
def featurize[T <: Event](spark: SparkSession, eventType: String): Dataset[TypedEvent] => Dataset[T] =
dataset => { /* ... same ... */}
but that gave a type mismatch in the first case clause. Nothing works for me except removing the `_ <: Event` return type from featurize and returning one exact type (i.e. CreationEvent). I really don't understand what's going on with Scala generics here. Any ideas?
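To see why the generic attempt fails in isolation, here is a Spark-free sketch (my own names): inside the match, each branch has a fixed subtype, which can never conform to an arbitrary, caller-chosen `T`; returning the sealed supertype does typecheck. In the Spark version that would correspond to returning `Dataset[Event]`, which would also need an `Encoder[Event]` to be in scope (e.g. `Encoders.kryo[Event]`); I have not verified that part on 2.3.1:

```scala
object GenericRepro {
  sealed trait Event { def id: String }
  case class CreationEvent(id: String) extends Event
  case class DeleteEvent(id: String) extends Event

  // def pick[T <: Event](kind: String): T = kind match {
  //   case "CreateEvent" => CreationEvent("1") // error: found CreationEvent, required T
  //   ...
  // }

  // Returning the common supertype typechecks, because every branch
  // conforms to the one fixed result type Event:
  def pick(kind: String): Event = kind match {
    case "CreateEvent" => CreationEvent("1")
    case "DeleteEvent" => DeleteEvent("1")
  }
}
```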