Question

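This may be a silly question, but I have been struggling with it for a long time. It is similar to this question, but I could not apply that answer to my code (because of either the pattern or the functions involved).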

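I want to pass a flatMap (or map) transformation function as a function argument, and then proxy it to a strategy function that actually calls the df.rdd.flatMap method. I will try to explain.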

case class Order(id: String, totalValue: Double, freight: Double) 
case class Product(id: String, price: Double) 

... or any other case class, whatever one needs to transform a row into ...

The Entity class:

class Entity(path: String) {
  ...
  def flatMap[T](mapFunction: (Row) => ArrayBuffer[T]): Entity = {
      this.getStrategy.flatMap[T](mapFunction)
      this
  }
  def save(path: String): Unit = {
      ... write logic ...
  }
}

The entity's methods may be backed by different strategies. EntityStrategy looks like this:

abstract class EntityStrategy(private val entity: Entity,
                              private val spark: SparkSession) {
  ...
  def flatMap[T](mapFunction: (Row) => ArrayBuffer[T]): Unit
  def map[T](mapFunction: (Row) => T): Unit
}

And an example EntityStrategy implementation:

class SparkEntityStrategy(private val entity: Entity, private val spark: SparkSession)
  extends EntityStrategy(entity, spark) {
  ...
  override def map[T](mapFunction: Row => T): Unit = {
    val rdd = this.getData.rdd.map(f = mapFunction)
    this.dataFrame = this.spark.createDataFrame(rdd)
  }

  override def flatMap[T](mapFunction: (Row) => ArrayBuffer[T]): Unit = {
    val rdd = this.getData.rdd.flatMap(f = mapFunction)
    this.dataFrame = this.spark.createDataFrame(rdd)
  }
}

Finally, I would like to create a flatMap / map function and call it like this:

def transformFlatMap(row: Row): ArrayBuffer[Order] = {
    val orders = new ArrayBuffer[Order]
    val _deliveries = row.getAs[Seq[Row]]("deliveries")
    _deliveries.foreach(_delivery => {
       val order = Order(
           id = row.getAs[String]("id"),
           totalValue = _delivery.getAs[Double]("totalAmount"))
       orders += order
    })
    orders
}

val entity = new Entity("path")
entity.flatMap[Order](transformFlatMap).save("path")

Of course, this does not work. I get the following error in SparkEntityStrategy:

Error:(95, 35) No ClassTag available for T
    val rdd = this.getData.rdd.map(f = mapFunction)

I tried adding (implicit encoder: Encoder: T) to both the entity and the strategy methods, but that did not work. I am new to Scala, so I am probably doing something wrong.

If I remove the "T" and pass the actual case class, everything works fine.

Answer 1

To satisfy both the compiler and the Spark methods, I needed to add the following bounds to the type parameter:

[T <: scala.Product : ClassTag : TypeTag]

So both methods became:

def map[T <: scala.Product : ClassTag : TypeTag](mapFunction: (Row) => T): Entity
def flatMap[T <: scala.Product : ClassTag : TypeTag](mapFunction: (Row) => TraversableOnce[T]): Entity
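
As a minimal sketch of how these bounds fit into the strategy, assuming Scala 2.12 and a Spark 2.x or 3.x dependency: the class name SparkEntityStrategySketch and its constructor are hypothetical simplifications (it holds the DataFrame directly rather than going through Entity), and only the bounds on map and flatMap come from the signatures above. Writing scala.Product in full also avoids a clash with the question's own Product case class.

```scala
import scala.reflect.ClassTag
import scala.reflect.runtime.universe.TypeTag
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

// Case class from the question, repeated so the sketch is self-contained.
case class Order(id: String, totalValue: Double, freight: Double)

// Hypothetical, simplified strategy: it keeps the current DataFrame itself
// instead of reading it from the Entity used in the question.
class SparkEntityStrategySketch(spark: SparkSession, private var dataFrame: DataFrame) {

  def getData: DataFrame = dataFrame

  // RDD.map needs an implicit ClassTag[T]; createDataFrame(rdd) needs
  // T <: scala.Product and an implicit TypeTag[T].
  def map[T <: scala.Product : ClassTag : TypeTag](mapFunction: Row => T): Unit = {
    val rdd = getData.rdd.map(mapFunction)
    dataFrame = spark.createDataFrame(rdd)
  }

  // Same bounds for flatMap; the function may return any TraversableOnce[T],
  // which includes ArrayBuffer[T].
  def flatMap[T <: scala.Product : ClassTag : TypeTag](
      mapFunction: Row => TraversableOnce[T]): Unit = {
    val rdd = getData.rdd.flatMap(mapFunction)
    dataFrame = spark.createDataFrame(rdd)
  }
}
```

Any method that forwards to these (such as Entity.flatMap) has to declare the same bounds, because the implicit tags are resolved at the call site where T is a concrete case class and then passed down.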

About scala.Product:

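Base trait for all products, which in the standard library include at least scala.Product1 through scala.Product22 and therefore also their subclasses scala.Tuple1 through scala.Tuple22. In addition, all case classes implement Product with synthetically generated methods.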

Since I am using case class objects as the return type of my functions, I needed the scala.Product bound so that Spark's createDataFrame resolves to the correct overload.
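
To illustrate why the scala.Product bound accepts any case class, here is a small sketch in plain Scala with no Spark involved. The object ProductDemo and its describe method are hypothetical names chosen for this example; Order is the case class from the question.

```scala
case class Order(id: String, totalValue: Double, freight: Double)

object ProductDemo {
  // Accepts any case class or tuple, because both extend scala.Product.
  // productPrefix, productIterator and productArity are members of Product.
  def describe[T <: scala.Product](value: T): String =
    s"${value.productPrefix}(${value.productIterator.mkString(", ")}) has ${value.productArity} fields"

  def main(args: Array[String]): Unit = {
    println(describe(Order("o-1", 10.5, 2.0)))
    println(describe(("a", 1)))
  }
}
```

A plain class that is not a case class would be rejected by the bound at compile time, which is the same check createDataFrame relies on.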

Why use both ClassTag and TypeTag?

Removing TypeTag makes the compiler raise the following error:

Error:(96, 48) No TypeTag available for T
    this.dataFrame = this.spark.createDataFrame(rdd)

And removing ClassTag:

Error:(95, 35) No ClassTag available for T
    val rdd = this.getData.rdd.map(f = mapFunction)

Adding both satisfies both methods, and everything works as expected.
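
The difference between the two tags can be sketched in plain Scala, assuming Scala 2 with scala-reflect on the classpath. The object TagDemo and its method names are hypothetical and exist only for this illustration.

```scala
import scala.reflect.ClassTag
import scala.reflect.runtime.universe.{TypeTag, typeOf}

object TagDemo {
  // ClassTag carries only the erased runtime class, so List[Int] and
  // List[String] look identical through it.
  def sameErasedClass[A: ClassTag, B: ClassTag]: Boolean =
    implicitly[ClassTag[A]].runtimeClass == implicitly[ClassTag[B]].runtimeClass

  // TypeTag carries the full compile-time type, including type arguments,
  // so the two list types can be told apart.
  def sameFullType[A: TypeTag, B: TypeTag]: Boolean =
    typeOf[A] =:= typeOf[B]

  // Creating an Array[T] for a generic T is a typical ClassTag use.
  def newArray[T: ClassTag](size: Int): Array[T] = new Array[T](size)

  def main(args: Array[String]): Unit = {
    println(sameErasedClass[List[Int], List[String]])
    println(sameFullType[List[Int], List[String]])
    println(newArray[String](3).length)
  }
}
```

RDD.map only needs the runtime class, hence ClassTag, while createDataFrame has to derive a schema from the fields of T, which requires the richer TypeTag.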

I also found a good article that explains type erasure in Scala.

Passing a function with any case class return type as a parameter
