Spark / Scala: create an empty Dataset using generics in a trait

Date: 2017-12-05 00:10:26

Tags: scala apache-spark scala-reflect

I have a trait with a type parameter, and one of its methods needs to be able to create an empty typed Dataset.


So far I haven't been able to get this to work; it complains "no ClassTag for T":

trait MyTrait[T] {
    val sparkSession: SparkSession
    val spark = sparkSession.session
    val sparkContext = spark.sparkContext

    def createEmptyDataset(): Dataset[T] = {
        import spark.implicits._ // to access .toDS() function
        // DOESN'T WORK.
        val emptyRDD = sparkContext.parallelize(Seq[T]())
        val accumulator = emptyRDD.toDS()
        ...
    }
}

Any help would be appreciated. Thanks!

1 Answer:

Answer 0 (score: 6):

You have to provide both a ClassTag[T] and an Encoder[T] in the same scope: building the RDD requires the former, and converting it to a Dataset requires the latter. For example:

import org.apache.spark.sql.{SparkSession, Dataset, Encoder}
import scala.reflect.ClassTag


trait MyTrait[T] {
    val ct: ClassTag[T]
    val enc: Encoder[T]

    val sparkSession: SparkSession
    lazy val sparkContext = sparkSession.sparkContext // lazy: evaluated only after the implementing class initialises sparkSession

    def createEmptyDataset(): Dataset[T] = {
        val emptyRDD = sparkContext.emptyRDD[T](ct)
        sparkSession.createDataset(emptyRDD)(enc)
    }
}

A concrete implementation:

class Foo extends MyTrait[Int] {
    val sparkSession = SparkSession.builder.getOrCreate()
    import sparkSession.implicits._

    val ct = implicitly[ClassTag[Int]]
    val enc = implicitly[Encoder[Int]]
}
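
For illustration, a quick usage sketch (my own example; it assumes a Spark master is already configured, e.g. spark.master=local[*], so getOrCreate() succeeds):

val foo = new Foo
val empty = foo.createEmptyDataset()
println(empty.count()) // prints 0 for the empty Dataset[Int]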

You can skip the RDD entirely:

import org.apache.spark.sql.{SparkSession, Dataset, Encoder}

trait MyTrait[T] {
    val enc: Encoder[T]

    val sparkSession: SparkSession

    def createEmptyDataset(): Dataset[T] = {
        sparkSession.emptyDataset[T](enc)
    }
}

For traits taking implicit "constructor parameters", check How to declare traits as taking implicit "constructor parameters"?, in particular the answer by Blaisorblade and another one by Alexey Romanov.
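
As a rough sketch of that idea (my own illustration, with a hypothetical trait name, not code from the linked answers): instead of abstract vals, the Encoder can be requested as an implicit parameter of the method itself, so only call sites have to supply it:

import org.apache.spark.sql.{SparkSession, Dataset, Encoder}

trait MyTrait2[T] {
    val sparkSession: SparkSession

    // the Encoder[T] is resolved implicitly at each call site
    def createEmptyDataset()(implicit enc: Encoder[T]): Dataset[T] =
        sparkSession.emptyDataset[T]
}

A caller then only needs import sparkSession.implicits._ (or an explicit Encoder) in scope where createEmptyDataset() is invoked.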