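I'm trying to extract some common code into an abstract class, but I'm running into a problem.
Let's say I'm reading a file whose lines are formatted as "id|name":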
import org.apache.spark.sql.Dataset

case class Person(id: Int, name: String) extends Serializable

object Persons {
  def apply(lines: Dataset[String]): Dataset[Person] = {
    import lines.sparkSession.implicits._
    lines.map(line => {
      val fields = line.split("\\|")
      Person(fields(0).toInt, fields(1))
    })
  }
}

Persons(spark.read.textFile("persons.txt")).show()
Great, this works fine. Now let's say I want to read a number of different files that all have a "name" field, so I extract the common logic:
trait Named extends Serializable { val name: String }

abstract class NamedDataset[T <: Named] {
  def createRecord(fields: Array[String]): T
  def apply(lines: Dataset[String]): Dataset[T] = {
    import lines.sparkSession.implicits._
    lines.map(line => createRecord(line.split("\\|")))
  }
}

case class Person(id: Int, name: String) extends Named

object Persons extends NamedDataset[Person] {
  override def createRecord(fields: Array[String]) =
    Person(fields(0).toInt, fields(1))
}
This fails to compile with two errors:
Error:
Unable to find encoder for type stored in a Dataset.
Primitive types (Int, String, etc) and Product types (case classes)
are supported by importing spark.implicits._ Support for serializing
other types will be added in future releases.
lines.map(line => createRecord(line.split("\\|")))
Error:
not enough arguments for method map:
(implicit evidence$7: org.apache.spark.sql.Encoder[T])org.apache.spark.sql.Dataset[T].
Unspecified value parameter evidence$7.
lines.map(line => createRecord(line.split("\\|")))
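I have a feeling this has something to do with implicits, TypeTags and/or ClassTags, but I'm just starting out with Scala and don't fully understand these concepts yet.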
Answer 0 (score: 7)
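You have to make two small changes:
1. Only primitive types and Product types are supported (as the error message states), so making the Named trait Serializable isn't enough. It should extend Product instead (which means case classes and tuples can extend it).
2. Add ClassTag and TypeTag context bounds on T to overcome type erasure and figure out the actual types.
So here is a working version: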
import org.apache.spark.sql.Dataset
import scala.reflect.ClassTag
import scala.reflect.runtime.universe.TypeTag

// Extending Product lets Spark's implicit product encoder apply to T.
trait Named extends Product { val name: String }

// The context bounds carry the concrete type information for T into the class.
abstract class NamedDataset[T <: Named : ClassTag : TypeTag] extends Serializable {
  def createRecord(fields: Array[String]): T
  def apply(lines: Dataset[String]): Dataset[T] = {
    import lines.sparkSession.implicits._
    lines.map(line => createRecord(line.split("\\|")))
  }
}

case class Person(id: Int, name: String) extends Named

object Persons extends NamedDataset[Person] {
  override def createRecord(fields: Array[String]) =
    Person(fields(0).toInt, fields(1))
}
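
Since the question mentions not being comfortable with implicits and tags yet, here is a minimal plain-Scala sketch (no Spark) of what a context bound does. The names ContextBoundSketch, describe, Describer and IntDescriber are made up for illustration. describe stands in for a method like Dataset.map that demands implicit evidence about T; only ClassTag is used because it ships with the standard library, on the assumption that the TypeTag bound in the answer is forwarded in the same way.

```scala
import scala.reflect.ClassTag

object ContextBoundSketch {
  // Hypothetical stand-in for a method that needs implicit evidence about T.
  def describe[T](value: T)(implicit tag: ClassTag[T]): String =
    s"${tag.runtimeClass.getSimpleName}: $value"

  // The ": ClassTag" bound adds an implicit constructor parameter, so the
  // evidence is available inside the class body. Without it, the call to
  // describe below would not compile, because T is abstract here.
  abstract class Describer[T: ClassTag] {
    def create(raw: String): T
    def apply(raw: String): String = describe(create(raw))
  }

  // T is concrete at this point, so the compiler can supply ClassTag[Int].
  object IntDescriber extends Describer[Int] {
    override def create(raw: String): Int = raw.trim.toInt
  }

  def main(args: Array[String]): Unit =
    println(IntDescriber(" 42 "))
}
```

The point of the design is that the evidence is resolved once, where the concrete type is known (at object Persons extends NamedDataset[Person] in the answer), and then travels with the instance into the generic code.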