No Java class corresponding to Product with Serializable with Base found

Time: 2016-05-29 12:45:13

Tags: java scala apache-spark rdd apache-spark-dataset

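I wrote two case classes that extend a Base abstract class. I have two lists, one for each class (listA and listB). When I merge these two lists, I cannot convert the resulting list to an Apache Spark 1.6.1 Dataset.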

abstract class Base

case class A(name: String) extends Base
case class B(age: Int) extends Base

val listA: List[A] = A("foo")::A("bar")::Nil
val listB: List[B] = B(10)::B(20)::Nil
val list: List[Base with Product with Serializable] = listA ++ listB

val result: RDD[Base with Product with Serializable] = sc.parallelize(list).toDS()

Apache Spark throws this exception:

A needed class was not found. This could be due to an error in your runpath. Missing class: no Java class corresponding to Base with Product with Serializable found
java.lang.NoClassDefFoundError: no Java class corresponding to Base with Product with Serializable found
    at scala.reflect.runtime.JavaMirrors$JavaMirror.typeToJavaClass(JavaMirrors.scala:1299)
    at scala.reflect.runtime.JavaMirrors$JavaMirror.runtimeClass(JavaMirrors.scala:192)
    at scala.reflect.runtime.JavaMirrors$JavaMirror.runtimeClass(JavaMirrors.scala:54)
    at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:50)
    at org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:41)

When I create an RDD from list, Spark does not throw any exception, but when I convert that RDD to a Dataset with the toDS() method, the exception above is thrown.

1 answer:

Answer 0 (score: 3)

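First of all, you can get a saner type for list either by explicitly declaring it as List[Base], or by having Base extend Product with Serializable if the intention is for it to be extended only by case classes/objects. But this isn't enough, because: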

  

Spark 1.6 comes with support for automatically generating encoders for a wide variety of types, including primitive types (e.g. String, Integer, Long), Scala case classes, and Java Beans.

Note that abstract classes like Base are not supported, and custom encoders are not supported either. You could, however, try the kryo encoder (or javaSerialization as a last resort); see How to store custom objects in Dataset?

Here is a complete working example:

abstract class Base extends Serializable with Product

case class A(name: String) extends Base
case class B(age: Int) extends Base

object BaseEncoder {
  // Encoder and Encoders live in the org.apache.spark.sql package.
  implicit def baseEncoder: org.apache.spark.sql.Encoder[Base] = org.apache.spark.sql.Encoders.kryo[Base]
}

// Bring the implicit kryo encoder for Base into scope.
import BaseEncoder._

val listA: Seq[A] = Seq(A("a"), A("b"))
val listB: Seq[B] = Seq(B(1), B(2))
val list: Seq[Base] = listA ++ listB

val ds = sc.parallelize(list).toDS
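
The first point of the answer, giving list a saner static type, can be checked without Spark. The following is only a minimal sketch in plain Scala, reusing the class names from the question. It assumes Base is changed to mix in Product with Serializable itself, so that the merged list can be typed as List[Base] rather than List[Base with Product with Serializable]. It does not make toDS() work on its own; an encoder for Base is still needed.

```scala
// Plain Scala, no Spark required (e.g. paste into the Scala REPL).
// Assumption: Base is only ever extended by case classes/objects,
// so it can mix in Product with Serializable itself.
abstract class Base extends Product with Serializable

case class A(name: String) extends Base
case class B(age: Int) extends Base

val listA: List[A] = A("foo") :: A("bar") :: Nil
val listB: List[B] = B(10) :: B(20) :: Nil

// Option 1: let the compiler infer the element type; the common
// supertype of A and B is now expected to be plain Base.
val inferred = listA ++ listB

// Option 2: state the element type explicitly. This also works with
// the original `abstract class Base`, without the extra mix-ins.
val list: List[Base] = listA ++ listB
```

Either option only changes the static type of the list; the elements are still the same A and B instances.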