Dataset.map() throws ClassCastException

Asked: 2019-06-18 14:00:32

Tags: java apache-spark-sql classcastexception

I am trying to use the map function to iterate over a Dataset, returning each element to a new variable without making any changes, and then calling collect. The job fails with a java.lang.ClassCastException. What am I missing?

import org.apache.spark.sql.{Dataset, Encoders}
// (toDF() also needs the implicits of a SparkSession in scope: import spark.implicits._)

def fun(): Unit = {
   val df = Seq(Person("Max", 33),
                Person("Adam", 32),
                Person("Muller", 62)).toDF()

   // Explicit encoder for the Person case class
   val encoderPerson = Encoders.product[Person]

   val personDS: Dataset[Person] = df.as[Person](encoderPerson)

   // Identity map: each element is returned unchanged
   val newPersonDS = personDS.map { iter2 => iter2 }

   newPersonDS.collect()
}


case class Person(name: String, age: Int)
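
For completeness, here is the same pipeline written as a self-contained program that can be run on its own. The SparkSession setup, the object name MapExample, and the local[*] master / app name are placeholders I added for this sketch; everything else matches the code above.

import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

// Person declared at the top level of the file, same fields as above
case class Person(name: String, age: Int)

object MapExample {  // object name is a placeholder
  def main(args: Array[String]): Unit = {
    // SparkSession setup assumed for a standalone run; appName and master are placeholders
    val spark = SparkSession.builder()
      .appName("dataset-map-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._  // required for toDF() and the implicit Person encoder

    val df = Seq(Person("Max", 33),
                 Person("Adam", 32),
                 Person("Muller", 62)).toDF()

    // Same explicit encoder as in the question
    val encoderPerson = Encoders.product[Person]
    val personDS: Dataset[Person] = df.as[Person](encoderPerson)

    // Identity map followed by collect()
    val newPersonDS = personDS.map { iter2 => iter2 }
    newPersonDS.collect().foreach(println)

    spark.stop()
  }
}

In my project the same logic sits inside com.query.MyClass (MyClass.scala:42 in the stack trace), and the collect() call fails with: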

java.lang.ClassCastException: com.query.Person cannot be cast to com.query.Person
    at com.query.MyClass$$anonfun$1.apply(MyClass.scala:42)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.mapelements_doConsume_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.deserializetoobject_doConsume_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.agg_doAggregateWithKeysOutput_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

0 Answers:

No answers yet.