How to avoid the error message java.lang.ClassCastException in Spark SQL 2.3.1

Date: 2018-11-24 18:36:16

Tags: apache-spark apache-spark-sql apache-spark-ml

I have a few lines of code that preprocess a dataset:

val clean_data = resultDf.na.replace("verkehrsstatus",
  Map("aktuell nicht ermittelbar" -> "normales Verkehrsaufkommen"))

val datawithudf = clean_data.withColumn("state", udfState()($"verkehrsstatus"))
val finaldata = datawithudf
  .select($"auswertezeit", $"strecke_id", $"state", $"geschwindigkeit", $"coordinates")
  .withColumnRenamed("state", "verkehrsstatus")
finaldata.printSchema()
finaldata.take(2).foreach(println)
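
For completeness, udfState() is defined elsewhere in my code and not shown above. A minimal sketch, assuming it only maps the German traffic-status strings to a short label (the mapping below is hypothetical, not my actual code):

import org.apache.spark.sql.functions.udf

// Hypothetical sketch of udfState(): returns a UDF that converts the German
// "verkehrsstatus" strings into a compact state label. The real mapping in my
// project may differ; this is only to make the snippet self-contained.
def udfState() = udf { status: String =>
  status match {
    case "normales Verkehrsaufkommen" => "normal"
    case "stockender Verkehr"         => "congested"
    case _                            => "unknown"
  }
}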

When I try to display a few sample records from the final DataFrame, I get this error message:

WARN scheduler.TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, 192.168.56.102, executor 0): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2287)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1417)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2293)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
    at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:479)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1170)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2178)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

0 Answers:

No answers yet.