java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaPairRDD

Date: 2018-10-31 04:25:42

Tags: apache-spark

Below is my simple code. It runs perfectly in Spark local mode. However, when I try to run it in cluster mode with 1 driver and 1 worker, I get the exception below.

I tried the setJars approach mentioned in some answers, but it did not help.

package example;

import java.io.IOException;
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ClusterPractice {

    public static void main(String[] args) throws IOException {

        // Point the app at the standalone master and ship the application jar
        // so the executors can deserialize the lambda used in reduce() below.
        SparkConf conf = new SparkConf().setAppName("example.ClusterPractice").setMaster("spark://192.168.42.18:7077");
        conf.setJars(new String[]{"E:\\Eclipses\\neon new projects\\eclipse\\neon new projects\\spark-practice\\out\\artifacts\\spark_practice_jar\\spark-practice.jar"});

        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3));

        System.out.println("Reduce");
        long total = numbers.reduce((n1, n2) -> n1 + n2);
        System.out.println(total);
    }
}

The exception I get is as follows:


Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1602)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1590)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1589)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1589)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1823)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1772)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1761)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2131)
    at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1029)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
    at org.apache.spark.rdd.RDD.reduce(RDD.scala:1011)
    at org.apache.spark.api.java.JavaRDDLike$class.reduce(JavaRDDLike.scala:385)
    at org.apache.spark.api.java.AbstractJavaRDDLike.reduce(JavaRDDLike.scala:45)
    at example.ClusterPractice.main(ClusterPractice.java:22)
Caused by: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction2$1.fun$2 of type org.apache.spark.api.java.function.Function2 in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction2$1
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2251)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

2 answers:

Answer 0 (score: 2)

You can find a detailed answer to this problem here.

It seems you are un-setting the jars that you configured with

conf.setJars(new String[]{"E:\\Eclipses\\neon new projects\\eclipse\\neon new projects\\spark-practice\\out\\artifacts\\spark_practice_jar\\spark-practice.jar"});

by overwriting that configuration with this later line:

conf.setJars(new String[]{""});

Remove this line and it will work.
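
For illustration, here is a minimal sketch of the corrected setup, keeping only the setJars call that points at the real artifact (the path is the one from the question):

    SparkConf conf = new SparkConf()
            .setAppName("example.ClusterPractice")
            .setMaster("spark://192.168.42.18:7077");
    // Keep the single setJars call that points at the built jar so executors
    // can load the classes behind the lambda; calling setJars again with an
    // empty string would replace this list and break deserialization.
    conf.setJars(new String[]{"E:\\Eclipses\\neon new projects\\eclipse\\neon new projects\\spark-practice\\out\\artifacts\\spark_practice_jar\\spark-practice.jar"});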

Answer 1 (score: 0)

The program above is correct as written.

The problem was with how the jar was built. So don't second-guess the program; focus only on whether the jar is built correctly.

In my case, I was using IntelliJ. I was building the artifact through the Build option, and I think the jar was not built correctly that way because the project is a Maven project.

So when I built the jar with Maven instead, it was built correctly and the program ran smoothly.
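
As a hedged sketch, the setJars call would then point at the Maven-built artifact; the target/ path and jar name below follow the default Maven layout and naming convention and are assumptions, not taken from the question, so adjust them to your actual build output:

    // Assumes the jar was produced by `mvn clean package` and uses the default
    // Maven naming (artifactId-version.jar); the exact name here is hypothetical.
    conf.setJars(new String[]{"target/spark-practice-1.0-SNAPSHOT.jar"});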