Apache Spark error: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaRDD$$anonfun$filter$1.f$1

Date: 2018-05-11 01:00:50

Tags: java apache-spark

I'm new to Spark and I put together a very simple test class:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Connect to the standalone cluster running in the VM.
JavaSparkContext jsc = new JavaSparkContext(new SparkConf().setAppName("Spark").setMaster("spark://xxx.xxx.xx.xxx:7077"));
JavaRDD<String> inputRDD = jsc.textFile("test.txt");

// Keep only the lines that contain "1", then collect them back to the driver.
JavaRDD<String> ones = inputRDD.filter(s -> s.contains("1"));
for (String s : ones.collect()) {
  System.out.println(s);
}
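For comparison, the same pipeline run with a local master should not hit the error shown below: in local mode the tasks execute inside the driver JVM, so the lambda never has to be deserialized by a separate process. A minimal local-mode sketch (only the master string changes):

// Local-mode sanity check: "local[*]" runs tasks in the driver JVM,
// so no application classes need to be shipped to remote executors.
JavaSparkContext localJsc = new JavaSparkContext(new SparkConf().setAppName("SparkLocalTest").setMaster("local[*]"));
JavaRDD<String> localOnes = localJsc.textFile("test.txt").filter(s -> s.contains("1"));
for (String s : localOnes.collect()) {
  System.out.println(s);
}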

The Spark cluster is running in a VM with one active worker, and I run the code from my local machine. I found that if I run the for loop directly over inputRDD, it successfully prints out the contents of test.txt (each line contains a bunch of 1s and 2s). If I don't invoke the for loop at all, the code doesn't complain about inputRDD.filter(s -> s.contains("1")); either and exits fine. But as soon as I do anything with the filtered JavaRDD, it fails:

  


[2018-05-10 17:53:37,511] ERROR [org.apache.spark.scheduler.TaskSetManager] - Task 1 in stage 0.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, 192.168.22.240, executor 0): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaRDD$$anonfun$filter$1.f$1 of type org.apache.spark.api.java.function.Function in instance of org.apache.spark.api.java.JavaRDD$$anonfun$filter$1
        at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
        at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2283)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2201)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2059)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1566)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2277)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2201)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2059)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1566)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2277)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2201)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2059)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1566)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2277)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2201)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2059)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1566)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:426)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1599)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1587)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1586)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1586)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1820)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1769)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1758)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2027)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2048)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2067)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2092)
        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:939)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:938)
        at org.apache.spark.api.java.JavaRDDLike$class.collect(JavaRDDLike.scala:361)
        at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
        at ixl.sparkClusterTest.SparkTest.execute(SparkTest.java:44)
        at ixl.conf.commandLine.RunCommandLineProgram.main(RunCommandLineProgram.java:136)
Caused by: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaRDD$$anonfun$filter$1.f$1 of type org.apache.spark.api.java.function.Function in instance of org.apache.spark.api.java.JavaRDD$$anonfun$filter$1
        at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
        at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2283)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2201)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2059)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1566)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2277)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2201)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2059)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1566)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2277)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2201)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2059)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1566)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2277)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2201)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2059)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1566)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:426)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Any suggestions as to why this is happening would be helpful. Both the VM and my local machine are running Java 8.
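From what I understand, this kind of ClassCastException usually means that the executor deserializing the task cannot load the class that defines the lambda, i.e. the application JAR was never shipped to the cluster. A common remedy (not verified here) is to register the JAR on the SparkConf when running against a remote master; a minimal sketch, where "/path/to/app.jar" is a placeholder for the actual packaged application JAR:

// Ship the JAR containing this class (and therefore the lambda) to the executors.
// "/path/to/app.jar" is a hypothetical path; point it at the real packaged JAR.
SparkConf conf = new SparkConf()
    .setAppName("Spark")
    .setMaster("spark://xxx.xxx.xx.xxx:7077")
    .setJars(new String[] { "/path/to/app.jar" });
JavaSparkContext jsc = new JavaSparkContext(conf);

Equivalently, calling jsc.addJar("/path/to/app.jar") after constructing the context, or launching the program through spark-submit, should also distribute the JAR to the workers.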

0 Answers:
