Spark Streaming - java.lang.ClassNotFoundException for a Scala anonymous function

Asked: 2017-03-01 12:30:40

Tags: scala apache-spark spark-streaming

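I have been trying to use spark-submit to submit a Spark Streaming application to my cluster, which consists of one master node and two worker nodes. The application is written in Scala and built with Maven. Importantly, the Maven build is configured to produce a fat JAR containing all dependencies. In addition, the JAR has been distributed to all nodes. The streaming job was submitted with the following command: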

bin/spark-submit --class topology.SimpleProcessingTopology --jars /tmp/spark_streaming-1.0-SNAPSHOT.jar --master spark://10.0.0.8:7077 --verbose /tmp/spark_streaming-1.0-SNAPSHOT.jar /tmp/streaming-benchmark.properties 

where 10.0.0.8 is the IP address of the master node within the VNET.

However, when starting the streaming application, I keep getting the following exception:

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)

Caused by: java.lang.ClassNotFoundException: topology.SimpleProcessingTopology$$anonfun$main$1$$anonfun$apply$1
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)

I have checked the contents of the JAR with jar tvf and, as the output below shows, it does contain the class in question.

  1735 Wed Mar 01 12:29:20 UTC 2017 topology/SimpleProcessingTopology$$anonfun$main$1.class
   702 Wed Mar 01 12:29:20 UTC 2017 topology/SimpleProcessingTopology.class
  2415 Wed Mar 01 12:29:20 UTC 2017 topology/SimpleProcessingTopology$$anonfun$main$1$$anonfun$apply$1$$anonfun$apply$2.class
  2500 Wed Mar 01 12:29:20 UTC 2017 topology/SimpleProcessingTopology$$anonfun$main$1$$anonfun$apply$1.class
  7045 Wed Mar 01 12:29:20 UTC 2017 topology/SimpleProcessingTopology$.class
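
The same check can also be made from code rather than by reading the listing. The following is a minimal sketch, not part of the original application: `JarEntryCheck`, `entryPath`, and `containsClass` are hypothetical names introduced here for illustration, and only `java.util.jar.JarFile` from the JDK is used. Like `jar tvf`, it only confirms that the entry exists in the file on disk; it says nothing about which class loader sees the JAR at runtime.

```scala
import java.util.jar.JarFile

// Hypothetical helper (not from the original application).
object JarEntryCheck {
  // Converts a fully qualified class name to its entry path inside a JAR,
  // e.g. "a.b.C" becomes "a/b/C.class".
  def entryPath(className: String): String =
    className.replace('.', '/') + ".class"

  // Returns true if the JAR at jarPath contains an entry for className.
  // Throws an IOException if the JAR cannot be opened.
  def containsClass(jarPath: String, className: String): Boolean = {
    val jar = new JarFile(jarPath)
    try jar.getEntry(entryPath(className)) != null
    finally jar.close()
  }
}
```

For example, `JarEntryCheck.containsClass("/tmp/spark_streaming-1.0-SNAPSHOT.jar", "topology.SimpleProcessingTopology$$anonfun$main$1$$anonfun$apply$1")` could be run on each node to compare the copies of the JAR.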

The exception is caused by the anonymous function passed to foreachPartition:

rdd.foreachPartition(partition => {
  val outTopic = props.getString("application.simple.kafka.out.topic")
  // One producer per partition, created on the executor
  val producer = new KafkaProducer[Array[Byte], Array[Byte]](kafkaParams)
  // Forward every record of the partition to the output topic
  partition.foreach(record => {
    val producerRecord = new ProducerRecord[Array[Byte], Array[Byte]](outTopic, record.key(), record.value())
    producer.send(producerRecord)
  })
  producer.close()
})
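
The stack trace shows the lookup failing in `Class.forName` while the closure is being deserialized. A small diagnostic along the same lines, sketched here under assumptions, is to ask a class loader directly whether it can resolve the generated class name. `ClassLoadCheck` and `isLoadable` are hypothetical names, not part of Spark or of the original application, and the choice of the thread context class loader is an assumption about which loader is relevant.

```scala
// Hypothetical diagnostic helper (not from the original application).
object ClassLoadCheck {
  // Returns true if className can be resolved without initializing it.
  // Uses the thread context class loader, falling back to the loader
  // that loaded this object when no context loader is set.
  def isLoadable(className: String): Boolean = {
    val loader = Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    try {
      Class.forName(className, false, loader)
      true
    } catch {
      case _: ClassNotFoundException => false
    }
  }
}
```

Calling `ClassLoadCheck.isLoadable("topology.SimpleProcessingTopology$$anonfun$main$1$$anonfun$apply$1")` at the start of `main` would show whether the driver resolves the class. Running the same check on an executor requires code that itself loads there, which is exactly what fails in this job.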

Unfortunately, I have not been able to find the root cause so far. I would greatly appreciate any help in resolving this issue.

0 answers:

No answers