Exception when running StreamingContext.start()

Time: 2019-05-03 06:44:44

Tags: pyspark apache-kafka

I get an exception when running my Python code on Windows 10. I am using Apache Kafka and PySpark.

Python snippet that reads data from Kafka:

import sys
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="KafkaStreamingApp")
ssc = StreamingContext(sc, 60)  # 60-second batch interval
zkQuorum, topic = sys.argv[1:]
kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1})
lines = kvs.map(lambda x: [x[0], x[1]])
lines.pprint()
lines.foreachRDD(SaveRecord)
ssc.start()
ssc.awaitTermination()

Exception raised while running the code:

Exception in thread "streaming-start" java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class
            at org.apache.spark.streaming.kafka.KafkaReceiver.<init>(KafkaInputDStream.scala:69)
            at org.apache.spark.streaming.kafka.KafkaInputDStream.getReceiver(KafkaInputDStream.scala:60)
            at org.apache.spark.streaming.scheduler.ReceiverTracker.$anonfun$launchReceivers$1(ReceiverTracker.scala:441)
            at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
            at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
            at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
            at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
            at scala.collection.TraversableLike.map(TraversableLike.scala:237)
            at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
            at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
            at org.apache.spark.streaming.scheduler.ReceiverTracker.launchReceivers(ReceiverTracker.scala:440)
            at org.apache.spark.streaming.scheduler.ReceiverTracker.start(ReceiverTracker.scala:160)
            at org.apache.spark.streaming.scheduler.JobScheduler.start(JobScheduler.scala:102)
            at org.apache.spark.streaming.StreamingContext.$anonfun$start$1(StreamingContext.scala:583)
            at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
            at org.apache.spark.util.ThreadUtils$$anon$1.run(ThreadUtils.scala:145)
    Caused by: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging$class
            at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
            at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
            ... 16 more

1 Answer:

Answer 0 (score: 0)

This is most likely a Scala/Spark version incompatibility. Make sure the Scala version in your project configuration matches the Scala version your Spark build supports. Spark 3.x requires Scala 2.12; support for Scala 2.11 was removed in Spark 3.0.0.

It is also possible that a third-party jar (for example dstream-twitter for Twitter streaming applications, or your Kafka streaming jar) was built for a Scala version that your application does not support.
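The mismatch is usually visible in the artifact name itself: published Spark-ecosystem jars carry a `_2.11` or `_2.12` suffix naming the Scala binary version they were built for. As a rough illustration (the helper below is hypothetical, not part of any Spark API), you can compare a jar's suffix against the Scala version that `spark-submit --version` reports:

```python
import re

def scala_suffix(artifact_name):
    """Extract the Scala binary-version suffix (e.g. '2.11') from a
    Maven/SBT artifact name such as
    'spark-streaming-kafka-0-8_2.11-2.4.5.jar'."""
    m = re.search(r"_(2\.1[0-3])\b", artifact_name)
    return m.group(1) if m else None

def is_compatible(artifact_name, spark_scala_version):
    """True when the jar was built for the same Scala binary version
    that the Spark distribution ships with."""
    return scala_suffix(artifact_name) == spark_scala_version

# A Scala 2.11 connector loaded by a Scala 2.12 build of Spark is the
# kind of mismatch that produces NoClassDefFoundError at runtime.
print(is_compatible("spark-streaming-kafka-0-8_2.11-2.4.5.jar", "2.12"))  # False
print(is_compatible("dstream-twitter_2.12-2.3.0-SNAPSHOT.jar", "2.12"))   # True
```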

For instance, dstream-twitter_2.11-2.3.0-SNAPSHOT did not work for me with Spark 3.0; it threw Exception in thread "streaming-start" java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class. But when I rebuilt the dstream-twitter jar against Scala 2.12, that resolved the problem.

Make sure all your Scala versions line up.
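When submitting a PySpark streaming job, one way to keep the connector consistent is to let spark-submit resolve it by its full Maven coordinate, where the suffix after the underscore must match your Spark build's Scala version. A sketch, assuming Spark 2.4.5 on Scala 2.11 and a script named kafka_stream.py (both are example names, not taken from the question):

```shell
# Print the Spark build info; the "Using Scala version 2.1x" line tells
# you which _2.1x artifacts you need.
spark-submit --version

# Pull the matching Kafka receiver connector from Maven Central.
# spark-streaming-kafka-0-8 (which provides KafkaUtils.createStream)
# was never published for Scala 2.12, so it requires a Scala 2.11
# build of Spark (2.4.x or earlier).
spark-submit \
  --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.4.5 \
  kafka_stream.py zookeeper-host:2181 my-topic
```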