java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class

Asked: 2016-10-24 10:36:19

Tags: python-2.7 apache-spark apache-kafka pyspark

I am trying to implement an Apache Kafka and Spark Streaming integration. This is my Python code:

from __future__ import print_function
import sys
from pyspark.streaming import StreamingContext
from pyspark import SparkContext,SparkConf
from pyspark.streaming.kafka import KafkaUtils

if __name__ == "__main__":
    #conf = SparkConf().setAppName("Kafka-Spark").setMaster("spark://127.0.0.1:7077")
    conf = SparkConf().setAppName("Kafka-Spark")
    #sc = SparkContext(appName="KafkaSpark")
    sc = SparkContext(conf=conf)
    # Streaming context with a 1-second batch interval
    stream = StreamingContext(sc, 1)
    # Topic map: read topic 'demo' with one receiver thread
    map1 = {'demo': 1}
    # Receiver-based Kafka stream via ZooKeeper at localhost:2181, consumer group "test-consumer-group"
    kafkaStream = KafkaUtils.createStream(stream, 'localhost:2181', "test-consumer-group", map1)

    # kafkaStream = KafkaUtils.createStream(stream, 'localhost:2181', "name", map1)  # tried with localhost:2181 too
    lines = kafkaStream.map(lambda x: x[1])
    counts = lines.flatMap(lambda line: line.split(" ")) \
        .map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a + b)
    counts.pprint()

    stream.start()
    stream.awaitTermination()
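For context, a receiver-based job like the one above is normally submitted with the Spark Streaming Kafka connector on the classpath, since the connector is not bundled with the default Spark distribution. The package coordinates and the script filename in the sketch below are assumptions for a Spark 2.0.x distribution built with Scala 2.11:

    spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.1 kafka_spark.py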

When I run the program, it prints the following output on the terminal:

16/10/24 15:27:20 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
    at kafka.utils.Pool.<init>(Pool.scala:28)
    at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:91)
    at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:143)
    at kafka.consumer.Consumer$.create(ConsumerConnector.scala:94)
    at org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:100)
    at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:149)
    at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:131)
    at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:597)
    at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:587)
    at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1993)
    at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1993)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce$class
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 17 more
16/10/24 15:27:20 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
    (same stack trace and "Caused by" as above)
16/10/24 15:27:20 INFO StreamingContext: Invoking stop(stopGracefully=false) from shutdown hook
16/10/24 15:27:20 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
    (same stack trace and "Caused by" as above)

16/10/24 15:27:20 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times;

1 Answer:

Answer 0 (score: 1):

The collection API differs between Scala 2.10 and 2.11, so a NoClassDefFoundError for scala.collection.GenTraversableOnce$class usually means the Kafka client on the classpath was compiled against a different Scala version than the Scala runtime Spark is using. Keep every Scala dependency on one version, for example:

<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.10.6</version>
</dependency>
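Since the question runs a PySpark script rather than a Maven build, the same version alignment can also be expressed on the spark-submit command line: the Scala suffix of the Kafka connector artifact must match the Scala version the Spark distribution was built with. The exact coordinates below are assumptions for Spark 2.0.x; use the _2.10 artifact if the cluster runs Scala 2.10 (as the dependency above pins), or _2.11 if it runs Scala 2.11 (the default for Spark 2.0 downloads):

    # Spark built with Scala 2.10
    spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.10:2.0.1 your_script.py
    # Spark built with Scala 2.11
    spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.1 your_script.py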