I'm trying to integrate Apache Kafka with Spark Streaming. Here is my Python code:
from __future__ import print_function
import sys
from pyspark.streaming import StreamingContext
from pyspark import SparkContext,SparkConf
from pyspark.streaming.kafka import KafkaUtils
if __name__ == "__main__":
    #conf = SparkConf().setAppName("Kafka-Spark").setMaster("spark://127.0.0.1:7077")
    conf = SparkConf().setAppName("Kafka-Spark")
    #sc = SparkContext(appName="KafkaSpark")
    sc = SparkContext(conf=conf)
    # Streaming context with a 1-second batch interval
    stream=StreamingContext(sc,1)
    # Topic name -> number of partitions to consume, each in its own thread
    map1={'demo':1}
    # Receiver-based stream that connects through ZooKeeper at localhost:2181
    kafkaStream = KafkaUtils.createStream(stream, 'localhost:2181', "test-consumer-group", map1)
    # kafkaStream = KafkaUtils.createStream(stream, 'localhost:2181', "name", map1) #tried with localhost:2181 too
    # Each record is a (key, message) pair; keep only the message
    lines = kafkaStream.map(lambda x: x[1])
    # Word count for each batch
    counts = lines.flatMap(lambda line: line.split(" ")) \
        .map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a+b)
    counts.pprint()
    stream.start()
    stream.awaitTermination()
When I run the program, it prints the following output in the terminal:
16/10/24 15:27:20 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
    at kafka.utils.Pool.<init>(Pool.scala:28)
    at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:91)
    at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:143)
    at kafka.consumer.Consumer$.create(ConsumerConnector.scala:94)
    at org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:100)
    at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:149)
    at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:131)
    at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:597)
    at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:587)
    at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1993)
    at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1993)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce$class
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 17 more
16/10/24 15:27:20 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
    at kafka.utils.Pool.<init>(Pool.scala:28)
    at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:91)
    at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:143)
    at kafka.consumer.Consumer$.create(ConsumerConnector.scala:94)
    at org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:100)
    at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:149)
    at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:131)
    at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:597)
    at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:587)
    at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1993)
    at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1993)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce$class
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 17 more
16/10/24 15:27:20 INFO StreamingContext: Invoking stop(stopGracefully=false) from shutdown hook
16/10/24 15:27:20 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
    at kafka.utils.Pool.<init>(Pool.scala:28)
    at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:91)
    at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:143)
    at kafka.consumer.Consumer$.create(ConsumerConnector.scala:94)
    at org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:100)
    at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:149)
    at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:131)
    at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:597)
    at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:587)
    at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1993)
    at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1993)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce$class
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 17 more
16/10/24 15:27:20 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times;
Answer:
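The collections API differs between Scala 2.10 and 2.11, so a NoClassDefFoundError for scala/collection/GenTraversableOnce$class usually means that a library compiled for one Scala version (here the Kafka consumer classes in kafka.utils and kafka.consumer) is running against the other version's scala-library. Make sure all Scala dependencies use the same Scala version, for example 2.10: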
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.10.6</version>
</dependency>
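To make the mismatch easier to spot: Scala libraries encode the Scala binary version in their artifactId (for example kafka_2.10 or spark-streaming_2.11), and every one of them has to agree with the scala-library on the classpath. The following is a minimal sketch of a hypothetical helper (scala_binary_version and find_scala_mismatches are illustrative names, not part of Spark, Kafka, or Maven) that flags artifacts built for a different Scala version. The artifact names in the example are assumptions, not taken from the asker's setup.

```python
import re

# Hypothetical helper, not part of Spark or Kafka. Scala 2.x libraries are
# published with the Scala binary version as an artifactId suffix,
# e.g. "kafka_2.10" or "spark-streaming-kafka-0-8_2.11".
_SUFFIX = re.compile(r"_(2\.\d+)$")


def scala_binary_version(artifact_id):
    """Return the Scala binary suffix of an artifactId, or None if it has none."""
    match = _SUFFIX.search(artifact_id)
    return match.group(1) if match else None


def find_scala_mismatches(artifact_ids, scala_version):
    """List artifacts whose Scala suffix differs from the runtime Scala version.

    scala_version is a full version string such as "2.11.8"; only the
    major.minor part (the binary version) has to match. Artifacts without
    a Scala suffix (plain Java libraries) are ignored.
    """
    binary = ".".join(scala_version.split(".")[:2])
    return [
        a for a in artifact_ids
        if scala_binary_version(a) not in (None, binary)
    ]


if __name__ == "__main__":
    # Assumed example: a Kafka artifact built for Scala 2.10 on a Scala 2.11 runtime.
    deps = ["spark-streaming_2.11", "kafka_2.10", "zkclient"]
    print(find_scala_mismatches(deps, "2.11.8"))  # ['kafka_2.10']
```

The check only compares major.minor because Scala releases within one binary version (2.10.x, 2.11.x) are binary compatible, while 2.10 and 2.11 are not.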