Running a Spark Streaming job with Python fails with an error

Time: 2019-06-17 03:25:52

Tags: pyspark

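I am running a Spark Streaming job on a node, but it fails.

I tried different versions of Kafka and Spark, and it still does not work. I also tried other jar files when submitting the Spark job.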

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="PythonSparkStreamingKafka")
sc.setLogLevel("WARN")

ssc = StreamingContext(sc,20)

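# Note: createStream expects a ZooKeeper quorum (host:port) as its second argument;
# 9092 is usually a Kafka broker port.
kafkaStream = KafkaUtils.createStream(ssc, "kmnn1.hdp.com:9092", "spark-streaming-consumer", {'test': 1})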
lines = kafkaStream.map(lambda x: x[1])

# Split each line on spaces; an empty separator ('') raises ValueError in Python.
counts = lines.flatMap(lambda line: line.split(' ')).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)

counts.pprint()
ssc.start()
ssc.awaitTermination()

Command used:

spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 --jars ./spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar spark-stream.py
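
Since several versions were tried, one thing that may be worth checking is whether the connector version in the --packages coordinate lines up with the Spark version installed on the cluster. The sketch below only splits the coordinate into its parts and compares major.minor versions. The helper names (parse_coordinate, same_minor_version) are hypothetical, and a version mismatch is only an assumption here, not a confirmed cause of the error.

```python
from typing import NamedTuple


class Coordinate(NamedTuple):
    group: str
    artifact: str
    scala_version: str
    version: str


def parse_coordinate(coordinate: str) -> Coordinate:
    """Split 'group:artifact_scalaVersion:version' into its parts (hypothetical helper)."""
    group, artifact_with_scala, version = coordinate.split(":")
    # The Scala binary version is the suffix after the last underscore.
    artifact, _, scala_version = artifact_with_scala.rpartition("_")
    return Coordinate(group, artifact, scala_version, version)


def same_minor_version(a: str, b: str) -> bool:
    """Compare only major.minor, so '2.2.0' and '2.2.1' are treated as matching."""
    return a.split(".")[:2] == b.split(".")[:2]


coord = parse_coordinate("org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0")
print(coord)
```

The Spark version to compare against can be read from `spark-submit --version` or from `sc.version` inside a running job, and then passed to same_minor_version together with coord.version.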

Expected result:

Input: hello world
Expected output : 'hello' 1
                  'world' 1
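
For reference, the expected counting can be sketched in plain Python without Spark or Kafka, which helps separate the word-count logic from the connector problem. The function name word_counts is hypothetical and used only for illustration.

```python
from collections import Counter


def word_counts(line: str) -> dict:
    """Count how often each whitespace-separated word occurs in a line."""
    # split() with no argument splits on any whitespace and drops empty strings.
    return dict(Counter(line.split()))


print(word_counts("hello world"))  # {'hello': 1, 'world': 1}
```

Note that Python's str.split rejects an empty separator with a ValueError, so the separator has to be a space or omitted.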

Error message:

Exception in thread "Thread-6" java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition
    at java.lang.Class.getDeclaredMethods0(Native Method)
    at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
    at java.lang.Class.privateGetPublicMethods(Class.java:2902)
    at java.lang.Class.getMethods(Class.java:1615)
    at py4j.reflection.ReflectionEngine.getMethodsByNameAndLength(ReflectionEngine.java:345)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:305)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
    at py4j.Gateway.invoke(Gateway.java:274)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: kafka.common.TopicAndPartition
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 12 more
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/usr/hdp/3.1.0.0-78/spark2/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
    response = connection.send_command(command)
  File "/usr/hdp/3.1.0.0-78/spark2/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
    "Error while receiving", e, proto.ERROR_ON_RECEIVE)
Py4JNetworkError: Error while receiving
Traceback (most recent call last):
  File "/root/spark-code/spark-stream.py", line 13, in <module>
    kafkaStream=KafkaUtils.createStream(ssc,"kmnn1.hdp.com:9092", "spark-streaming-consumer", {'test': 1})
  File "/usr/hdp/3.1.0.0-78/spark2/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 79, in createStream
  File "/usr/hdp/3.1.0.0-78/spark2/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/hdp/3.1.0.0-78/spark2/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 336, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o58.createStream

0 answers:

No answers yet