I am using PySpark to consume data from Kafka. I submit the job from my console with:
spark-submit --jars /Users/alexsun/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar comsumer.py
where comsumer.py is my Python program. The console then raises:
________________________________________________________________________________________________
Spark Streaming's Kafka libraries not found in class path. Try one of the following.
1. Include the Kafka library and its dependencies with in the
spark-submit command as
$ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8:2.2.0 ...
2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-0-8-assembly, Version = 2.2.0.
Then, include the jar in the spark-submit command as
$ bin/spark-submit --jars <spark-streaming-kafka-0-8-assembly.jar> ...
________________________________________________________________________________________________
Traceback (most recent call last):
File "/Users/alexsun/PycharmProjects/untitled/spark_kafka/comsumer.py", line 51, in <module>
main()
File "/Users/alexsun/PycharmProjects/untitled/spark_kafka/comsumer.py", line 45, in main
main_main(ssc)
File "/Users/alexsun/PycharmProjects/untitled/spark_kafka/comsumer.py", line 29, in main_main
consumer = KafkaUtils.createStream(ssc, zookeeper, groupid, {kafkatopic: 1})
File "/Users/alexsun/binSoftware/spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 69, in createStream
File "/Users/alexsun/binSoftware/spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 195, in _get_helper
This seems to tell me that I have not provided the path to the jar file, but when I look at the log output, it contains:
18/01/27 19:46:59 INFO SparkContext: Added JAR file:/Users/alexsun/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar at spark://192.168.1.150:57342/jars/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar with timestamp 1517053619142
18/01/27 19:46:59 INFO SparkContext: Added file file:/Users/alexsun/PycharmProjects/untitled/spark_kafka/consumer.py at file:/Users/alexsun/PycharmProjects/untitled/spark_kafka/consumer.py with timestamp 1517053619150
I am sure the jar file is there, so why does this exception occur?
I cannot figure out what the problem is. Can you help me?
Answer 0 (score: 1)
The jar has to correspond to your version of pyspark. You must make sure that in
spark-submit --jars /Users/alexsun/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar comsumer.py
the jar
spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar
uses the same version as pyspark; in this case you are using pyspark 2.2.0.
One more thing: I ran into this problem as well, and it went away when I tried the --packages option, so perhaps you could consider using
--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:{version of pyspark}
instead of the --jars option.
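The version-matching advice above can be sketched as a small shell snippet. This is a minimal illustration, not a full submit script: the version values are assumptions taken from the question (pyspark 2.2.0, Scala 2.11), not queried from a live installation.

```shell
# Build the --packages coordinate from the pyspark version so the Kafka
# connector version always matches the pyspark version in use.
PYSPARK_VERSION="2.2.0"   # e.g. from: python -c "import pyspark; print(pyspark.__version__)"
SCALA_VERSION="2.11"      # the Scala build of your Spark distribution
PACKAGE="org.apache.spark:spark-streaming-kafka-0-8_${SCALA_VERSION}:${PYSPARK_VERSION}"

# Print the resulting command (run it for real once the coordinate is right).
echo "spark-submit --packages ${PACKAGE} comsumer.py"
```

With --packages, spark-submit resolves the connector and its dependencies from Maven Central itself, which avoids hand-picking an assembly jar that might not match the installed pyspark.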