pyspark Kafka submission fails

Date: 2018-01-27 11:32:01

Tags: python pyspark apache-kafka kafka-consumer-api

I am using pyspark to consume data from Kafka. I type this in my console to submit:

spark-submit --jars /Users/alexsun/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar comsumer.py

where consumer.py is my Python program. Then, in the console, it raises:

    ________________________________________________________________________________________________

  Spark Streaming's Kafka libraries not found in class path. Try one of the following.

  1. Include the Kafka library and its dependencies with in the
     spark-submit command as

     $ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8:2.2.0 ...

  2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
     Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-0-8-assembly, Version = 2.2.0.
     Then, include the jar in the spark-submit command as

     $ bin/spark-submit --jars <spark-streaming-kafka-0-8-assembly.jar> ...

________________________________________________________________________________________________


Traceback (most recent call last):
  File "/Users/alexsun/PycharmProjects/untitled/spark_kafka/comsumer.py", line 51, in <module>
    main()
  File "/Users/alexsun/PycharmProjects/untitled/spark_kafka/comsumer.py", line 45, in main
    main_main(ssc)
  File "/Users/alexsun/PycharmProjects/untitled/spark_kafka/comsumer.py", line 29, in main_main
    consumer = KafkaUtils.createStream(ssc, zookeeper, groupid, {kafkatopic: 1})
  File "/Users/alexsun/binSoftware/spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 69, in createStream
  File "/Users/alexsun/binSoftware/spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 195, in _get_helper

This seems to be telling me that I did not point to the jar file's path, but when I looked at the log output, it shows:

    18/01/27 19:46:59 INFO SparkContext: Added JAR file:/Users/alexsun/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar at spark://192.168.1.150:57342/jars/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar with timestamp 1517053619142
18/01/27 19:46:59 INFO SparkContext: Added file file:/Users/alexsun/PycharmProjects/untitled/spark_kafka/consumer.py at file:/Users/alexsun/PycharmProjects/untitled/spark_kafka/consumer.py with timestamp 1517053619150

I am sure the jar file is there, so why is this exception thrown?

I don't know what the problem is. Could you help me?

1 answer:

Answer 0 (score: 1)

It has to correspond to the version of pyspark. You have to make sure that in this:

spark-submit --jars /Users/alexsun/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar comsumer.py

spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar

the jar uses the same version as your pyspark install. In this case, you are using pyspark==2.2.0.

One more thing: I also ran into this problem, and it went away when I tried --packages, so perhaps you could consider using

--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:{version of pyspark}

instead of the --jars option.
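As a small sketch of the version-matching rule above (the helper name is my own for illustration, not part of Spark), the Maven coordinate passed to --packages can be built from the installed pyspark version string:

```python
def kafka_package(pyspark_version: str, scala_version: str = "2.11") -> str:
    """Build the Maven coordinate for spark-submit --packages.

    The artifact version must match the installed pyspark version, and the
    _2.11 / _2.10 suffix must match the Scala build of your Spark distribution.
    """
    return f"org.apache.spark:spark-streaming-kafka-0-8_{scala_version}:{pyspark_version}"

# For pyspark 2.2.0 on a Scala 2.11 build of Spark:
print(kafka_package("2.2.0"))
# org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0
```

With a mismatched version (for example a 2.2.0 jar against pyspark 2.1.0), the Kafka helper classes fail to load and you get exactly the "Spark Streaming's Kafka libraries not found in class path" error shown above.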