Spark 1.4 missing the Kafka library

Asked: 2015-07-08 05:52:09

Tags: hadoop apache-spark apache-kafka spark-streaming hortonworks-data-platform

I'm trying to run a Python Spark script that worked perfectly under Spark 1.3.1. I've downloaded Spark 1.4 and tried to run the script, but it keeps failing with:

  

Spark Streaming's Kafka libraries not found in class path. Try one of the following.

  1. Include the Kafka library and its dependencies in the spark-submit command as

     $ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka:1.4.0 ...

  2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
     Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-assembly,
     Version = 1.4.0. Then, include the jar in the spark-submit command as

     $ bin/spark-submit --jars <spark-streaming-kafka-assembly.jar> ...
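For context, this error surfaces the moment the script touches pyspark.streaming.kafka. A minimal sketch of that kind of script is below; the ZooKeeper address, consumer group, and topic name are made-up placeholders, not details taken from the question:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="KafkaStreamingExample")
    ssc = StreamingContext(sc, 10)  # 10-second batch interval

    # ZooKeeper quorum, consumer group and topic are hypothetical placeholders
    stream = KafkaUtils.createStream(ssc, "localhost:2181", "example-group", {"example-topic": 1})
    stream.map(lambda msg: msg[1]).pprint()  # each message arrives as a (key, value) pair

    ssc.start()
    ssc.awaitTermination()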

I've referenced the jars explicitly in the submit command, adding them as:

/opt/spark/spark-1.4.0-bin-hadoop2.6/bin/spark-submit --jars spark-streaming_2.10-1.4.0.jar,spark-core_2.10-1.4.0.jar,spark-streaming-kafka-assembly_2.10-1.4.0.jar,kafka_2.10-0.8.2.1.jar,kafka-clients-0.8.2.1.jar,spark-streaming-kafka-assembly_2.10-1.4.0.jar /root/SparkPySQLNew.py

It even says it added them when the application starts, so why can't it find them?

15/07/08 05:44:37 INFO spark.SparkContext: Added JAR file:/root/spark-streaming_2.10-1.4.0.jar at http://192.168.134.138:49637/jars/spark-streaming_2.10-1.4.0.jar with timestamp 1436334277792
15/07/08 05:44:37 INFO spark.SparkContext: Added JAR file:/root/spark-core_2.10-1.4.0.jar at http://192.168.134.138:49637/jars/spark-core_2.10-1.4.0.jar with timestamp 1436334277919
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/spark-streaming-kafka-assembly_2.10-1.4.0.jar at http://192.168.134.138:49637/jars/spark-streaming-kafka-assembly_2.10-1.4.0.jar with timestamp 1436334278295
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/kafka_2.10-0.8.2.1.jar at http://192.168.134.138:49637/jars/kafka_2.10-0.8.2.1.jar with timestamp 1436334278353
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/kafka-clients-0.8.2.1.jar at http://192.168.134.138:49637/jars/kafka-clients-0.8.2.1.jar with timestamp 1436334278357
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/spark-streaming-kafka-assembly_2.10-1.4.0.jar at http://192.168.134.138:49637/jars/spark-streaming-kafka-assembly_2.10-1.4.0.jar with timestamp 1436334278665
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/spark-streaming-kafka-assembly_2.10-1.4.0-sources.jar at http://192.168.134.138:49637/jars/spark-streaming-kafka-assembly_2.10-1.4.0-sources.jar with timestamp 1436334278666               

And yes, I know I've added a whole pile of jars there; I started with just one and ended up adding them all by the end.

1 Answer:

Answer 0 (score: 0)

I suspect the exact answer varies between Spark versions, but based on this HCC thread, the following seems to have solved it for others:

spark-submit --jars spark-assembly-1.5.2.2.3.4.7-4-hadoop2.7.1.2.3.4.7-4.jar,spark-streaming-kafka-assembly_2.10-1.6.1.jar 

At first glance, the difference is that it includes one spark-streaming-kafka-assembly jar, whereas you are submitting two.
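If the duplicate really is the culprit, a first thing to try would be trimming your command down to a single copy of the assembly jar. The assembly should already bundle the Kafka client classes, so the separate kafka and kafka-clients jars ought not to be needed either; this is only a sketch based on that assumption, reusing the paths from the question:

    /opt/spark/spark-1.4.0-bin-hadoop2.6/bin/spark-submit --jars spark-streaming-kafka-assembly_2.10-1.4.0.jar /root/SparkPySQLNew.py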
