I'm trying to run a Python Spark script that worked perfectly in Spark 1.3.1. I've downloaded Spark 1.4 and tried to run the script, but it keeps failing with:
Spark Streaming's Kafka libraries not found in class path. Try one of the following.
1. Include the Kafka library and its dependencies in the spark-submit command as
$ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka:1.4.0 ...
2. Download the JAR of the artifact from Maven Central http://search.maven.org/, Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-assembly, Version = 1.4.0. Then, include the jar in the spark-submit command as
$ bin/spark-submit --jars <spark-streaming-kafka-assembly.jar> ...
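The two options suggested by the error message can be expressed as argument lists for Python's subprocess module. This is a minimal sketch under stated assumptions: the helper spark_submit_argv is hypothetical, the Maven coordinate is copied verbatim from the error message, and the script path is taken from the question.

```python
from typing import List, Optional


def spark_submit_argv(script: str,
                      spark_submit: str = "bin/spark-submit",
                      packages: Optional[List[str]] = None,
                      jars: Optional[List[str]] = None) -> List[str]:
    """Build a spark-submit argument list (hypothetical helper).

    --packages takes comma-separated Maven coordinates (groupId:artifactId:version);
    --jars takes a comma-separated list of JAR paths.
    """
    argv = [spark_submit]
    if packages:
        argv += ["--packages", ",".join(packages)]
    if jars:
        argv += ["--jars", ",".join(jars)]
    argv.append(script)  # the application script goes after all spark-submit options
    return argv


# Option 1: let spark-submit resolve the Kafka library from Maven
# (coordinate copied from the error message; it may need a Scala suffix such as _2.10).
opt1 = spark_submit_argv("/root/SparkPySQLNew.py",
                         packages=["org.apache.spark:spark-streaming-kafka:1.4.0"])

# Option 2: pass the downloaded assembly JAR explicitly.
opt2 = spark_submit_argv("/root/SparkPySQLNew.py",
                         jars=["spark-streaming-kafka-assembly_2.10-1.4.0.jar"])

print(" ".join(opt1))
print(" ".join(opt2))
```

To actually launch the job, one of these lists could be passed to subprocess.run(argv, check=True) (Python 3.5+). Spark artifacts on Maven Central are normally published with a Scala-version suffix, so the --packages coordinate may need to read spark-streaming-kafka_2.10 rather than spark-streaming-kafka.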
I've explicitly referenced the jars in the submit command, adding them like this:
/opt/spark/spark-1.4.0-bin-hadoop2.6/bin/spark-submit --jars spark-streaming_2.10-1.4.0.jar,spark-core_2.10-1.4.0.jar,spark-streaming-kafka-assembly_2.10-1.4.0.jar,kafka_2.10-0.8.2.1.jar,kafka-clients-0.8.2.1.jar,spark-streaming-kafka-assembly_2.10-1.4.0.jar /root/SparkPySQLNew.py
The log also says it added them when the application started, so why can't it find them?
15/07/08 05:44:37 INFO spark.SparkContext: Added JAR file:/root/spark-streaming_2.10-1.4.0.jar at http://192.168.134.138:49637/jars/spark-streaming_2.10-1.4.0.jar with timestamp 1436334277792
15/07/08 05:44:37 INFO spark.SparkContext: Added JAR file:/root/spark-core_2.10-1.4.0.jar at http://192.168.134.138:49637/jars/spark-core_2.10-1.4.0.jar with timestamp 1436334277919
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/spark-streaming-kafka-assembly_2.10-1.4.0.jar at http://192.168.134.138:49637/jars/spark-streaming-kafka-assembly_2.10-1.4.0.jar with timestamp 1436334278295
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/kafka_2.10-0.8.2.1.jar at http://192.168.134.138:49637/jars/kafka_2.10-0.8.2.1.jar with timestamp 1436334278353
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/kafka-clients-0.8.2.1.jar at http://192.168.134.138:49637/jars/kafka-clients-0.8.2.1.jar with timestamp 1436334278357
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/spark-streaming-kafka-assembly_2.10-1.4.0.jar at http://192.168.134.138:49637/jars/spark-streaming-kafka-assembly_2.10-1.4.0.jar with timestamp 1436334278665
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/spark-streaming-kafka-assembly_2.10-1.4.0-sources.jar at http://192.168.134.138:49637/jars/spark-streaming-kafka-assembly_2.10-1.4.0-sources.jar with timestamp 1436334278666
And I know I've added far more than necessary: I started with just one of them and eventually ended up adding all of them.
Answer (score: 0)
I suspect the exact answer varies by Spark version, but based on this HCC thread, the following seems to have solved the problem for someone else:
spark-submit --jars spark-assembly-1.5.2.2.3.4.7-4-hadoop2.7.1.2.3.4.7-4.jar,spark-streaming-kafka-assembly_2.10-1.6.1.jar
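At first glance, the difference is that this command passes a single spark-streaming-kafka-assembly jar, whereas you are submitting two (spark-streaming-kafka-assembly_2.10-1.4.0.jar appears twice in your --jars list).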
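Following the suggestion to submit only one assembly jar, a minimal sketch of cleaning up the question's --jars list before calling spark-submit is shown below. The helper normalize_jars is hypothetical: it drops exact duplicate paths, skips -sources.jar files (which contain source code rather than compiled classes, like the one that shows up in the log), and raises an error if more than one distinct spark-streaming-kafka-assembly jar remains. It is not a Spark API and does not by itself guarantee the classpath error goes away.

```python
import os
from typing import List


def normalize_jars(jars: List[str]) -> List[str]:
    """Deduplicate a --jars list and allow at most one Kafka assembly JAR (hypothetical helper)."""
    seen = set()
    unique = []
    for jar in jars:
        if jar.endswith("-sources.jar"):
            continue  # source archives hold no compiled classes; skip them
        if jar not in seen:  # keep only the first occurrence of each path
            seen.add(jar)
            unique.append(jar)
    assemblies = [j for j in unique
                  if os.path.basename(j).startswith("spark-streaming-kafka-assembly")]
    if len(assemblies) > 1:
        raise ValueError("more than one Kafka assembly JAR: %s" % assemblies)
    return unique


# The --jars list from the question, where the assembly jar is listed twice.
question_jars = [
    "spark-streaming_2.10-1.4.0.jar",
    "spark-core_2.10-1.4.0.jar",
    "spark-streaming-kafka-assembly_2.10-1.4.0.jar",
    "kafka_2.10-0.8.2.1.jar",
    "kafka-clients-0.8.2.1.jar",
    "spark-streaming-kafka-assembly_2.10-1.4.0.jar",
]
print("--jars " + ",".join(normalize_jars(question_jars)))
```

Raising on a second, different assembly jar (for example two versions) is a deliberate choice: mixing versions on one classpath is more likely to cause confusing errors than to fix them.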