Why can't I connect to my Kafka topic when running via spark-submit?

Date: 2019-07-09 13:39:46

Tags: apache-spark pyspark apache-kafka

I am running my script with this command:

spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 direct_kafka_wordcount.py localhost 9092 

I cannot connect to my Kafka topic and retrieve messages. I have tried everything, with no luck. I am running simple word-count code over a live Kafka stream.
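For reference, the script being run is the direct Kafka word-count example shipped with Spark's Python examples; below is a minimal sketch of it, reconstructed from the file path and line numbers in the traceback further down (the bundled file may differ in detail):

from __future__ import print_function

import sys

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: direct_kafka_wordcount.py <broker_list> <topic>", file=sys.stderr)
        sys.exit(-1)

    sc = SparkContext(appName="PythonStreamingDirectKafkaWordCount")
    ssc = StreamingContext(sc, 2)  # 2-second micro-batches

    # Both positional arguments matter: the first must be "host:port[,host:port...]"
    brokers, topic = sys.argv[1:]
    kvs = KafkaUtils.createDirectStream(ssc, [topic], {"metadata.broker.list": brokers})

    # Classic streaming word count over the message values
    lines = kvs.map(lambda x: x[1])
    counts = lines.flatMap(lambda line: line.split(" ")) \
                  .map(lambda word: (word, 1)) \
                  .reduceByKey(lambda a, b: a + b)
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()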

  

Ivy Default Cache set to: /home/sagar/.ivy2/cache
The jars for the packages stored in: /home/sagar/.ivy2/jars
:: loading settings :: url = jar:file:/usr/local/spark-2.4.3-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.spark#spark-streaming-kafka-0-10_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-be411cc2-fb3f-4049-b222-e3eca55e020b;1.0
    confs: [default]
    found org.apache.spark#spark-streaming-kafka-0-10_2.11;2.2.0 in central
    found org.apache.kafka#kafka_2.11;0.10.0.1 in central
    found com.101tec#zkclient;0.8 in central
    found org.slf4j#slf4j-api;1.7.16 in central
    found org.slf4j#slf4j-log4j12;1.7.16 in central
    found log4j#log4j;1.2.17 in central
    found com.yammer.metrics#metrics-core;2.2.0 in central
    found org.scala-lang.modules#scala-parser-combinators_2.11;1.0.4 in central
    found org.apache.kafka#kafka-clients;0.10.0.1 in central
    found net.jpountz.lz4#lz4;1.3.0 in central
    found org.xerial.snappy#snappy-java;1.1.2.6 in central
    found org.spark-project.spark#unused;1.0.0 in central
:: resolution report :: resolve 1491ms :: artifacts dl 9ms
    :: modules in use:
    com.101tec#zkclient;0.8 from central in [default]
    com.yammer.metrics#metrics-core;2.2.0 from central in [default]
    log4j#log4j;1.2.17 from central in [default]
    net.jpountz.lz4#lz4;1.3.0 from central in [default]
    org.apache.kafka#kafka-clients;0.10.0.1 from central in [default]
    org.apache.kafka#kafka_2.11;0.10.0.1 from central in [default]
    org.apache.spark#spark-streaming-kafka-0-10_2.11;2.2.0 from central in [default]
    org.scala-lang.modules#scala-parser-combinators_2.11;1.0.4 from central in [default]
    org.slf4j#slf4j-api;1.7.16 from central in [default]
    org.slf4j#slf4j-log4j12;1.7.16 from central in [default]
    org.spark-project.spark#unused;1.0.0 from central in [default]
    org.xerial.snappy#snappy-java;1.1.2.6 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   12  |   1   |   1   |   0   ||   12  |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-be411cc2-fb3f-4049-b222-e3eca55e020b
    confs: [default]
    0 artifacts copied, 12 already retrieved (0kB/8ms)
19/07/09 14:28:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
  File "/usr/local/spark-2.4.3-bin-hadoop2.7/examples/src/main/python/streaming/direct_kafka_wordcount.py", line 48, in <module>
    kvs = KafkaUtils.createDirectStream(ssc, [topic], {"metadata.broker.list": brokers})
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 146, in createDirectStream
  File "/usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o26.createDirectStreamWithoutMessageHandler.
: org.apache.spark.SparkException: Broker not in the correct format: [localhost]
    at org.apache.spark.streaming.kafka.KafkaCluster$SimpleConsumerConfig$$anonfun$7.apply(KafkaCluster.scala:390)
    at org.apache.spark.streaming.kafka.KafkaCluster$SimpleConsumerConfig$$anonfun$7.apply(KafkaCluster.scala:387)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
    at org.apache.spark.streaming.kafka.KafkaCluster$SimpleConsumerConfig.<init>(KafkaCluster.scala:387)
    at org.apache.spark.streaming.kafka.KafkaCluster$SimpleConsumerConfig$.apply(KafkaCluster.scala:422)
    at org.apache.spark.streaming.kafka.KafkaCluster.config(KafkaCluster.scala:53)
    at org.apache.spark.streaming.kafka.KafkaCluster.getPartitionMetadata(KafkaCluster.scala:130)
    at org.apache.spark.streaming.kafka.KafkaCluster.getPartitions(KafkaCluster.scala:119)
    at org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:211)
    at org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper.createDirectStream(KafkaUtils.scala:720)
    at org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper.createDirectStreamWithoutMessageHandler(KafkaUtils.scala:688)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
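Reading the traceback against the spark-submit command above points at the root cause: localhost and 9092 are passed as two separate arguments, so the broker string that reaches Kafka never contains a port. Assuming the argument handling sketched earlier, the script effectively ends up with:

# What "direct_kafka_wordcount.py localhost 9092" produces inside the script:
brokers, topic = "localhost", "9092"  # broker string lacks the :<port> suffix

which is exactly the value KafkaCluster rejects as "Broker not in the correct format: [localhost]".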

1 answer:

Answer 0: (score: 0)

There is a syntax error in your command; try the following (note the Kafka broker host:port part):

spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 direct_kafka_wordcount.py localhost:9092

In general, connecting to Kafka's bootstrap servers always requires the host:port syntax, which is why the command above uses localhost:9092 instead of localhost 9092.
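Inside the script, the same rule applies to whatever ends up in metadata.broker.list: each broker must carry an explicit port, and several brokers can be listed comma-separated. A small illustration (the extra host names are placeholders):

# Correct: the broker string includes the port
brokers = "localhost:9092"
# brokers = "kafka1:9092,kafka2:9092,kafka3:9092"  # hypothetical multi-broker list
kvs = KafkaUtils.createDirectStream(ssc, [topic], {"metadata.broker.list": brokers})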