I want to read messages from a Kafka queue in Python. In Scala, for example, this is easy to do:
val ssc = new StreamingContext(conf, Seconds(20))
// Map each topic name to the number of consumer threads
val topicMessages = "myKafkaTopic"
val topicMessagesMap = topicMessages.split(",").map((_, kafkaNumThreads)).toMap
val messages = KafkaUtils.createStream(ssc, zkQuorum, group, topicMessagesMap).map(_._2)
messages.foreachRDD { rdd =>
//...
}
I want to do the same thing in Python. Here is my current Python code:
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
ssc = StreamingContext(sc, 20)
topicMessages = "myKafkaTopic"
topicMessagesMap = topicMessages.split(",").map((_, kafkaNumThreads)).toMap
messages = KafkaUtils.createStream(ssc, zkQuorum, group, topicMessagesMap)
But I get the following error on the line topicMessagesMap = topicMessages.split(",").map((_, kafkaNumThreads)).toMap:

AttributeError: 'list' object has no attribute 'map'
How can I make this code work?
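My understanding is that the Scala .map(...).toMap idiom has no direct Python equivalent, and a dict comprehension would be the natural replacement. A minimal sketch of what I believe the fix should look like, assuming zkQuorum, group, and kafkaNumThreads are defined as in the Scala version:

from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

ssc = StreamingContext(sc, 20)

topicMessages = "myKafkaTopic"
# Python lists have no .map/.toMap; build the {topic: threadCount}
# mapping with a dict comprehension instead
topicMessagesMap = {topic: kafkaNumThreads for topic in topicMessages.split(",")}

# createStream yields (key, value) pairs; keep only the value,
# like .map(_._2) in the Scala version
messages = KafkaUtils.createStream(ssc, zkQuorum, group, topicMessagesMap) \
    .map(lambda kv: kv[1])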
UPDATE
If I run this line in a Jupyter Notebook:
messages = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {inputKafkaTopic: list})
I get the following error:

Spark Streaming's Kafka libraries not found in class path. Try one of the following.

1. Include the Kafka library and its dependencies in the spark-submit command:

   $ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8:2.0.0 ...

2. Download the JAR of the artifact from Maven Central http://search.maven.org/ with Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-0-8-assembly, Version = 2.0.0. Then include the JAR in the spark-submit command:

   $ bin/spark-submit --jars ...
Do I understand correctly that the only way to make this work is via spark-submit, and that there is no way to run this code from Jupyter/IPython?
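One workaround I have seen suggested (a sketch I have not verified) is to inject the Kafka package through the PYSPARK_SUBMIT_ARGS environment variable before the SparkContext is created, so the notebook kernel pulls in the same dependency that --packages would add on spark-submit. The coordinates below assume Spark 2.0.0 built against Scala 2.11; the app name is arbitrary:

import os

# Must be set before any SparkContext exists in this kernel.
# Assumption: Spark 2.0.0 / Scala 2.11; adjust the versions to match your build.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0 pyspark-shell"
)

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="kafka-from-jupyter")
ssc = StreamingContext(sc, 20)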