I can't find documentation on how to integrate spark-streaming-kafka-0-10_2.10 with Python in order to use Kafka as an input source for Spark (https://spark.apache.org/docs/latest/streaming-kafka-integration.html). Is Python not supported?
Thanks.
Answer 0 (score: 0)
It is supported, but only through the spark-streaming-kafka-0-8 connector: the 0-10 integration does not expose a Python API, which is why the example below pulls in the 0-8 package instead.
Here is an example of adding the JAR to a PySpark session:
from pyspark.sql import SparkSession

# Pull in the Kafka streaming connector from Maven when the session starts
spark = SparkSession.builder.appName('test') \
    .config('spark.jars.packages', 'org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0') \
    .getOrCreate()
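Equivalently, the package can be supplied at launch time instead of in code, via spark-submit's --packages flag (a sketch; the script name is a placeholder, and the version should match your Spark build):

```shell
spark-submit \
  --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0 \
  your_streaming_job.py
```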
Then proceed as usual:
import random

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName='testIntegration')
ssc = StreamingContext(sc, 2)  # 2-second batch interval

topic = "topic-%d" % random.randint(0, 10000)
brokers = {"metadata.broker.list": "123.43.54.231:9092,123.43.54.235:9092,123.43.54.239:9092"}

# Direct (receiver-less) stream; each record arrives as a (key, value) tuple
stream = KafkaUtils.createDirectStream(ssc, [topic], brokers)
...
ssc.start()
ssc.awaitTermination()
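The elided step (`...`) is where per-record processing goes. As a hypothetical illustration only (the JSON value format and the parse_record helper are assumptions, not part of the original answer), a value-parsing function that could be applied with stream.map might look like:

```python
import json

def parse_record(record):
    # createDirectStream delivers each Kafka record as a (key, value)
    # tuple of strings; here the value is assumed to be JSON (an
    # assumption for illustration, not a Kafka requirement).
    key, value = record
    return json.loads(value)

# In the streaming job this would be used as:
#   parsed = stream.map(parse_record)
print(parse_record((None, '{"user": "alice", "clicks": 3}')))
```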