Does spark-streaming-kafka-0-10_2.10 work with Python?

Posted: 2019-07-09 21:55:59

Tags: python apache-spark pyspark apache-kafka spark-streaming

I can't find any documentation explaining how to use spark-streaming-kafka-0-10_2.10 from Python so that Kafka can serve as an input source for Spark (https://spark.apache.org/docs/latest/streaming-kafka-integration.html). Is Python not supported?

Thanks.

1 Answer:

Answer 0 (score: 0)

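Python is supported, but only through the Kafka 0.8 integration (spark-streaming-kafka-0-8), which is what the example below uses. The 0.10 DStream integration (spark-streaming-kafka-0-10) has no Python API: the language-support table on the integration page lists only Scala and Java for it.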

Please see:

  1. PySpark documentation
  2. Spark Streaming + Kafka integration guide
  3. How to deploy for Python (Kafka 0.10)

Example of adding the Kafka integration JAR to a PySpark session:

    from pyspark.sql import SparkSession

    # The Scala suffix (_2.11) and version (2.0.0) of the package must match your Spark build
    spark = SparkSession.builder.appName('test') \
        .config('spark.jars.packages', 'org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0') \
        .getOrCreate()
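The coordinate passed to spark.jars.packages above packs three things into one string: the Kafka integration (0-8 or 0-10), the Scala version (_2.11) and the Spark version (2.0.0). The last two have to match your Spark installation. A minimal sketch of building it, assuming a hypothetical helper named kafka_streaming_package (not part of PySpark; it only formats a string):

```python
def kafka_streaming_package(spark_version: str, scala_version: str,
                            kafka_api: str = "0-8") -> str:
    """Build the Maven coordinate of a Spark Streaming Kafka integration JAR.

    Hypothetical helper for illustration only; it does not check that the
    artifact actually exists for the given versions.
    """
    if kafka_api not in ("0-8", "0-10"):
        raise ValueError("kafka_api must be '0-8' or '0-10'")
    # Note: only the 0-8 artifact has a Python API (pyspark.streaming.kafka);
    # the 0-10 coordinate is accepted here just so JVM users can build it too.
    return "org.apache.spark:spark-streaming-kafka-%s_%s:%s" % (
        kafka_api, scala_version, spark_version)


if __name__ == "__main__":
    print(kafka_streaming_package("2.0.0", "2.11"))
```

Passing kafka_streaming_package('2.0.0', '2.11') to .config('spark.jars.packages', ...) yields the same string as the hard-coded one in the example.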

Then proceed as usual:

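    import random

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils  # Python wrapper for the Kafka 0.8 integration

    # Reuse the SparkContext started by the session above; calling
    # SparkContext(...) again would fail while that context is active.
    sc = SparkContext.getOrCreate()
    ssc = StreamingContext(sc, 2)  # 2-second batch interval

    topic = "topic-%d" % random.randint(0, 10000)
    brokers = {"metadata.broker.list": "123.43.54.231:9092,123.43.54.235:9092,123.43.54.239:9092"}
    # Each record arrives as a (key, value) tuple, UTF-8 decoded by default
    stream = KafkaUtils.createDirectStream(ssc, [topic], brokers)

    ...

    ssc.start()
    ssc.awaitTermination()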
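With the default decoders the Python direct stream yields each Kafka message as a (key, value) tuple of strings, where the key is None when the producer set none, so per-record processing usually starts by dropping the key. The sketch below is only an illustration: the helper names (message_value, split_words, count_words) are hypothetical, and the word count is just a stand-in workload. It keeps the per-record logic in plain functions and checks it on a local list standing in for one micro-batch:

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# (key, value) as produced by KafkaUtils.createDirectStream with default decoders
Record = Tuple[Optional[str], str]


def message_value(record: Record) -> str:
    """Drop the Kafka message key and keep the value."""
    return record[1]


def split_words(line: str) -> List[str]:
    """Split a message body into whitespace-separated words."""
    return line.split()


def count_words(batch: Iterable[Record]) -> Counter:
    """Local stand-in for one micro-batch: value -> words -> counts."""
    counts = Counter()
    for record in batch:
        counts.update(split_words(message_value(record)))
    return counts


# On the DStream the same functions could be chained, for example:
#   stream.map(message_value).flatMap(split_words) \
#         .map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b).pprint()

if __name__ == "__main__":
    sample = [(None, "hello kafka"), ("k1", "hello spark")]
    print(count_words(sample))
```

Keeping the functions free of Spark objects makes them testable without a cluster, while the DStream chain in the comment applies the same logic to each batch.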