How do I create a DStream with offsets in PySpark (using KafkaUtils.createDirectStream) when Kafka is a cluster?

Time: 2018-10-31 07:07:19

Tags: apache-spark pyspark apache-kafka spark-streaming

I am storing the offsets from KafkaUtils.createDirectStream in HDFS:

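# Offset ranges of the most recent batch, filled in on the driver.
offsetRanges = []

def storeOffsetRanges(rdd):
    # Must run on the RDD produced directly by createDirectStream.
    global offsetRanges
    offsetRanges = rdd.offsetRanges()
    return rdd

def printOffsetRanges(rdd):
    # Prints the ranges captured by storeOffsetRanges for this batch.
    for o in offsetRanges:
        print("topic: %s\n partition: %s\n fromOffset: %s\n untilOffset: %s\n" % (o.topic, o.partition, o.fromOffset, o.untilOffset))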

The offsetRanges look like this:

OffsetRange(topic: pdns, partition: 21, range: [248025782 -> 248025782]
topic: pdns
 partition: 21
 fromOffset: 248025782
 untilOffset: 248025782

OffsetRange(topic: pdns, partition: 4, range: [248016485 -> 248016485]
topic: pdns
 partition: 4
 fromOffset: 248016485
 untilOffset: 248016485

OffsetRange(topic: pdns, partition: 9, range: [247995083 -> 247995083]
topic: pdns
 partition: 9
 fromOffset: 247995083
 untilOffset: 247995083
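
One possible way to persist ranges like these is to serialize each partition's `untilOffset` as JSON. `untilOffset` is exclusive, so it is where the next read would start. The sketch below is only an illustration, not the asker's code: `offsets_to_json` and `FakeRange` are hypothetical names, and the JSON layout is an assumption. The helper only needs objects exposing `topic`, `partition`, and `untilOffset`. Writing the resulting string to HDFS is left out, since the question does not show which client is used.

```python
import json
from collections import namedtuple


def offsets_to_json(ranges):
    """Serialize the next starting offset of every topic/partition to JSON.

    `ranges` is any iterable of objects exposing `topic`, `partition` and
    `untilOffset` (for example the result of rdd.offsetRanges()).
    """
    records = [
        {"topic": r.topic, "partition": r.partition, "offset": r.untilOffset}
        for r in ranges
    ]
    # Sort by topic and partition for a stable file layout.
    records.sort(key=lambda rec: (rec["topic"], rec["partition"]))
    return json.dumps(records)


# Hypothetical stand-in for OffsetRange so the sketch runs without Spark.
FakeRange = namedtuple("FakeRange", "topic partition fromOffset untilOffset")

sample = [
    FakeRange("pdns", 21, 248025782, 248025782),
    FakeRange("pdns", 4, 248016485, 248016485),
]
serialized = offsets_to_json(sample)
```

In the streaming job, `offsets_to_json(offsetRanges)` could be called from the `foreachRDD` callback once the batch has been processed.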

I want to use these offsetRanges to set fromOffsets in KafkaUtils.createDirectStream. How should fromOffsets be set when Kafka is a cluster?

direct_kafka_stream = KafkaUtils.createDirectStream(
        ssc=ssc,
        topics=topic_name,
        kafkaParams={
            "metadata.broker.list": brokers,
            "group.id": consumer_id
        },
        fromOffsets= 
    )
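
In the Kafka 0.8 direct API that `pyspark.streaming.kafka` wraps (available in Spark releases before 3.0), `fromOffsets` is a dict mapping `TopicAndPartition` to a starting offset, one entry per partition. It is keyed by topic and partition rather than by broker, so a multi-broker cluster should not change how it is built; `metadata.broker.list` simply takes a comma-separated `host:port` list. A minimal sketch, assuming the offsets were saved as JSON records with `topic`, `partition`, and `offset` fields (a hypothetical layout); `load_offsets` and `to_from_offsets` are illustrative names, not library functions:

```python
import json


def load_offsets(json_text):
    """Parse saved offsets into {(topic, partition): offset}."""
    return {
        (rec["topic"], int(rec["partition"])): int(rec["offset"])
        for rec in json.loads(json_text)
    }


def to_from_offsets(offsets):
    """Convert plain tuples into the dict createDirectStream expects.

    Needs the spark-streaming-kafka-0-8 integration, so the import is kept
    local and the rest of the sketch runs without Spark installed.
    """
    from pyspark.streaming.kafka import TopicAndPartition
    return {
        TopicAndPartition(topic, partition): offset
        for (topic, partition), offset in offsets.items()
    }


# Assumed file content, using one of the partitions printed in the question.
saved = '[{"topic": "pdns", "partition": 4, "offset": 248016485}]'
offsets = load_offsets(saved)

# With a live StreamingContext `ssc`, as in the question:
# direct_kafka_stream = KafkaUtils.createDirectStream(
#     ssc,
#     [topic_name],  # topics is a list of topic names
#     {"metadata.broker.list": brokers},
#     fromOffsets=to_from_offsets(offsets),
# )
```

Partitions missing from the dict would need a default starting offset chosen by the caller, so it is safest to save an entry for every partition of the topic.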

0 answers:

No answers