Spark Kafka streaming is unable to determine the position for a partition

Time: 2019-08-27 04:31:27

Tags: apache-spark apache-kafka spark-streaming kafka-consumer-api spark-streaming-kafka

I am building a Spark Streaming application that reads from Kafka.

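import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.kafka010.{KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// kafkaConfig and ssc (the StreamingContext) are defined elsewhere in the application.
val kafkaParams = Map[String, Object](
  ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> kafkaConfig.bootstrapServers,
  ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
  ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
  ConsumerConfig.GROUP_ID_CONFIG -> "some_random_client",
  ConsumerConfig.AUTO_OFFSET_RESET_CONFIG -> kafkaConfig.offsetResetConfig,
  ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG -> (true: java.lang.Boolean),
  ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG -> "120000",
  ConsumerConfig.DEFAULT_API_TIMEOUT_MS_CONFIG -> "120000"
)

val dStream: DStream[ConsumerRecord[String, String]] =
  KafkaUtils.createDirectStream[String, String](
    ssc,
    LocationStrategies.PreferConsistent,
    Subscribe[String, String](Array(kafkaConfig.topic), kafkaParams)
  )

dStream.foreachRDD { rdd =>
  // COMPUTE
}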

Unfortunately, the application fails to start because it is unable to determine the position for a particular partition. I see the following in the driver logs:

expired before the position for partition <topic_name>-1 could be determined

This is the output of kafka-consumer-groups.sh:

TOPIC       PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                                       HOST            CLIENT-ID
topic_name  1          2244            2586            342  consumer-81-6f05cb98-2443-4301-b70f-d06a9385bfdc  /10.66.242.213  consumer-81
topic_name  2          2506            2834            328  consumer-81-6f05cb98-2443-4301-b70f-d06a9385bfdc  /10.66.242.213  consumer-81
topic_name  0          2695            3048            353  consumer-81-6f05cb98-2443-4301-b70f-d06a9385bfdc  /10.66.242.213  consumer-81
topic_name  4          2587            2944            357  consumer-81-6f05cb98-2443-4301-b70f-d06a9385bfdc  /10.66.242.213  consumer-81
topic_name  3          2249            2578            329  consumer-81-6f05cb98-2443-4301-b70f-d06a9385bfdc  /10.66.242.213  consumer-81

I am able to consume messages from the above topic.
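As a sanity check on the kafka-consumer-groups.sh output, here is a minimal sketch that recomputes the LAG column as LOG-END-OFFSET minus CURRENT-OFFSET for each partition. The `PartitionOffsets` and `LagCheck` names are made up for illustration and are not part of my application. The numbers are copied from the consumer group output. If the arithmetic holds, it would suggest that the group has a committed offset on every partition, including partition 1, the one named in the driver log.

```scala
// Hypothetical helper type, for illustration only.
// One row of the kafka-consumer-groups.sh output for a single partition.
final case class PartitionOffsets(
    partition: Int,
    currentOffset: Long, // CURRENT-OFFSET: last committed offset of the group
    logEndOffset: Long,  // LOG-END-OFFSET: next offset to be written on the broker
    reportedLag: Long    // LAG as printed by the tool
) {
  // Lag is the number of messages between the committed offset and the log end.
  def computedLag: Long = logEndOffset - currentOffset
}

object LagCheck {
  // Values copied from the consumer group output in this question.
  val rows: Seq[PartitionOffsets] = Seq(
    PartitionOffsets(0, 2695L, 3048L, 353L),
    PartitionOffsets(1, 2244L, 2586L, 342L),
    PartitionOffsets(2, 2506L, 2834L, 328L),
    PartitionOffsets(3, 2249L, 2578L, 329L),
    PartitionOffsets(4, 2587L, 2944L, 357L)
  )

  def main(args: Array[String]): Unit = {
    rows.foreach { r =>
      println(
        s"partition ${r.partition}: computed lag = ${r.computedLag}, " +
          s"reported lag = ${r.reportedLag}"
      )
    }
    println(s"total lag = ${rows.map(_.computedLag).sum}")
  }
}
```

Running `LagCheck.main(Array.empty)` prints the computed and reported lag side by side for each partition.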

0 answers:

No answers yet