I am creating a Spark Streaming application with Kafka.
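import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.kafka010.{KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// Consumer settings; kafkaConfig is defined elsewhere in the application.
val kafkaParams = Map[String, Object](
  ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> kafkaConfig.bootstrapServers,
  ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
  ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
  ConsumerConfig.GROUP_ID_CONFIG -> "some_random_client",
  ConsumerConfig.AUTO_OFFSET_RESET_CONFIG -> kafkaConfig.offsetResetConfig,
  ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG -> (true: java.lang.Boolean),
  ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG -> "120000",
  ConsumerConfig.DEFAULT_API_TIMEOUT_MS_CONFIG -> "120000"
)

// Direct stream over the configured topic; ssc is the StreamingContext.
val dStream: DStream[ConsumerRecord[String, String]] =
  KafkaUtils.createDirectStream[String, String](
    ssc,
    LocationStrategies.PreferConsistent,
    Subscribe[String, String](Array(kafkaConfig.topic), kafkaParams)
  )

dStream.foreachRDD { rdd =>
  // COMPUTE
}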
Unfortunately, it fails to start because the position of a particular partition cannot be determined. I see the following in the driver logs:

expired before the position for partition <topic_name>-1 could be determined

This is the output of kafka-consumer-groups.sh:

TOPIC       PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                                       HOST            CLIENT-ID
topic_name  1          2244            2586            342  consumer-81-6f05cb98-2443-4301-b70f-d06a9385bfdc  /10.66.242.213  consumer-81
topic_name  2          2506            2834            328  consumer-81-6f05cb98-2443-4301-b70f-d06a9385bfdc  /10.66.242.213  consumer-81
topic_name  0          2695            3048            353  consumer-81-6f05cb98-2443-4301-b70f-d06a9385bfdc  /10.66.242.213  consumer-81
topic_name  4          2587            2944            357  consumer-81-6f05cb98-2443-4301-b70f-d06a9385bfdc  /10.66.242.213  consumer-81
topic_name  3          2249            2578            329  consumer-81-6f05cb98-2443-4301-b70f-d06a9385bfdc  /10.66.242.213  consumer-81

I am able to use
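
For reference, the LAG column in the kafka-consumer-groups.sh output is LOG-END-OFFSET minus CURRENT-OFFSET for each partition. The sketch below recomputes it from the numbers in the table; the names `PartitionOffsets` and `LagCheck` are hypothetical and exist only for this illustration. All five partitions, including partition 1 named in the error, show a committed offset, which suggests the group itself is not missing a stored position.

```scala
// Hypothetical helper: one row of the kafka-consumer-groups.sh table.
final case class PartitionOffsets(partition: Int, currentOffset: Long, logEndOffset: Long) {
  // Lag is the number of records the group has not yet consumed.
  def lag: Long = logEndOffset - currentOffset
}

object LagCheck {
  // Offsets copied from the table in the question.
  val offsets: Seq[PartitionOffsets] = Seq(
    PartitionOffsets(1, 2244L, 2586L),
    PartitionOffsets(2, 2506L, 2834L),
    PartitionOffsets(0, 2695L, 3048L),
    PartitionOffsets(4, 2587L, 2944L),
    PartitionOffsets(3, 2249L, 2578L)
  )

  // Sum of the per-partition lags.
  val totalLag: Long = offsets.map(_.lag).sum

  def main(args: Array[String]): Unit = {
    offsets.sortBy(_.partition).foreach { o =>
      println(s"partition ${o.partition}: lag ${o.lag}")
    }
    println(s"total lag: $totalLag")
  }
}
```

Running `LagCheck.main` prints one line per partition followed by the total, which makes it easy to compare against a fresh run of the CLI tool.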