Messages are getting lost when consuming from a Kafka topic with Spark Streaming. Below is the code I use to consume data from the topic.
Code:
import scala.collection.mutable
import org.apache.kafka.clients.CommonClientConfigs
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.config.SslConfigs
import org.apache.spark.sql.functions.lit
import org.apache.spark.streaming.{Durations, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

// Kafka consumer configuration (SSL-secured cluster)
val kafkaParam = new mutable.HashMap[String, String]()
kafkaParam.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "<bootstrap-servers>")
kafkaParam.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer")
kafkaParam.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer")
kafkaParam.put(ConsumerConfig.GROUP_ID_CONFIG, "<group-id>")
kafkaParam.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest")
kafkaParam.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true")
kafkaParam.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "35000")
kafkaParam.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SSL")
kafkaParam.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, "<truststore-path>")
kafkaParam.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, "<password>")
kafkaParam.put(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG, "<keystore-path>")
kafkaParam.put(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, "<password>")
val topicList = List("<topic-name>")

// The StreamingContext has to exist before the direct stream is created from it
val sparkStreamingContext = new StreamingContext(sc, Durations.seconds(30))
val messageStream = KafkaUtils.createDirectStream(sparkStreamingContext, LocationStrategies.PreferConsistent, ConsumerStrategies.Subscribe[String, String](topicList, kafkaParam))

// For every micro-batch: parse the JSON payloads and append them to the partitioned Hive table
// (sc, spark, schemaNeeded, partitionTime and partitionDate are defined elsewhere in the job)
messageStream.foreachRDD { rdd =>
  val streamData = spark.read.schema(schemaNeeded).json(rdd.map(x => x.value()))
  val addNewColumn = streamData.withColumn("batch_load_time", lit(partitionTime)).withColumn("batch_load_date", lit(partitionDate).cast("String"))
  addNewColumn.write.mode("Append").insertInto("<hive-partitioned-table>")
}
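From what I have read, with the 0-10 direct stream it may be safer to disable enable.auto.commit and commit the offsets back to Kafka only after each batch has actually been written, using the CanCommitOffsets API. Below is a rough, unverified sketch of that pattern (it reuses messageStream from the code above); I am not sure whether this is the right fix for the loss I am seeing:

import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

messageStream.foreachRDD { rdd =>
  // Capture the Kafka offset ranges of this micro-batch before any transformation
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // ... write the batch to the Hive table as above ...
  // Commit the offsets only after the batch has been written successfully
  messageStream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}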
Please help me resolve this issue.