I am new to Spark Structured Streaming with Kafka and to offset management. I am using spark-streaming-kafka-0-10-2.11. In the consumer, how can I read from a specific partition of a topic?
company_df = sparkSession
    .readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", applicationProperties.getProperty(BOOTSTRAP_SERVERS_CONFIG))
    .option("subscribe", topicName)
I am using something like the above. How do I specify a particular partition to read from?
Answer 0 (score: 0)
You can read from specific Kafka partitions with the code block below.
public void processKafka() throws InterruptedException {
    LOG.info("************ SparkStreamingKafka.processKafka start");

    // Create the Spark application
    SparkConf sparkConf = new SparkConf();
    sparkConf.set("spark.executor.cores", "5");

    // To express any Spark Streaming computation, a StreamingContext object needs to be created.
    // This object serves as the main entry point for all Spark Streaming functionality.
    // This creates the streaming context with a batch size of 'sparkBatchInterval' seconds.
    jssc = new JavaStreamingContext(sparkConf, Durations.seconds(sparkBatchInterval));

    // Kafka consumer parameters
    Map<String, Object> kafkaParams = new HashMap<>();
    kafkaParams.put("bootstrap.servers", this.getBrokerList());
    kafkaParams.put("client.id", "SpliceSpark");
    kafkaParams.put("group.id", "mynewgroup");
    kafkaParams.put("auto.offset.reset", "earliest");
    kafkaParams.put("enable.auto.commit", false);
    kafkaParams.put("key.deserializer", StringDeserializer.class);
    kafkaParams.put("value.deserializer", StringDeserializer.class);

    // The specific partitions to read: here, partitions 0 through 4 of "mytopic"
    List<TopicPartition> topicPartitions = new ArrayList<TopicPartition>();
    for (int i = 0; i < 5; i++) {
        topicPartitions.add(new TopicPartition("mytopic", i));
    }

    // Assign pins the stream to exactly the partitions listed above
    JavaInputDStream<ConsumerRecord<String, String>> messages = KafkaUtils.createDirectStream(
        jssc,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, String>Assign(topicPartitions, kafkaParams)
    );

    /*
    // Alternative: Subscribe reads all partitions of the listed topics
    Collection<String> topics = Arrays.asList(this.getTopicList().split(","));
    JavaInputDStream<ConsumerRecord<String, String>> messages = KafkaUtils.createDirectStream(
        jssc,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams)
    );
    */

    messages.foreachRDD(new PrintRDDDetails());

    // Start running the job to receive and transform the data
    jssc.start();

    // Allow the current thread to wait for the termination of the context by stop() or by an exception
    jssc.awaitTermination();
}
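
Note that the snippet in the question uses the Structured Streaming API (readStream), which is backed by the spark-sql-kafka-0-10 package rather than spark-streaming-kafka-0-10. In Structured Streaming, the counterpart of Assign is the "assign" source option, which takes a JSON string mapping each topic to the partitions to read. Below is a minimal sketch; the broker address, topic name, and partition numbers are placeholders.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class KafkaAssignSketch {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
            .appName("KafkaAssignSketch")
            .getOrCreate();

        // "assign" takes a JSON string mapping each topic to the partitions to read.
        // Broker address, topic name, and partitions here are placeholders.
        Dataset<Row> companyDf = spark
            .readStream()
            .format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092")
            .option("assign", "{\"mytopic\":[0,1,2]}")
            .load();

        // Kafka records arrive as binary key/value columns; cast them to strings
        companyDf.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
            .writeStream()
            .format("console")
            .start()
            .awaitTermination();
    }
}

The "assign", "subscribe", and "subscribePattern" options are mutually exclusive: only one of them may be set per stream.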