Question

I currently have code like this:

KafkaTemplate<String, String> kafkaTemplate;

List<Pet> myData;

for (Pet p : myData) {
  String json = objectWriter.writeValueAsString(p);
  kafkaTemplate.send(topic, json);
}

So each list item is sent one at a time. How can I send the whole list at once?

Answer 1

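There is no direct way to send messages to Kafka in bulk with KafkaTemplate or KafkaProducer. Neither has a method that accepts a List of objects and sends each one individually to different partitions.

How does the Kafka producer send messages to Kafka?

KafkaProducer

The Kafka producer accumulates records into a batch and then sends the whole batch at once. The documentation excerpts quoted here give more information.

The producer consists of a pool of buffer space that holds records that have not yet been transmitted to the server, and a background I/O thread that is responsible for turning these records into requests and transmitting them to the cluster.

The send() method is asynchronous. When called, it adds the record to a buffer of pending record sends and returns immediately. This allows the producer to batch individual records together for efficiency.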

Asynchronous send

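Batching is one of the big drivers of efficiency, and to enable batching the Kafka producer attempts to accumulate data in memory and send out larger batches in a single request. Batching can be configured to accumulate no more than a fixed amount of data and to wait no longer than some fixed latency bound (say 64k or 10 ms). This allows more bytes to accumulate before sending, and results in a few larger I/O operations on the servers. The buffering is configurable and gives you a mechanism to trade a small amount of additional latency for better throughput.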
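The size and latency bounds described above can be illustrated with a toy model. This is only a sketch and not Kafka's implementation: ToyBatcher and all of its methods are hypothetical names, the clock is passed in explicitly so the example stays deterministic, and the two constructor arguments merely play the roles of batch.size and linger.ms.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Toy model of producer-side batching; NOT how Kafka is actually implemented. */
final class ToyBatcher {
    private final int batchSizeBytes;   // plays the role of batch.size
    private final long lingerMs;        // plays the role of linger.ms
    private final List<String> open = new ArrayList<>();        // batch being filled
    private final List<List<String>> sent = new ArrayList<>();  // batches already shipped
    private int openBytes = 0;
    private long openedAtMs = 0;

    ToyBatcher(int batchSizeBytes, long lingerMs) {
        this.batchSizeBytes = batchSizeBytes;
        this.lingerMs = lingerMs;
    }

    /** Buffers a record; ships the open batch first if the record would overflow it. */
    void send(String record, long nowMs) {
        int size = record.getBytes(StandardCharsets.UTF_8).length;
        if (!open.isEmpty() && openBytes + size > batchSizeBytes) {
            closeBatch();
        }
        if (open.isEmpty()) {
            openedAtMs = nowMs; // a new batch starts lingering now
        }
        open.add(record);
        openBytes += size;
    }

    /** Simulated I/O thread: ships the open batch once it has lingered long enough. */
    void tick(long nowMs) {
        if (!open.isEmpty() && nowMs - openedAtMs >= lingerMs) {
            closeBatch();
        }
    }

    /** Models flush(): ships whatever is buffered, ignoring the linger time. */
    void flush() {
        if (!open.isEmpty()) {
            closeBatch();
        }
    }

    List<List<String>> sentBatches() {
        return sent;
    }

    private void closeBatch() {
        sent.add(new ArrayList<>(open));
        open.clear();
        openBytes = 0;
    }
}
```

With a 10-byte limit and a 5 ms linger, sending three 4-byte records at t=0, 1 and 2 ships the first two as one batch when the third arrives, and the third ships once tick() is called at t=7 or later. Calling flush() ships a partially filled batch immediately.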

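Since you are using spring-kafka, you can send a List<Object> as the record value. Note, however, that you are then sending one JSON array of JSON objects as a single record, rather than sending each JSON object to a topic partition individually.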

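import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.support.serializer.JsonSerializer;

public KafkaTemplate<String, List<Object>> createTemplate() {
    Map<String, Object> senderProps = producerProps();
    // The factory's type parameters must match the template's key and value types
    ProducerFactory<String, List<Object>> pf =
            new DefaultKafkaProducerFactory<>(senderProps);
    return new KafkaTemplate<>(pf);
}

public Map<String, Object> producerProps() {
    Map<String, Object> props = new HashMap<>();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ProducerConfig.RETRIES_CONFIG, 0);
    props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);        // max bytes per batch
    props.put(ProducerConfig.LINGER_MS_CONFIG, 1);             // ms to wait for a batch to fill
    props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432);  // total bytes for unsent records
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class);
    return props;
}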

KafkaTemplate<String, List<Object>> kafkaTemplate;

Answer 2

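It is usually enough to set this property:

props.put(ProducerConfig.LINGER_MS_CONFIG, 10);

and to increase the batch buffer size with the following property (16384 bytes is the client default, so use a larger value to actually increase it):

props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);

according to your requirements.

Note: make sure your code does not call flush(). flush() makes all buffered records immediately available to send regardless of linger.ms, so it effectively defeats the batching settings.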

Answer 3

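Deadpool has already answered how to do this with KafkaTemplate. There is another approach: serialize the objects into a byte array and send the whole thing as one record.

This is not the best approach, because it goes against Kafka's best practice of distributing and parallelizing as much as possible. Normally you should send the messages individually and let the producer parallelize them using its buffer pool and the topic's partitions. Sometimes, though, a particular use case calls for it.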
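To make the parallelization point concrete, here is a small sketch of how keyed records spread over partitions. It is a simplification under stated assumptions: ToyPartitioner, partitionFor and spread are hypothetical names, and hashCode() is only a stand-in, since Kafka's default partitioner hashes the serialized key with murmur2.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy illustration of key-based partition assignment; not Kafka's real partitioner. */
final class ToyPartitioner {
    /** Stand-in hash: Kafka itself uses murmur2 over the serialized key bytes. */
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Counts how many records land on each partition when every key is its own record. */
    static Map<Integer, Integer> spread(List<String> keys, int numPartitions) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(partitionFor(key, numPartitions), 1, Integer::sum);
        }
        return counts;
    }
}
```

Sending four records with distinct keys to a three-partition topic touches several partitions, so several consumers can work on them in parallel. Packing the same four items into one byte array produces a single record, which lands on exactly one partition.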

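/// You can use a list of any object type. I am just using String here, but this can be extended to any object; after all, every String is an object. If you use a POJO, you can convert it to a JSON string and pass the JSON along as a list of Strings.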

public byte[] searlizedByteArray(List<String> listObject) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    // try-with-resources flushes and closes the object stream before the bytes are read;
    // the list implementation must be Serializable (ArrayList is)
    try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
        out.writeObject(listObject);
    }
    return bos.toByteArray();
}

Deserialize the byte[] array back into a list of objects:

@SuppressWarnings("unchecked")
public List<String> desearlizedByteArray(byte[] byteArray) throws IOException, ClassNotFoundException {
    // try-with-resources closes the stream even if readObject() throws
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(byteArray))) {
        List<String> returnList = (List<String>) in.readObject();

        for (String string : returnList) {
            System.out.println("===" + string);
        }

        // Return the decoded list (the original version always returned null)
        return returnList;
    }
}
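As a quick sanity check of this approach, the serialization and deserialization steps can be exercised together without a broker. The sketch below uses only the JDK; ListCodec, toBytes and fromBytes are hypothetical names chosen for the illustration, and the byte array it produces is what would be passed as the record value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Round-trips a list of strings through Java's built-in object serialization. */
final class ListCodec {
    static byte[] toBytes(List<String> items) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            // Copy into an ArrayList so the written object is always Serializable
            out.writeObject(new ArrayList<>(items));
        }
        return bos.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static List<String> fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (List<String>) in.readObject();
        }
    }
}
```

Calling ListCodec.fromBytes(ListCodec.toBytes(list)) should give back a list equal to the input. Java serialization ties producer and consumer to the same class definitions, which is one reason a JSON payload is often preferred when the two sides are deployed separately.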

Note that we set VALUE_SERIALIZER_CLASS_CONFIG to ByteArraySerializer:

public void publishMessage() throws Exception {
    Properties properties = new Properties();
    // Address as given in the original answer; point this at your own Kafka broker
    properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:8080");
    properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
    // For large lists you may need to raise buffer.memory and max.request.size.
    // The original left both values blank; an empty string is not a valid number,
    // so supply real byte counts. The client defaults are shown here:
    // properties.setProperty(ProducerConfig.BUFFER_MEMORY_CONFIG, "33554432");
    // properties.setProperty(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, "1048576");

    // try-with-resources closes the producer; close() waits for pending sends to complete
    try (Producer<String, byte[]> producer =
            new org.apache.kafka.clients.producer.KafkaProducer<String, byte[]>(properties)) {
        List<String> asl = new ArrayList<>();
        asl.add("test1");
        asl.add("test2");

        byte[] byteArray = searlizedByteArray(asl);
        ProducerRecord<String, byte[]> producerRecord =
                new ProducerRecord<String, byte[]>("testlist", null, byteArray);

        // get() blocks until the broker acknowledges the record
        producer.send(producerRecord).get();
    }
}

Finally, on the consumer side you can use ByteArrayDeserializer:

public void consumeMessage() throws IOException, ClassNotFoundException {
    Properties properties = new Properties();

    // Address as given in the original answer; it must be the broker the producer writes to
    properties.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9091");
    properties.setProperty("key.deserializer", StringDeserializer.class.getName());
    properties.setProperty("value.deserializer", ByteArrayDeserializer.class.getName());
    properties.setProperty("group.id", "grp_consumer");
    properties.setProperty("auto.offset.reset", "earliest");

    try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<String, byte[]>(properties)) {
        consumer.subscribe(Arrays.asList("testlist"));

        while (true) {
            // poll(long) is deprecated; pass a java.time.Duration instead
            ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, byte[]> record : records) {
                // Each record value holds one whole serialized list
                List<String> items = desearlizedByteArray(record.value());
            }
        }
    }
}

How to send batched data with a Spring Kafka producer
