Spring Kafka producer threads keep increasing

Date: 2020-05-12 16:13:17

Tags: java tomcat apache-kafka spring-kafka

We are using Kafka with Spring and are currently load testing the application. Within a few minutes of starting the load test, Tomcat stops responding. Analyzing a thread dump, I see a very large number of Kafka producer threads and assume they may be the reason the application hangs. The thread count is high: 200+ Kafka producer threads within a few minutes. Is there any way to close these producer threads? My Spring Kafka producer configuration is given below.

Edit: our application has an event pub/sub mechanism, and I am using Kafka to publish the events. Number of partitions: 15, concurrency: 5.

@Bean
public ProducerFactory<String, Object> producerFactory() {
    Map<String, Object> configProps = new HashMap<>();
    configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapAddress);
    configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class);
    configProps.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, KafkaCustomPartitioner.class);
    configProps.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
    configProps.put(ProducerConfig.LINGER_MS_CONFIG, 200);

    DefaultKafkaProducerFactory<String, Object> factory = new DefaultKafkaProducerFactory<>(configProps);
    factory.setTransactionIdPrefix(serverId+"-tx-");
    // factory.setProducerPerConsumerPartition(false);
    return factory;
}

public ConsumerFactory<String, Object> consumerFactory(String groupId) {
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapAddress);
    props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, JsonDeserializer.class);
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
    props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
    props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 60000);
    props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 5000);
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 20);
    props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 600000);
    props.put(JsonDeserializer.TRUSTED_PACKAGES, "org.xxx.xxx.xxx");
    return new DefaultKafkaConsumerFactory<>(props);
}


@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> customKafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
    //factory.setConcurrency(eventTopicConcurrency);
    factory.getContainerProperties().setAckOnError(false);
    factory.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
    factory.setErrorHandler(new SeekToCurrentErrorHandler());
    factory.setConsumerFactory(consumerFactory("custom-group-id"));

    return factory;
}

Below is my publisher and subscriber code:

@Override
public void publish(Event event) {
    DomainEvent domainEvent = event.getDomainEvent();
    ListenableFuture<SendResult<String, Object>> future =
            kafkaTemplate.send(topicName, domainEvent.getMainDocumentId(), event);

    future.addCallback(new ListenableFutureCallback<SendResult<String, Object>>() {

        @Override
        public void onSuccess(SendResult<String, Object> result) {
            if (LOGGER.isDebugEnabled())
                LOGGER.debug("Published event {} : {}", domainEvent.getEventName(), domainEvent.getEventId());
        }

        @Override
        public void onFailure(Throwable ex) {
            LOGGER.error("Failed to publish event {} : {}", domainEvent.getEventName(), domainEvent.getEventId(), ex);
        }
    });
}

Listener: an event has multiple subscribers, so when we receive an event from Kafka we spawn a new thread per subscriber to process it, and once all subscribers have finished processing we commit the offset.

@KafkaListener(topics = "${kafka.event.topic.name}-#{ClusterConfigSplitter.toClusterId('${cluster.info}')}", concurrency="${kafka.event.topic.concurrency}", clientIdPrefix="${web.server.id}-event-consumer", containerFactory = "customKafkaListenerContainerFactory")
public void eventTopicListener(Event event, Acknowledgment ack)
        throws InterruptedException, ClassNotFoundException, IOException {

    if(LOGGER.isDebugEnabled())
        LOGGER.debug("Received event {} : {}", event.getDomainEvent().getEventName(), event.getDomainEvent().getEventId());

    DomainEvent domainEvent = event.getDomainEvent();

    List<EventSubscriber> subscribers = new ArrayList<>();
    for (String failedSubscriber : event.getSubscribersToRetry()) {
        subscribers.add(eventSubcribers.get(failedSubscriber));
    }

    CountDownLatch connectionLatch = new CountDownLatch(subscribers.size());

    // written to from multiple executor threads, so it must be thread-safe
    List<String> failedSubscribers = Collections.synchronizedList(new ArrayList<>());

    for (EventSubscriber subscriber : subscribers) {

        taskExecutor.execute(new Runnable() {
            @Override
            public void run() {
                tenantContext.setTenant(domainEvent.getTenantId());
                DefaultTransactionDefinition def = new DefaultTransactionDefinition();
                def.setName(domainEvent.getEventId() + "-" + subscriber.getClass().getName());
                def.setPropagationBehavior(TransactionDefinition.PROPAGATION_REQUIRES_NEW);

                TransactionStatus status = txManager.getTransaction(def);

                try {
                    subscriber.handle(domainEvent);
                    txManager.commit(status);
                } catch (Exception ex) {
                    LOGGER.error("Processing event {} : {} failed for {}", domainEvent.getEventName(),
                            domainEvent.getEventId(), subscriber.getClass().getName(), ex);

                    txManager.rollback(status);
                    failedSubscribers.add(subscriber.getClass().getName());
                } finally {
                    // always count down, even on unexpected errors, so await() cannot hang
                    connectionLatch.countDown();
                }

                if (LOGGER.isDebugEnabled())
                    LOGGER.debug("Processed event {} : {} by {}", domainEvent.getEventName(), domainEvent.getEventId(), subscriber.getClass().getName());
            }
        });

    }

    connectionLatch.await();

    ack.acknowledge();

    if (!failedSubscribers.isEmpty()) {
        eventPersistenceService.eventFailed(domainEvent, failedSubscribers, event.getRetryCount() + 1);
    }
}

TransactionManager

@Bean
@Primary
public PlatformTransactionManager transactionManager(EntityManagerFactory factory,
        @Qualifier("common-factory") EntityManagerFactory commonFactory,
        ProducerFactory<String, Object> producerFactory) {

    JpaTransactionManager transactionManager = new JpaTransactionManager();
    transactionManager.setEntityManagerFactory(factory);

    JpaTransactionManager commonTransactionManager = new JpaTransactionManager();
    commonTransactionManager.setEntityManagerFactory(commonFactory);

    KafkaTransactionManager<String, Object> kafkaTransactionManager = new KafkaTransactionManager<>(producerFactory);

    return new ChainedKafkaTransactionManager<>(kafkaTransactionManager, commonTransactionManager, transactionManager);
}

1 Answer:

Answer 0 (score: 1)

I'll write a more complete answer to help others who might find this question.

By default, when using transactions, we have to create a new producer for each group/topic/partition combination (assuming transactions are started by the consumer thread); this is so that, if a rebalance occurs, the producers can be properly fenced. A rough sketch of the resulting arithmetic follows.
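
The transactional.id format below is an assumption about how spring-kafka derives the per-partition id from the configured prefix, and the class and topic names are illustrative. The key point is that each fenced producer is a separate KafkaProducer, and each KafkaProducer owns its own kafka-producer-network-thread.

import java.util.ArrayList;
import java.util.List;

public class FencedProducerCount {
    public static void main(String[] args) {
        String txPrefix = "server1-tx-";   // mirrors setTransactionIdPrefix(serverId + "-tx-")
        String groupId = "custom-group-id";
        String topic = "events";           // illustrative topic name
        int partitions = 15;               // partition count from the question

        List<String> transactionalIds = new ArrayList<>();
        for (int partition = 0; partition < partitions; partition++) {
            // assumed format: prefix + group + "." + topic + "." + partition
            transactionalIds.add(txPrefix + groupId + "." + topic + "." + partition);
        }

        // 15 ids -> up to 15 producers (and network threads) for one topic/group;
        // additional topics or consumer groups multiply this number.
        System.out.println(transactionalIds.size() + " fenced transactional producers possible");
    }
}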

The 2.5 kafka-clients have an improved algorithm for this situation, so we no longer need all of these producers.

However, to use this feature, the brokers must be upgraded to 2.5.0.

The upcoming 2.5.0.RELEASE (due tomorrow) allows this new threading model to be used for transactional producers.

The release candidate is available for testing.

Documentation for the new feature is here.

However, you have disabled the creation of the producers that provide proper fencing:

factory.setProducerPerConsumerPartition(false);

So, in that case, you should see that the producers are cached; it would be unusual to have that many producers unless you have very high concurrency on the listener container and are producing at a very high rate.
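
For intuition, here is a simplified sketch (deliberately not the real spring-kafka source) of how the shared-cache mode hands out transactional producers once producerPerConsumerPartition is false. The point it illustrates: a new producer is created only when every cached one is already in use, so the cache size tracks the peak number of concurrent transactions rather than growing per partition.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class ProducerCacheSketch {
    private final BlockingQueue<String> cache = new LinkedBlockingQueue<>();
    private final AtomicInteger suffix = new AtomicInteger();
    private final String prefix = "server1-tx-"; // illustrative transactional.id prefix

    // Borrow a producer for a transaction: reuse a cached one if available,
    // otherwise create a new one.
    String borrow() {
        String producerId = cache.poll();
        return producerId != null ? producerId : prefix + suffix.getAndIncrement();
    }

    // Return the producer to the cache once the transaction commits or rolls back.
    void release(String producerId) {
        cache.offer(producerId);
    }
}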

The producer factory does not currently support limiting the size of the cache.
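
One stop-gap, if periodically recreating producers is acceptable for your workload, is DefaultKafkaProducerFactory.reset(), which closes the factory's producer(s) and clears the cache of transactional producers. A minimal sketch follows; the class name and interval are illustrative, not a recommendation:

import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.scheduling.annotation.Scheduled;

public class ProducerCacheTrimmer {

    private final DefaultKafkaProducerFactory<String, Object> producerFactory;

    public ProducerCacheTrimmer(DefaultKafkaProducerFactory<String, Object> producerFactory) {
        this.producerFactory = producerFactory;
    }

    // Hypothetical interval; requires @EnableScheduling. Avoid calling this while
    // transactions are in flight - producers are lazily recreated afterwards.
    @Scheduled(fixedDelay = 300_000)
    public void trimProducerCache() {
        producerFactory.reset(); // closes producer(s) and clears the transactional cache
    }
}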

Perhaps you can edit the question to explain further what your application is doing, and show more code/configuration.