Question

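To manage long-running tasks with Spring Cloud Stream 3.1.1 and the Kafka binder, we need to use a pollable consumer and manage consumption manually on a separate thread so that Kafka does not trigger a rebalance. To do that, we defined a new annotation to manage the pollable consumer. The problem with this approach is that, because the work has to be managed on a separate thread, any exception thrown there never ends up in the errorChannel or the DLQ.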

  private final ExecutorService executor = Executors.newFixedThreadPool(1);

  private volatile boolean paused = false;

  @Around(value = "@annotation(pollableConsumer) && args(dataCapsule,..)")
  public void handleMessage(ProceedingJoinPoint joinPoint,
      PollableConsumer pollableConsumer, Object dataCapsule) {
    if (dataCapsule instanceof Message) {
      Message<?> message = (Message<?>) dataCapsule;
      AcknowledgmentCallback callback = StaticMessageHeaderAccessor
          .getAcknowledgmentCallback(message);
      callback.noAutoAck();

      if (!paused) {
        // The separate thread is not busy with a previous message, so process this message:
        Runnable runnable = () -> {
          try {
            paused = true;

            // Call method to process this Kafka message
            joinPoint.proceed();

            callback.acknowledge(Status.ACCEPT);
          } catch (Throwable e) {
            callback.acknowledge(Status.REJECT);
            throw new PollableConsumerException(e);
          } finally {
            paused = false;
          }
        };

        executor.submit(runnable);
      } else {  

        // The separate thread is busy with a previous message, so re-queue this message for later:
        callback.acknowledge(Status.REQUEUE);
      }
    }
  }

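We could create a different output channel to publish messages to when an exception occurs, but it feels like we are trying to implement something that should not be necessary.

Update 1

We added these beans: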

  @Bean
  public KafkaTemplate<String, byte[]> kafkaTemplate() {
    return new KafkaTemplate<>(producerFactory());
  }
  @Bean
  public ProducerFactory<String, byte[]> producerFactory() {
    Map<String, Object> configProps = new HashMap<>();
    configProps.put(
        org.apache.kafka.clients.producer.ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
        "http://localhost:9092");
    configProps.put(
        org.apache.kafka.clients.producer.ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
        StringSerializer.class);
    configProps.put(
        org.apache.kafka.clients.producer.ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
        KafkaAvroSerializer.class);
    return new DefaultKafkaProducerFactory<>(configProps);
  }
  @Bean
  public KafkaAdmin admin() {
    Map<String, Object> configs = new HashMap<>();
    configs.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "http://localhost:9092");
    return new KafkaAdmin(configs);
  }
  @Bean
  public NewTopic topicErr() {
    return TopicBuilder.name("ERR").partitions(1).replicas(1).build();
  }
  @Bean
  public SeekToCurrentErrorHandler eh(KafkaOperations<String, byte[]> template) {
    return new SeekToCurrentErrorHandler(new DeadLetterPublishingRecoverer(
        template,
        (cr, e) -> new TopicPartition("ERR", 1)),
        new FixedBackOff(0L, 1L));
  }

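enable-dlq is not set under spring.cloud.stream.kafka.bindings.channel-name.consumer, but we still do not see any messages produced to the ERR topic, not even for exceptions thrown on the main thread.

If enable-dlq is set to true, exceptions on the main thread are published to the default DLQ topic and, as expected, exceptions on the child thread are ignored.

Update 2

Gary's example seems to work in general. We needed to make some modifications because we use the deprecated StreamListener approach rather than functions, but there are a few issues we could not resolve.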

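1. The topic name always seems to be channel_name + .DLT, and we could not figure out how to use a different name such as dlq. We use a single dlq topic for all consumers, which does not seem to be what the default Spring Kafka DLT handling expects.
2. It seems the DLT needs at least as many partitions as the consumer topic; otherwise this solution does not work. We are not sure how to manage that, since it does not seem a practical assumption for us.
3. Is there a way to leverage Spring Retry, similar to what Spring Cloud Stream does under the hood, or does that need to be implemented separately? That is, retry the work based on max.attempts and only then fall through to the DLQ part.
4. I can see that the example uses the Spring actuator to update the channel state via this.endpoint.changeState("polled", State.PAUSED) and this.endpoint.changeState("polled", State.RESUMED). Why do we need to do that along with pause, requeue, and so on? What are the side effects of not doing it?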

Answer 1

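Your observation is correct; error handling is bound to the thread.

You can use a DeadLetterPublishingRecoverer directly in your code to make publishing to the DLQ easier (instead of using an output channel). That way, you get the enhanced headers with exception information and so on.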

https://docs.spring.io/spring-kafka/docs/current/reference/html/#dead-letters
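
To illustrate why the failure never reaches the binder's error handling, here is a minimal, dependency-free sketch (plain JDK, no Spring). The class name WorkerThreadErrors, the methods handOff, runJob and demo, and the in-memory deadLetters list are all hypothetical stand-ins; the list plays the role that a DeadLetterPublishingRecoverer would play in real code.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class WorkerThreadErrors {

    /** Runs the job on the worker thread and routes any failure to the dead-letter sink. */
    static void handOff(ExecutorService executor, String payload, List<String> deadLetters) {
        executor.submit(() -> {
            try {
                runJob(payload);
            } catch (RuntimeException e) {
                // Only code running on the worker thread ever sees this exception.
                deadLetters.add(payload + ": " + e.getMessage());
            }
        });
    }

    /** Hypothetical long-running job that always fails. */
    static void runJob(String payload) {
        throw new IllegalStateException("job failed");
    }

    static String demo() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        List<String> deadLetters = new CopyOnWriteArrayList<>();
        boolean callerSawException = false;
        try {
            // Unguarded hand-off: the failure is captured in the ignored Future.
            Runnable unguarded = () -> runJob("record-0");
            executor.submit(unguarded);
            // Guarded hand-off: the worker itself publishes the failure.
            handOff(executor, "record-1", deadLetters);
            executor.shutdown();
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (RuntimeException e) {
            callerSawException = true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "callerSawException=" + callerSawException + ", deadLetters=" + deadLetters;
    }

    public static void main(String[] args) {
        // prints: callerSawException=false, deadLetters=[record-1: job failed]
        System.out.println(demo());
    }
}
```

The calling thread never observes either failure, and only the guarded hand-off records one. This is why the recoverer has to be invoked from inside the worker.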

EDIT

Here is an example; I am pausing the binding to prevent any new deliveries while the "job" runs, rather than requeuing deliveries as you are doing.

@SpringBootApplication
@EnableScheduling
public class So67296258Application {

    public static void main(String[] args) {
        SpringApplication.run(So67296258Application.class, args);
    }

    @Bean
    TaskExecutor exec() {
        return new ThreadPoolTaskExecutor();
    }

    @Bean
    DeadLetterPublishingRecoverer recoverer(KafkaOperations<Object, Object> template) {
        return new DeadLetterPublishingRecoverer(template);
    }

    @Bean
    NewTopic topic() {
        return TopicBuilder.name("polled.DLT").partitions(1).replicas(1).build();
    }

    @Bean
    MessageSourceCustomizer<KafkaMessageSource<?, ?>> customizer() {
        return (source, dest, group) -> source.setRawMessageHeader(true);
    }

}

@Component
class Handler {

    private static final Logger LOG = LoggerFactory.getLogger(Handler.class);

    private final PollableMessageSource source;

    private final TaskExecutor exec;

    private final BindingsEndpoint endpoint;

    private final DeadLetterPublishingRecoverer recoverer;

    Handler(PollableMessageSource source, TaskExecutor exec, BindingsEndpoint endpoint,
            DeadLetterPublishingRecoverer recoverer) {

        this.source = source;
        this.exec = exec;
        this.endpoint = endpoint;
        this.recoverer = recoverer;
    }

    @Scheduled(fixedDelay = 5_000)
    public void process() {
        LOG.info("Polling");
        boolean polled = this.source.poll(msg -> {
            LOG.info("Pausing Binding");
            this.endpoint.changeState("polled", State.PAUSED);
            AcknowledgmentCallback callback = StaticMessageHeaderAccessor.getAcknowledgmentCallback(msg);
            callback.noAutoAck();
//          LOG.info(msg.toString());
            this.exec.execute(() -> {
                try {
                    runJob(msg);
                }
                catch (Exception e) {
                    this.recoverer.accept(msg.getHeaders().get(KafkaHeaders.RAW_DATA, ConsumerRecord.class), e);
                }
                finally {
                    callback.acknowledge();
                    this.endpoint.changeState("polled", State.RESUMED);
                    LOG.info("Resumed Binding");
                }
            });
        });
        LOG.info("" + polled);
    }

    private void runJob(Message<?> msg) throws InterruptedException {
        LOG.info("Running job");
        Thread.sleep(30_000);
        throw new RuntimeException("fail");
    }

}

spring.cloud.stream.pollable-source=polled
spring.cloud.stream.bindings.polled-in-0.destination=polled
spring.cloud.stream.bindings.polled-in-0.group=polled

EDIT2

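Answers to the supplementary questions:

1, 2: See the Spring for Apache Kafka documentation: https://docs.spring.io/spring-kafka/docs/current/reference/html/#dead-letters

The DLPR has an alternate constructor that lets you specify a destination resolver. The default simply appends .DLT and uses the same partition. The javadocs describe how to specify the destination partition: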

    /**
     * Create an instance with the provided template and destination resolving function,
     * that receives the failed consumer record and the exception and returns a
     * {@link TopicPartition}. If the partition in the {@link TopicPartition} is less than
     * 0, no partition is set when publishing to the topic.
     * @param template the {@link KafkaOperations} to use for publishing.
     * @param destinationResolver the resolving function.
     */

When the partition is null (that is, not set), the KafkaProducer chooses the partition.
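
As a configuration sketch of that constructor, the bean below could replace the recoverer bean from the example above. It assumes a single shared topic named dlq (the name mentioned in the question) that already exists on the broker. The class name DlqConfig is hypothetical, and spring-kafka must be on the classpath.

```java
import org.apache.kafka.common.TopicPartition;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.KafkaOperations;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;

@Configuration
class DlqConfig {

    @Bean
    DeadLetterPublishingRecoverer recoverer(KafkaOperations<Object, Object> template) {
        // Route every failed record to one shared topic (assumed name: "dlq").
        // A negative partition means no partition is set on the outgoing record,
        // so the dead-letter topic does not need to match the source topic's
        // partition count.
        return new DeadLetterPublishingRecoverer(template,
                (record, exception) -> new TopicPartition("dlq", -1));
    }
}
```

The resolver also receives the failed record and the exception, so it could pick a different topic per source topic or per exception type if a single dlq topic turns out to be too coarse.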

3. Wire up a RetryTemplate with an appropriate retry and back-off policy; then

retryTemplate.execute(context -> { ... },
    context -> {...});

The second argument is a RecoveryCallback, called when retries are exhausted.
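
The retry-then-recover flow can be sketched without any dependencies. This is a plain-JDK stand-in, not the Spring Retry API: the class RetryThenRecover and its execute method are hypothetical, with the first lambda playing the role of the RetryCallback and the second the RecoveryCallback. In the real handler, the recovery step is where recoverer.accept(record, exception) would be called.

```java
import java.util.concurrent.Callable;
import java.util.function.Function;

class RetryThenRecover {

    /**
     * Runs the work up to maxAttempts times, sleeping backOffMillis between attempts.
     * Once every attempt has failed, the last exception is handed to the recovery function.
     */
    static <T> T execute(int maxAttempts, long backOffMillis,
            Callable<T> work, Function<Exception, T> recovery) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.call();
            } catch (Exception e) {
                last = e;
            }
            // Back off before the next attempt; stop retrying if interrupted.
            if (attempt < maxAttempts && !sleep(backOffMillis)) {
                break;
            }
        }
        return recovery.apply(last);
    }

    private static boolean sleep(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        String result = execute(3, 10L,
                () -> { attempts[0]++; throw new IllegalStateException("fail"); },
                e -> "sent to DLQ after: " + e.getMessage());
        // prints: 3 attempts, sent to DLQ after: fail
        System.out.println(attempts[0] + " attempts, " + result);
    }
}
```

Because the retries run on the worker thread, the binding stays paused for the whole retry sequence, so the back-off only delays the resume and does not affect polling.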

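4. It is more efficient. With your solution, you are continually retrieving and requeuing deliveries while the previous task is being processed. By pausing the binding, we tell Kafka not to send any more records when we poll() until we resume the consumer. This lets us keep the consumer alive by polling it, but without the overhead of retrieving records and resetting offsets.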

Spring Cloud Stream pollable consumer DLQ and errorChannel do not work when a different thread is used
