Question

I have implemented a Kafka consumer as a console application using BackgroundService on .NET Core 2.2. I am using confluent-kafka-dotnet v1.0.1.1 as the client for Apache Kafka. I am unsure about how to process each message.

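1) Since processing each message can take a long time (up to 24 hours), I start a new Task for each message so that I don't block the consumer from consuming new messages. I think that if there are too many messages, creating a new Task every time is not the right approach. What is the proper way to process each message, then? Is it possible to create some kind of dynamic background service for each message?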

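2) If a message is already being processed but the application crashes or a rebalance occurs, I will end up consuming and processing the same message more than once. Should I commit the offset automatically (or right after the message is consumed) and store the state of the message (or task) somewhere, such as in a database?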

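I know there is Hangfire, but I am not sure whether I need to use it. If my current approach is completely wrong, please give me some suggestions. Any advice or help would be much appreciated.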

Here is the implementation of ConsumerService:

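using System;
using System.Threading;
using System.Threading.Tasks;
using Confluent.Kafka;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Hosting;

public class ConsumerService : BackgroundService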
{
    private readonly IConfiguration _config;
    private readonly IElasticLogger _logger;
    private readonly ConsumerConfig _consumerConfig;
    private readonly string[] _topics;
    private readonly double _maxNumAttempts;
    private readonly double _retryIntervalInSec;

    public ConsumerService(IConfiguration config, IElasticLogger logger)
    {
        _config = config;
        _logger = logger;
        _consumerConfig = new ConsumerConfig
        {
            BootstrapServers = _config.GetValue<string>("Kafka:BootstrapServers"),
            GroupId = _config.GetValue<string>("Kafka:GroupId"),
            EnableAutoCommit = _config.GetValue<bool>("Kafka:Consumer:EnableAutoCommit"),
            AutoOffsetReset = (AutoOffsetReset)_config.GetValue<int>("Kafka:Consumer:AutoOffsetReset")
        };
        _topics = _config.GetValue<string>("Kafka:Consumer:Topics").Split(',');
        _maxNumAttempts = _config.GetValue<double>("App:MaxNumAttempts");
        _retryIntervalInSec = _config.GetValue<double>("App:RetryIntervalInSec");
    }

    protected override Task ExecuteAsync(CancellationToken stoppingToken)
    {
        Console.WriteLine("!!! CONSUMER STARTED !!!\n");

        // Starting a new Task here because Consume() method is synchronous
        var task = Task.Run(() => ProcessQueue(stoppingToken), stoppingToken);

        return task;
    }

    private void ProcessQueue(CancellationToken stoppingToken)
    {
        using (var consumer = new ConsumerBuilder<Ignore, Request>(_consumerConfig).SetValueDeserializer(new MessageDeserializer()).Build())
        {
            consumer.Subscribe(_topics);

            try
            {
                while (!stoppingToken.IsCancellationRequested)
                {
                    try
                    {
                        var consumeResult = consumer.Consume(stoppingToken);

                        // Don't want to block consume loop, so starting new Task for each message  
                        Task.Run(async () =>
                        {
                            var currentNumAttempts = 0;
                            var committed = false;

                            var response = new Response();

                            while (currentNumAttempts < _maxNumAttempts)
                            {
                                currentNumAttempts++;

                                // SendDataAsync is a method that sends http request to some end-points
                                response = await Helper.SendDataAsync(consumeResult.Value, _config, _logger);

                                if (response != null && response.Code >= 0)
                                {
                                    try
                                    {
                                        consumer.Commit(consumeResult);
                                        committed = true;

                                        break;
                                    }
                                    catch (KafkaException ex)
                                    {
                                        // log
                                    }
                                }
                                else
                                {
                                    // log
                                }

                                if (currentNumAttempts < _maxNumAttempts)
                                {
                                    // Delay between tries
                                    await Task.Delay(TimeSpan.FromSeconds(_retryIntervalInSec));
                                }
                            }

                            if (!committed)
                            {
                                try
                                {
                                    consumer.Commit(consumeResult);
                                }
                                catch (KafkaException ex)
                                {
                                    // log
                                }
                            }
                        }, stoppingToken);
                    }
                    catch (ConsumeException ex)
                    {
                        // log
                    }
                }
            }
            catch (OperationCanceledException ex)
            {
                // log
                consumer.Close();
            }
        }
    }
}

Answer 1

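I agree with Fabio: you should not call Task.Run for every message, because you end up with a large number of threads that waste resources and compete for execution time, which hurts performance.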

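Moreover, it is fine to process consumed messages on the same thread, because Kafka uses a pull model and your application can process messages at its own pace.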
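As a rough illustration of that idea, here is a minimal sketch of a loop that consumes, processes, and commits one message at a time on the same thread. The names `SequentialConsumer`, `TryWithRetriesAsync`, `Run`, and the `handle` delegate are hypothetical and not part of the question's code; the retry count and delay are arbitrary placeholders, and the sketch assumes `EnableAutoCommit = false` and string message values.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Confluent.Kafka;

public static class SequentialConsumer
{
    // Hypothetical helper: calls `attempt` until it returns true
    // or maxAttempts is reached, waiting `delay` between attempts.
    public static async Task<bool> TryWithRetriesAsync(
        Func<Task<bool>> attempt, int maxAttempts, TimeSpan delay, CancellationToken ct)
    {
        for (var i = 1; i <= maxAttempts; i++)
        {
            if (await attempt()) return true;
            if (i < maxAttempts) await Task.Delay(delay, ct);
        }
        return false;
    }

    // Consumes and handles one message at a time on the calling thread.
    // `handle` is an assumed application callback that returns true on success.
    public static void Run(ConsumerConfig config, string topic,
        Func<string, Task<bool>> handle, CancellationToken ct)
    {
        using (var consumer = new ConsumerBuilder<Ignore, string>(config).Build())
        {
            consumer.Subscribe(topic);
            try
            {
                while (!ct.IsCancellationRequested)
                {
                    // Blocks until a message arrives or the token is cancelled.
                    var result = consumer.Consume(ct);

                    // Wait for the handler to finish before fetching the next message.
                    TryWithRetriesAsync(() => handle(result.Value),
                            3, TimeSpan.FromSeconds(5), ct)
                        .GetAwaiter().GetResult();

                    // Commit only after processing (assumes EnableAutoCommit = false).
                    // As in the question, the offset is committed even if all retries failed.
                    consumer.Commit(result);
                }
            }
            catch (OperationCanceledException)
            {
                // Normal shutdown path.
            }
            finally
            {
                // Leave the consumer group cleanly.
                consumer.Close();
            }
        }
    }
}
```

One caveat with this design: if the time between two `Consume` calls exceeds `max.poll.interval.ms` (`MaxPollIntervalMs` on `ConsumerConfig`), the consumer is considered failed and the group rebalances, so very long-running handlers may need that limit raised or a different hand-off strategy.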

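As for processing a message more than once, I suggest storing the offsets of processed messages so that you can skip the ones that have already been handled. Since an offset is a long (a 64-bit integer), you can easily skip any message whose offset is lower than the one previously committed. Of course, this only works if you have a single partition, because Kafka provides offset counters and ordering guarantees at the partition level.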
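The offset check described above could be sketched roughly as follows. `ProcessedOffsetTracker` is a hypothetical in-memory class, not part of any library; a real application would persist this state (for example in a database) so that it survives a crash. The sketch stores the last processed offset and keys it by topic and partition, which is one way of keeping the comparison within a single partition, where offsets are ordered.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory tracker of the highest processed offset per partition.
public sealed class ProcessedOffsetTracker
{
    private readonly Dictionary<(string Topic, int Partition), long> _last =
        new Dictionary<(string Topic, int Partition), long>();

    // True if this offset is at or below the last processed offset
    // recorded for the same topic and partition.
    public bool IsAlreadyProcessed(string topic, int partition, long offset)
    {
        return _last.TryGetValue((topic, partition), out var last) && offset <= last;
    }

    // Records the offset as processed; never moves the stored value backwards.
    public void MarkProcessed(string topic, int partition, long offset)
    {
        var key = (topic, partition);
        if (!_last.TryGetValue(key, out var last) || offset > last)
        {
            _last[key] = offset;
        }
    }
}
```

In a consume loop this might be used by calling `IsAlreadyProcessed(consumeResult.Topic, consumeResult.Partition.Value, consumeResult.Offset.Value)` before handling a message, and `MarkProcessed` with the same arguments once handling succeeds.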

You can find an example of a Kafka consumer in my article. If you have any questions, feel free to ask; I'll be happy to help.

How to properly implement a Kafka consumer as a background service on .NET Core
