Question

I'm trying to use kafka-node to read messages from a compacted Kafka topic.

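The problem is that the most recently inserted messages sit above the EOL and can't be accessed until more messages are inserted. In effect, there is a gap between the EOL and the high water offset (highWaterOffset) that prevents the latest messages from being read. It's not clear why this happens.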

The topic was created with the following command:

kafka-topics.sh --zookeeper ${KAFKA_HOST}:2181 --create --topic atopic --config "cleanup.policy=compact" --config "delete.retention.ms=100" --config "segment.ms=100" --config "min.cleanable.dirty.ratio=0" --partitions 1 --replication-factor 1

A number of key-value pairs were produced to the topic. Some of the keys repeat.

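const kafka = require('kafka-node');
const debug = require('debug')('producer');

// Sends one payload array and resolves once Kafka acknowledges it.
function send(payload) {
  return new Promise((res, rej) => {
    const client = new kafka.KafkaClient({ kafkaHost: "<host:port>", autoConnect: true });
    const producer = new kafka.HighLevelProducer(client);
    producer.on('ready', function () {
      producer.send(payload, function (error, result) {
        debug('Sent payload to Kafka: ', payload);
        if (error) {
          console.error(error);
          rej(error);
        } else {
          res(true);
        }
        client.close();
      });
    });
  });
}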

Here are the keys and values that were inserted:

key - 1
key2 - 1
key3 - 1
key - 2
key2 - 2
key3 - 2
key1 - 3
key - 3
key2 - 3
key3 - 3

The topic's keys were then read back with a ConsumerGroup:

const { ConsumerGroup } = require('kafka-node');

const topic = 'atopic';
const options = {
  id: 'consumer1',
  kafkaHost: "<host:port>",
  groupId: "consumergroup1",
  sessionTimeout: 15000,
  protocol: ['roundrobin'],
  fromOffset: 'earliest'
};
const consumerGroup = new ConsumerGroup(options, topic);
consumerGroup.on('error', onError);
consumerGroup.on('message', onMessage);
consumerGroup.on('done', function (message) {
  consumerGroup.close(true, function () { });
});

function onError(error) {
  console.error(error);
}

// `this` is the consumerGroup that emitted the event.
function onMessage(message) {
  console.log('%s read msg Topic="%s" Partition=%s Offset=%d highWaterOffset=%d Key=%s value=%s',
    this.client.clientId, message.topic, message.partition, message.offset,
    message.highWaterOffset, message.key, message.value);
}

The result is surprising:

consumer1 read msg Topic="atopic" Partition=0 Offset=4 highWaterOffset=10 Key=key2 value={"name":"key2","url":"2"}
consumer1 read msg Topic="atopic" Partition=0 Offset=5 highWaterOffset=10 Key=key3 value={"name":"key3","url":"2"}
consumer1 read msg Topic="atopic" Partition=0 Offset=6 highWaterOffset=10 Key=key1 value={"name":"key1","url":"3"}
consumer1 read msg Topic="atopic" Partition=0 Offset=7 highWaterOffset=10 Key=key value={"name":"key","url":"3"}
consumer1 read msg Topic="atopic" Partition=0 Offset=0 highWaterOffset=10 Key= value=
consumer1 read msg Topic="atopic" Partition=0 Offset=0 highWaterOffset=10 Key= value=
consumer1 read msg Topic="atopic" Partition=0 Offset=0 highWaterOffset=10 Key= value=
consumer1 read msg Topic="atopic" Partition=0 Offset=0 highWaterOffset=10 Key= value=

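The high water offset is 10, representing the latest value. However, the highest offset the consumer sees is only 7. Somehow, compaction is preventing the consumer from seeing the latest messages.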

It's not clear how to get around this limitation and let the consumer see the latest messages.

Any suggestions are appreciated. Thanks.

Answer 1

"Somehow, compaction is preventing the consumer from seeing the latest messages."

Yes, you are losing some messages, but you are also seeing others.

Compaction is removing the earlier values for each key.

Notice that there are no url 1 values at all:

Key=key2 value={"name":"key2","url":"2"}
Key=key3 value={"name":"key3","url":"2"}
Key=key1 value={"name":"key1","url":"3"}
Key=key value={"name":"key","url":"3"}

That's because you sent newer values for the same keys.
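To make this concrete, here is a minimal in-memory sketch of what compaction retains: for each key, only the record with the highest offset survives, and offsets are never renumbered. It assumes an idealized, fully compacted log; real Kafka does not clean the active segment, so a compacted topic can still contain some superseded values. The `compact` function and the record shape are illustrative only, not part of kafka-node.

```javascript
// Records as produced in the question; the array index is the offset.
const produced = [
  ['key', '1'], ['key2', '1'], ['key3', '1'],
  ['key', '2'], ['key2', '2'], ['key3', '2'],
  ['key1', '3'], ['key', '3'], ['key2', '3'], ['key3', '3'],
].map(([key, url], offset) => ({ offset, key, url }));

// Idealized compaction: keep only the latest record (highest offset) per key.
// Surviving records keep their original offsets, so gaps appear in the log.
function compact(records) {
  const latest = new Map();
  for (const r of records) latest.set(r.key, r); // later offsets overwrite earlier ones
  return [...latest.values()].sort((a, b) => a.offset - b.offset);
}

const survivors = compact(produced);
for (const r of survivors) {
  console.log(`offset=${r.offset} key=${r.key} url=${r.url}`);
}
```

This prints offsets 6, 7, 8 and 9, all with url 3. Because offsets are preserved, the high water offset stays at 10 even though only four records remain.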

You sent 10 messages, so the topic's high water offset is 10.

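Your code doesn't necessarily look wrong, but you should be seeing two more 3 values. The printed offsets correspond to this mapping (key - value | offset):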

key  - 1 | 0
key2 - 1 | 1
key3 - 1 | 2
key  - 2 | 3
key2 - 2 | 4
key3 - 2 | 5
key1 - 3 | 6
key  - 3 | 7
key2 - 3 | 8
key3 - 3 | 9
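The arithmetic behind this table can be checked with a short sketch: offsets start at 0, so ten messages occupy offsets 0 through 9, and the high water offset (the offset the next message would receive) is 10. Comparing the offsets of the "3" values with the offsets the consumer actually printed with a non-empty key (4, 5, 6, 7) shows which records are missing. The variable names are illustrative.

```javascript
// (key, url) pairs in the order they were produced; the array index is the offset.
const produced = [
  ['key', '1'], ['key2', '1'], ['key3', '1'],
  ['key', '2'], ['key2', '2'], ['key3', '2'],
  ['key1', '3'], ['key', '3'], ['key2', '3'], ['key3', '3'],
];

// Offsets start at 0, so the last message is at produced.length - 1 (9) and the
// high water offset, i.e. the offset the next message would get, is 10.
const highWaterOffset = produced.length;

// Offsets holding the "3" (latest) values.
const latestOffsets = produced.flatMap(([, url], offset) => (url === '3' ? [offset] : []));

// Offsets printed with a non-empty key in the question's consumer output.
const observed = [4, 5, 6, 7];
const missing = latestOffsets.filter((offset) => !observed.includes(offset));

console.log({ highWaterOffset, latestOffsets, missing });
```

`missing` comes out as [8, 9], the key2 - 3 and key3 - 3 records: the two "3" values the consumer should also have printed.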

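In general, I'd advise against having Kafka compact a topic while rolling log segments 10 times per second (segment.ms=100), and I'd suggest using a different library such as node-rdkafka.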

Answer 2

After working more with Kafka, the kafka-node API appears to behave as follows (I think this actually originates in Kafka itself).

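When messages before the highWaterOffset are queried, only messages up to the highWaterOffset are returned to the ConsumerGroup. This makes sense if the messages haven't been replicated, since another consumer in the group would not necessarily see them.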

It is still possible to request and receive messages beyond the highWaterOffset by using a Consumer instead of a ConsumerGroup and querying a specific partition.
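A minimal sketch of that approach with kafka-node's plain `Consumer`, assuming a single partition 0 read from offset 0. The payload shape `{ topic, partition, offset }` and the `fromOffset: true` option (start at the offset given in the payload rather than a committed one) follow the kafka-node Consumer documentation; `buildConsumerArgs` is a hypothetical helper, and the broker address is read from an assumed `KAFKA_HOST` environment variable so the helper can run without a broker.

```javascript
// Hypothetical helper: arguments for kafka-node's Consumer(client, payloads, options)
// that read one specific partition starting at an explicit offset.
function buildConsumerArgs(topic, partition, offset) {
  const payloads = [{ topic, partition, offset }];
  const options = {
    autoCommit: false, // offsets are tracked by the caller
    fromOffset: true,  // start at payloads[].offset instead of a committed offset
  };
  return { payloads, options };
}

// Only connects to Kafka when KAFKA_HOST is set (assumption, e.g. "localhost:9092").
if (process.env.KAFKA_HOST) {
  const kafka = require('kafka-node');
  const client = new kafka.KafkaClient({ kafkaHost: process.env.KAFKA_HOST });
  const { payloads, options } = buildConsumerArgs('atopic', 0, 0);
  const consumer = new kafka.Consumer(client, payloads, options);
  consumer.on('message', (m) => console.log(m.offset, m.key, m.value));
  consumer.on('error', (err) => console.error(err));
}
```

Pinning the partition and offset explicitly avoids the group coordination that a ConsumerGroup performs; the trade-off is that partition assignment and offset tracking become the caller's job.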

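Also, the "done" event seems to fire even when the offset has not yet reached the lastOffset. In that case, you need to submit another query at message.offset + 1. If you keep doing this, you can stay up to date with all messages.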
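The re-query loop can be sketched without a broker. `fetchFrom` below is a hypothetical stand-in for a single fetch that returns at most a few records below the high water offset (with kafka-node's Consumer you might restart from a new position with `consumer.setOffset(topic, partition, offset)`). The loop keeps asking for the last received offset + 1 until a fetch comes back empty. The log contents are illustrative, loosely based on the offsets in the question.

```javascript
// Illustrative compacted partition: offsets have gaps; the high water offset is 10.
const log = [
  { offset: 4, key: 'key2' }, { offset: 5, key: 'key3' },
  { offset: 6, key: 'key1' }, { offset: 7, key: 'key' },
  { offset: 8, key: 'key2' }, { offset: 9, key: 'key3' },
];
const highWaterOffset = 10;

// Hypothetical single fetch: records at or after `offset` and below the high
// water offset, at most `maxRecords` per call (mimicking a partial batch).
function fetchFrom(offset, maxRecords = 4) {
  return log
    .filter((r) => r.offset >= offset && r.offset < highWaterOffset)
    .slice(0, maxRecords);
}

// Keep re-querying at lastOffset + 1 until a fetch returns nothing new.
function readAll(startOffset) {
  const seen = [];
  let next = startOffset;
  for (;;) {
    const batch = fetchFrom(next);
    if (batch.length === 0) break; // caught up with the high water offset
    seen.push(...batch);
    next = batch[batch.length - 1].offset + 1; // message.offset + 1, as described above
  }
  return seen;
}

console.log(readAll(0).map((r) => r.offset));
```

The first fetch stops at offset 7 (like the premature "done"), the second returns 8 and 9, and the third is empty. Basing the next query on the last offset actually received, rather than on a counter, means compaction gaps are skipped naturally.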

I'm not sure why Kafka behaves this way, but there may be some lower-level details surfacing here.

Kafka Node - How to retrieve all messages in a compacted topic
