Question

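I am trying to read a Kafka topic from startTime to endTime. It is fine to read some messages outside this interval, but I want to process every message inside it. I looked at Simple Consumer and found getOffsetBefore(), which gives me the offset before my startTime. But I am not sure how to get the offset for each partition after endTime. Please help!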

Answer 1

The Kafka consumer API below has been available since version 0.10.1:

/**
 * Look up the offsets for the given partitions by timestamp. The returned offset for each partition is the
 * earliest offset whose timestamp is greater than or equal to the given timestamp in the corresponding partition.
 *
 * This is a blocking call. The consumer does not have to be assigned the partitions.
 * If the message format version in a partition is before 0.10.0, i.e. the messages do not have timestamps, null
 * will be returned for that partition.
 *
 * Notice that this method may block indefinitely if the partition does not exist.
 *
 * @param timestampsToSearch the mapping from partition to the timestamp to look up.
 * @return a mapping from partition to the timestamp and offset of the first message with timestamp greater
 *         than or equal to the target timestamp. {@code null} will be returned for the partition if there is no
 *         such message.
 * @throws IllegalArgumentException if the target timestamp is negative.
 */
@Override
public Map<TopicPartition, OffsetAndTimestamp> offsetsForTimes(Map<TopicPartition, Long> timestampsToSearch) {
    for (Map.Entry<TopicPartition, Long> entry : timestampsToSearch.entrySet()) {
        // we explicitly exclude the earliest and latest offset here so the timestamp in the returned
        // OffsetAndTimestamp is always positive.
        if (entry.getValue() < 0)
            throw new IllegalArgumentException("The target time for partition " + entry.getKey() + " is " +
                    entry.getValue() + ". The target time cannot be negative.");
    }
    return fetcher.getOffsetsByTimes(timestampsToSearch, requestTimeoutMs);
}
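For the endTime part of the question, the same lookup can be made a second time with endTime as the target: the result is the first offset whose timestamp is at or after endTime, and a null result means no such record exists yet. With the real consumer, the map passed to offsetsForTimes would hold endTime for each TopicPartition. The snippet below is only a dependency-free sketch of the semantics described in the Javadoc, not the Kafka client itself. The class OffsetLookupSketch, the method earliestOffsetAtOrAfter, and the array model (index = offset, value = timestamp in milliseconds) are assumptions made for illustration.

```java
// Hypothetical model of the documented lookup semantics; not Kafka code.
// A partition is modelled as an array: index = offset, value = record timestamp (ms).
class OffsetLookupSketch {

    // Returns the earliest offset whose timestamp is >= target,
    // or null if no such record exists (mirroring the documented null result).
    static Long earliestOffsetAtOrAfter(long[] timestampsByOffset, long target) {
        if (target < 0) {
            throw new IllegalArgumentException("The target time cannot be negative.");
        }
        for (int offset = 0; offset < timestampsByOffset.length; offset++) {
            if (timestampsByOffset[offset] >= target) {
                return (long) offset;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        long[] partition = {1000L, 1500L, 2000L, 2500L, 3000L};
        long startTime = 1200L;
        long endTime = 2500L;

        // One lookup per bound: where to start reading and where the interval ends.
        Long startOffset = earliestOffsetAtOrAfter(partition, startTime);
        Long endOffset = earliestOffsetAtOrAfter(partition, endTime);
        System.out.println("start=" + startOffset + ", end=" + endOffset);
    }
}
```

Note that this only gives a clean range when timestamps increase with offsets, which Kafka does not guarantee.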

Answer 2

There is no guarantee for the end time, because nobody can foresee the future.

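Suppose you know the start offset and read all data up to the end of the topic. A producer could still write a record afterwards with a timestamp that falls inside your interval...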

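Note that Kafka's record timestamp is metadata, so any record can carry any timestamp. Brokers do not interpret this timestamp in any way (only the Streams API does). Kafka brokers therefore guarantee only offset-based message ordering, not timestamp-based ordering. If records are not ordered by time, that is, a record with a larger offset has a smaller timestamp than a record with a smaller offset, that record is a so-called "late record" (with respect to time), and there is no upper bound on how late a record can be.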
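To make "late record" concrete, here is a small sketch that scans timestamps in offset order and reports how far behind the highest timestamp seen so far any record falls. It is an illustration only: the class LatenessSketch, the method maxLatenessMs, and the array model (index = offset, value = timestamp in milliseconds) are hypothetical and not part of any Kafka API.

```java
// Hypothetical illustration of lateness; not Kafka code.
// Index = offset, value = record timestamp in milliseconds.
class LatenessSketch {

    // Returns the largest gap between the highest timestamp seen so far and a
    // later record's timestamp. Returns 0 if timestamps never go backwards.
    static long maxLatenessMs(long[] timestampsByOffset) {
        long maxSeen = Long.MIN_VALUE;
        long maxLateness = 0L;
        for (long ts : timestampsByOffset) {
            if (ts < maxSeen) {
                // A late record: larger offset, smaller timestamp.
                maxLateness = Math.max(maxLateness, maxSeen - ts);
            } else {
                maxSeen = ts;
            }
        }
        return maxLateness;
    }

    public static void main(String[] args) {
        long[] partition = {1000L, 1100L, 2100L, 1900L, 2600L, 1950L};
        System.out.println("max lateness (ms): " + maxLatenessMs(partition));
    }
}
```

A value measured this way only describes the data already read; since lateness has no upper bound, a later record can always exceed it.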

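Only your business logic can decide how far you want to read. So, given the start offset, you simply consume messages and watch their timestamps as you go. You can stop processing when you see the first record whose timestamp is larger than the end of your interval. This is the strictest approach: it does not allow for any late records, and the probability that you "miss" some data is comparatively high.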

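Or you apply a less restrictive upper bound and read until you see a record whose timestamp is larger than interval upper bound + X, where X is a configuration parameter of your choice. The larger X is, the smaller the probability that you miss any records.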
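Both stopping rules can be sketched in one loop, with X = 0 giving the strict variant. This is a minimal sketch under assumptions, not a real consumer: the class BoundedReadSketch, the method offsetsToProcess, and the parameter graceMs (playing the role of X) are hypothetical names, and the partition is modelled as an array read from the start offset (index = offset, value = timestamp in milliseconds).

```java
// Hypothetical sketch of the stopping rule; not Kafka code.
// Index = offset, value = record timestamp in milliseconds.
class BoundedReadSketch {

    // Reads in offset order and stops at the first record whose timestamp is
    // larger than endTime + graceMs. Returns the offsets of the records whose
    // timestamps fall inside [startTime, endTime].
    static java.util.List<Integer> offsetsToProcess(long[] timestampsByOffset,
                                                    long startTime,
                                                    long endTime,
                                                    long graceMs) {
        java.util.List<Integer> processed = new java.util.ArrayList<>();
        for (int offset = 0; offset < timestampsByOffset.length; offset++) {
            long ts = timestampsByOffset[offset];
            if (ts > endTime + graceMs) {
                break; // past the relaxed upper bound: stop reading
            }
            if (ts >= startTime && ts <= endTime) {
                processed.add(offset); // inside the interval: process it
            }
            // otherwise the record is outside the interval but within the grace window: skip it
        }
        return processed;
    }

    public static void main(String[] args) {
        long[] partition = {1000L, 1100L, 2100L, 1900L, 2600L, 1950L};
        System.out.println("X=0:    " + offsetsToProcess(partition, 1000L, 2000L, 0L));
        System.out.println("X=500:  " + offsetsToProcess(partition, 1000L, 2000L, 500L));
        System.out.println("X=1000: " + offsetsToProcess(partition, 1000L, 2000L, 1000L));
    }
}
```

In this sample data, the strict variant stops at offset 2 and misses the late records at offsets 3 and 5. X = 500 picks up offset 3, and X = 1000 picks up both.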
