At a timestamp

Time: 2017-04-10 15:09:23

Tags: timestamp apache-kafka offset kafka-consumer-api

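I am trying to read a Kafka topic from startTime to endTime. It is fine to read a few extra messages outside this interval, but I want to be sure I process every message inside it. I looked at Simple Consumer and found getOffsetBefore(), which gives me the offsets before my startTime. But I am not sure how to get the offset beyond endTime for each partition. Please help!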

2 answers:

Answer 0: (score: 1)

The Kafka consumer API below is available since version 0.10.1:
/**
 * Look up the offsets for the given partitions by timestamp. The returned offset for each partition is the
 * earliest offset whose timestamp is greater than or equal to the given timestamp in the corresponding partition.
 *
 * This is a blocking call. The consumer does not have to be assigned the partitions.
 * If the message format version in a partition is before 0.10.0, i.e. the messages do not have timestamps, null
 * will be returned for that partition.
 *
 * Notice that this method may block indefinitely if the partition does not exist.
 *
 * @param timestampsToSearch the mapping from partition to the timestamp to look up.
 * @return a mapping from partition to the timestamp and offset of the first message with timestamp greater
 *         than or equal to the target timestamp. {@code null} will be returned for the partition if there is no
 *         such message.
 * @throws IllegalArgumentException if the target timestamp is negative.
 */
@Override
public Map<TopicPartition, OffsetAndTimestamp> offsetsForTimes(Map<TopicPartition, Long> timestampsToSearch) {
    for (Map.Entry<TopicPartition, Long> entry : timestampsToSearch.entrySet()) {
        // we explicitly exclude the earliest and latest offset here so the timestamp in the returned
        // OffsetAndTimestamp is always positive.
        if (entry.getValue() < 0)
            throw new IllegalArgumentException("The target time for partition " + entry.getKey() + " is " +
                    entry.getValue() + ". The target time cannot be negative.");
    }
    return fetcher.getOffsetsByTimes(timestampsToSearch, requestTimeoutMs);
}
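
To make the documented lookup semantics concrete, here is a minimal in-memory sketch. It is not Kafka client code: `OffsetLookupSketch`, `offsetForTime` and `offsetRange` are hypothetical names, and a partition is modeled as a plain array where the index is the offset. It assumes that looking up `endTime + 1` gives the first offset past the interval, and that the log end is used as a fallback when no such record exists (with the real consumer, `endOffsets()` could supply that value).

```java
import java.util.OptionalLong;

/** In-memory model of the documented lookup semantics; NOT Kafka client code. */
class OffsetLookupSketch {

    /**
     * timestamps[i] is the timestamp of the record at offset i of one partition.
     * Returns the earliest offset whose timestamp is >= target, or empty if there
     * is no such record (the real API returns null for the partition in that case).
     */
    static OptionalLong offsetForTime(long[] timestamps, long target) {
        if (target < 0) {
            throw new IllegalArgumentException("The target time cannot be negative.");
        }
        for (int offset = 0; offset < timestamps.length; offset++) {
            if (timestamps[offset] >= target) {
                return OptionalLong.of(offset);
            }
        }
        return OptionalLong.empty();
    }

    /**
     * Returns {startOffset, endOffsetExclusive} for the interval [startTime, endTime].
     * Assumption: when no record is newer than endTime, read up to the current log end.
     */
    static long[] offsetRange(long[] timestamps, long startTime, long endTime) {
        long logEnd = timestamps.length;
        long start = offsetForTime(timestamps, startTime).orElse(logEnd);
        long end = offsetForTime(timestamps, endTime + 1).orElse(logEnd);
        return new long[] {start, end};
    }

    public static void main(String[] args) {
        long[] partition = {100L, 200L, 300L, 400L, 500L};
        long[] range = offsetRange(partition, 200L, 400L);
        System.out.println("read offsets [" + range[0] + ", " + range[1] + ")");
    }
}
```

With the sample data, the range covers offsets 1 to 3. Note that this only bounds the interval if timestamps never decrease with the offset; records written later with an older timestamp would fall outside the computed range.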

Answer 1: (score: 0)

There is no guarantee for the end time, because nobody can look into the future.

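Assume you know the start offset and read all data up to the end of the topic. A producer might still write a record afterwards with a timestamp that falls inside your interval...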

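Note that Kafka's record timestamps are metadata, so any record can carry any timestamp. Brokers do not interpret this timestamp in any way (only the Streams API does). Thus, Kafka brokers guarantee only offset-based message ordering, not timestamp-based ordering. If records are not ordered by timestamp, i.e. a record with a larger offset has a smaller timestamp than a record with a smaller offset, that record is a so-called "late record" (with respect to time), and there is no upper bound on the lateness.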

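Only your business logic can decide which range you want to read. So, given the start offset, you simply consume messages and monitor their timestamps as you go. You could stop processing at the first record whose timestamp is greater than the end of your interval. This is the strictest form of processing, because it does not allow any late records, and the probability that you "miss" some data is relatively high.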

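Or you apply a less restrictive upper bound and keep reading until you see a record with a timestamp greater than interval upper bound + X, where X is a configuration parameter of your choice. The larger X is, the smaller the probability that you miss any records.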
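
The two stopping rules above can be sketched in plain Java without a broker. This is only an illustration under stated assumptions: `BoundedLatenessSketch`, `selectInInterval` and `allowedLateness` (playing the role of X) are hypothetical names, and the records are modeled as an array of timestamps in offset order, starting at the start offset.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of "read until timestamp > upper bound + X"; NOT Kafka client code. */
class BoundedLatenessSketch {

    /**
     * timestamps[i] is the timestamp of the i-th record read, in offset order.
     * Returns the indexes of records whose timestamp lies in [startTime, endTime].
     * Reading stops at the first record with timestamp > endTime + allowedLateness;
     * allowedLateness = 0 is the strictest variant.
     */
    static List<Integer> selectInInterval(long[] timestamps, long startTime,
                                          long endTime, long allowedLateness) {
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < timestamps.length; i++) {
            long ts = timestamps[i];
            if (ts > endTime + allowedLateness) {
                break; // beyond the tolerated bound: stop reading
            }
            if (ts >= startTime && ts <= endTime) {
                selected.add(i); // inside the interval: process this record
            }
            // otherwise the record is outside the interval but within the
            // tolerance, so it is skipped and reading continues
        }
        return selected;
    }

    public static void main(String[] args) {
        long[] timestamps = {100L, 200L, 450L, 300L, 700L, 350L};
        System.out.println(selectInInterval(timestamps, 100L, 400L, 0L));
        System.out.println(selectInInterval(timestamps, 100L, 400L, 100L));
    }
}
```

In the sample data, the strict variant stops at the record with timestamp 450 and misses the late record with timestamp 300. With a tolerance of 100 that record is picked up, but the one with timestamp 350 is still missed because it arrives after the record with timestamp 700. No finite X removes that risk entirely.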