How to assign and inspect the event time of a message in Apache Beam

Time: 2018-10-09 16:05:39

Tags: apache-beam

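My source is KafkaIO.read(). I now want to use a ParDo to decode the messages coming from Kafka and use one field of each message as that message's event time. How can I do this? I have not found an example of this operation.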

1 answer:

Answer 0 (score: 0)

First, implement a custom timestamp policy by extending TimestampPolicy<KeyT, ValueT>.

For example:

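import java.util.Optional;

import org.apache.beam.sdk.io.kafka.KafkaRecord;
import org.apache.beam.sdk.io.kafka.TimestampPolicy;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.joda.time.Instant;

// Timestamp policy that takes each record's event time from a field of the decoded value.
public class CustomFieldTimePolicy extends TimestampPolicy<String, Foo> {

    // Watermark reported for this partition: the timestamp of the last record seen.
    protected Instant currentWatermark;

    public CustomFieldTimePolicy(Optional<Instant> previousWatermark) {
        // Resume from the checkpointed watermark if there is one.
        currentWatermark = previousWatermark.orElse(BoundedWindow.TIMESTAMP_MIN_VALUE);
    }

    @Override
    public Instant getTimestampForRecord(PartitionContext ctx, KafkaRecord<String, Foo> record) {
        // Foo.getTimestamp() is assumed to return epoch milliseconds.
        // This also assumes timestamps do not decrease within a partition;
        // otherwise the watermark returned below would move backwards.
        currentWatermark = new Instant(record.getKV().getValue().getTimestamp());
        return currentWatermark;
    }

    @Override
    public Instant getWatermark(PartitionContext ctx) {
        return currentWatermark;
    }
}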
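If the timestamp field can arrive out of order within a partition, the policy above would let the watermark jump backwards. Below is a minimal sketch of one way to keep it non-decreasing, written in plain Java so it runs without Beam. `WatermarkTracker`, `observe` and `currentWatermark()` are hypothetical names, and `java.time.Instant` stands in for the Joda-Time `Instant` that Beam uses. It is an illustration of the idea, not the answer's code.

```java
import java.time.Instant;
import java.util.Optional;

// Hypothetical stand-in for the watermark bookkeeping inside a TimestampPolicy.
// java.time.Instant is used instead of Joda-Time so the sketch has no dependencies.
class WatermarkTracker {
    private Instant watermark;

    WatermarkTracker(Optional<Instant> previousWatermark) {
        // Instant.MIN plays the role of BoundedWindow.TIMESTAMP_MIN_VALUE here.
        this.watermark = previousWatermark.orElse(Instant.MIN);
    }

    // Returns the record's own event time and only ever moves the watermark forwards.
    Instant observe(Instant eventTime) {
        if (eventTime.isAfter(watermark)) {
            watermark = eventTime;
        }
        return eventTime;
    }

    Instant currentWatermark() {
        return watermark;
    }
}

class WatermarkTrackerDemo {
    public static void main(String[] args) {
        WatermarkTracker tracker = new WatermarkTracker(Optional.empty());
        tracker.observe(Instant.parse("2018-10-09T16:05:39Z"));
        tracker.observe(Instant.parse("2018-10-09T16:05:30Z")); // arrives out of order
        // The watermark stays at the later of the two timestamps.
        System.out.println(tracker.currentWatermark());
    }
}
```

With this approach a record older than the current watermark keeps its own event time but may be treated as late data downstream. If that matters, a watermark that trails the newest timestamp by a fixed delay may be a better fit; Beam's KafkaIO package appears to provide CustomTimestampPolicyWithLimitedDelay for that purpose.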

Then, when you set up the KafkaIO source, pass your custom TimestampPolicy to it through the functional interface TimestampPolicyFactory.

KafkaIO.<String, Foo>read().withBootstrapServers("localhost:9092") // host:port, without a URL scheme
                .withTopic("foo")
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializerAndCoder(KafkaAvroDeserializer.class, AvroCoder.of(Foo.class)) // if you use Avro
                .withTimestampPolicyFactory((tp, previousWatermark) -> new CustomFieldTimePolicy(previousWatermark))
                .updateConsumerProperties(kafkaProperties);

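The following line is responsible for creating a new timestamp policy. It is passed the relevant partition and the previously checkpointed watermark; see the documentation.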

.withTimestampPolicyFactory((tp, previousWatermark) -> new CustomFieldTimePolicy(previousWatermark))
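To make the role of that lambda more concrete, here is a rough, dependency-free model of a factory callback being invoked once per partition with an optional checkpointed watermark. Everything in it (`FactorySketch`, `Policy`, `PolicyFactory`, `assign`, the partition names) is hypothetical and only mirrors the shape of the call. It is not the Beam API, and `java.time.Instant` again stands in for Joda-Time.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of the factory callback; not the Beam API.
class FactorySketch {

    // Minimal stand-in for a per-partition timestamp policy.
    static final class Policy {
        final String partition;
        final Instant startingWatermark;

        Policy(String partition, Optional<Instant> previousWatermark) {
            this.partition = partition;
            // A partition with no checkpoint starts from the minimum timestamp.
            this.startingWatermark = previousWatermark.orElse(Instant.MIN);
        }
    }

    // Same shape as the lambda: (partition, previousWatermark) -> policy.
    @FunctionalInterface
    interface PolicyFactory {
        Policy create(String partition, Optional<Instant> previousWatermark);
    }

    // Models a reader building one policy per assigned partition and handing
    // each one the watermark saved in the last checkpoint, if any.
    static Map<String, Policy> assign(PolicyFactory factory,
                                      Map<String, Instant> checkpoint,
                                      String... partitions) {
        Map<String, Policy> policies = new HashMap<>();
        for (String p : partitions) {
            policies.put(p, factory.create(p, Optional.ofNullable(checkpoint.get(p))));
        }
        return policies;
    }

    public static void main(String[] args) {
        Map<String, Instant> checkpoint = new HashMap<>();
        checkpoint.put("foo-0", Instant.parse("2018-10-09T16:05:39Z"));

        Map<String, Policy> policies =
                assign((tp, prev) -> new Policy(tp, prev), checkpoint, "foo-0", "foo-1");

        policies.forEach((tp, policy) ->
                System.out.println(tp + " starts at " + policy.startingWatermark));
    }
}
```

In this model "foo-0" resumes from its saved watermark while "foo-1", which has no checkpoint entry, starts from the minimum value. This matches how the constructor of CustomFieldTimePolicy handles an empty Optional.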