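My source is KafkaIO.read(). I now want to use a ParDo to decode the messages coming from Kafka and use one of the message's fields as that message's event time. How can I do this? I couldn't find an example of this.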
Answer (score: 0)
First, create a custom timestamp policy by extending TimestampPolicy<KeyT, ValueT>.
For example:
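```java
import java.util.Optional;
import org.apache.beam.sdk.io.kafka.KafkaRecord;
import org.apache.beam.sdk.io.kafka.TimestampPolicy;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.joda.time.Instant;

// Foo is your value type (e.g. an Avro class); getTimestamp() is assumed to return epoch millis.
public class CustomFieldTimePolicy extends TimestampPolicy<String, Foo> {
  protected Instant currentWatermark;

  public CustomFieldTimePolicy(Optional<Instant> previousWatermark) {
    // Resume from the checkpointed watermark, or start at the minimum timestamp.
    currentWatermark = previousWatermark.orElse(BoundedWindow.TIMESTAMP_MIN_VALUE);
  }

  @Override
  public Instant getTimestampForRecord(PartitionContext ctx, KafkaRecord<String, Foo> record) {
    // Use the message's own field as its event time.
    Instant eventTime = new Instant(record.getKV().getValue().getTimestamp());
    // Only move the watermark forward, so out-of-order records cannot pull it back.
    if (eventTime.isAfter(currentWatermark)) {
      currentWatermark = eventTime;
    }
    return eventTime;
  }

  @Override
  public Instant getWatermark(PartitionContext ctx) {
    return currentWatermark;
  }
}
```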
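How far the watermark should trail the newest event time depends on how out of order your topic is. The plain-Java model below (no Beam dependency; epoch-millis longs stand in for Instant) isolates the policy's bookkeeping so that behavior can be checked directly. FieldTimePolicyModel and maxDelayMillis are hypothetical names, and the fixed delay is an assumption added for illustration; with a delay of 0 the watermark simply follows the largest event time seen. Beam's KafkaIO also includes CustomTimestampPolicyWithLimitedDelay, which follows a similar "max event time minus delay" idea; check its Javadoc before relying on it.

```java
import java.util.Optional;

// Hypothetical, Beam-free model of a field-based timestamp policy for one partition.
final class FieldTimePolicyModel {
    private final long maxDelayMillis; // assumed allowance for out-of-order records
    private long watermark;            // never moves backwards

    FieldTimePolicyModel(Optional<Long> previousWatermark, long maxDelayMillis) {
        this.maxDelayMillis = maxDelayMillis;
        // Long.MIN_VALUE plays the role of BoundedWindow.TIMESTAMP_MIN_VALUE.
        this.watermark = previousWatermark.orElse(Long.MIN_VALUE);
    }

    // The record's field is its event time; the watermark trails the newest one by maxDelayMillis.
    long timestampForRecord(long fieldMillis) {
        watermark = Math.max(watermark, fieldMillis - maxDelayMillis);
        return fieldMillis;
    }

    long watermark() {
        return watermark;
    }
}
```

For example, with a 1,000 ms delay, records at 10,000, 9,500 and 12,000 ms leave the watermark at 9,000, 9,000 and 11,000 ms. The out-of-order 9,500 record is still ahead of the watermark, while one at 8,000 ms would arrive behind it and be treated as late by downstream windowing.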
Then, when you set up the KafkaIO source, pass your custom TimestampPolicy through the functional interface TimestampPolicyFactory:
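```java
// pipeline and kafkaProperties (a Map<String, Object>) are assumed to be defined elsewhere.
PCollection<KafkaRecord<String, Foo>> records = pipeline.apply(
    KafkaIO.<String, Foo>read()
        .withBootstrapServers("localhost:9092") // host:port list, no URL scheme
        .withTopic("foo")
        .withKeyDeserializer(StringDeserializer.class)
        // If you use Avro: the raw cast is needed because KafkaAvroDeserializer is a Deserializer<Object>.
        .withValueDeserializerAndCoder((Class) KafkaAvroDeserializer.class, AvroCoder.of(Foo.class))
        .withTimestampPolicyFactory((tp, previousWatermark) -> new CustomFieldTimePolicy(previousWatermark))
        .updateConsumerProperties(kafkaProperties));
```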
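The line below is the one that creates a new TimestampPolicy for each partition, passing in the partition (tp) and the watermark from the previous checkpoint, if any (previousWatermark); see the documentation.

```java
.withTimestampPolicyFactory((tp, previousWatermark) -> new CustomFieldTimePolicy(previousWatermark))
```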
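Because the factory is called once per partition, every partition keeps its own watermark, and as far as I know the KafkaIO reader reports the minimum across the partitions it reads. The plain-Java model below (hypothetical class PartitionWatermarks, no Beam dependency, epoch-millis longs instead of Instant) illustrates the consequence: a partition that is slow, or that receives no records at all, holds the overall watermark back.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, Beam-free model: one watermark per partition, source watermark = minimum.
final class PartitionWatermarks {
    private final Map<Integer, Long> byPartition = new HashMap<>();

    // Mimics the factory call: one policy per partition, restored from a checkpoint if present.
    void assign(int partition, Optional<Long> previousWatermark) {
        byPartition.put(partition, previousWatermark.orElse(Long.MIN_VALUE));
    }

    // Advance this partition's watermark monotonically from a record's field timestamp.
    void onRecord(int partition, long fieldMillis) {
        byPartition.merge(partition, fieldMillis, Long::max);
    }

    // The slowest (or idle) partition determines the overall watermark.
    long sourceWatermark() {
        return byPartition.values().stream().mapToLong(Long::longValue).min().orElse(Long.MIN_VALUE);
    }
}
```

Restoring previousWatermark lets a partition resume where it left off after a restart instead of dropping back to the minimum timestamp, which would otherwise stall the whole source's watermark until that partition caught up.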