I am trying to join two types of events (say A and B) in Flink and would like to confirm whether my understanding is correct.

Some details about how the events are handled: I have configured the data stream for event A with a BoundedOutOfOrdernessTimestampExtractor of 10 minutes, and the data stream for event B with one of 30 minutes. Later I use the Table API to perform a time-windowed join.

Is my understanding of the following points correct?

Do you have any other suggestions about the code below?
queryConfig.withIdleStateRetentionTime(
  org.apache.flink.api.common.time.Time.seconds(1),
  org.apache.flink.api.common.time.Time.minutes(30))
val stream: DataStream[Any] = textStream.flatMap(json => convert(json))

val aStream: DataStream[ClassA] =
  stream
    .filter(obj => obj.isInstanceOf[ClassA])
    .rebalance
    .map(obj => obj.asInstanceOf[ClassA])
    .assignTimestampsAndWatermarks(
      new BoundedOutOfOrdernessTimestampExtractor[ClassA](
          Time.minutes(10)) {
        override def extractTimestamp(element: ClassA): Long =
          element.serviceTimestamp.toInstant.toEpochMilli
      })

val bStream: DataStream[ClassB] =
  stream
    .filter(obj => obj.isInstanceOf[ClassB])
    .rebalance
    .map(obj => obj.asInstanceOf[ClassB])
    .assignTimestampsAndWatermarks(
      new BoundedOutOfOrdernessTimestampExtractor[ClassB](
          Time.minutes(30)) {
        override def extractTimestamp(element: ClassB): Long =
          element.timestamp.toInstant.toEpochMilli
      })
val aTable: Table = tableEnv.fromDataStream[ClassA](aStream,
  // The .rowtime is for setting event time attributes
  'aTimestamp.rowtime as 'aTimestamp, 'aUniqueId, 'aItem)

val bTable: Table = tableEnv.fromDataStream[ClassB](bStream,
  // The .rowtime is for setting event time attributes
  // https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/time_attributes.html
  'bTimestamp.rowtime as 'bTimestamp, 'uniqueId, 'bItem)
val result: Table = aTable
  .join(bTable)
  .where('aUniqueId === 'uniqueId
    // Give ClassB events 30 minutes lateness.
    // Use a time window join as optimization - https://stackoverflow.com/a/51620821
    // & https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/joins.html#time-windowed-joins
    // Both time clauses are needed to qualify as a time window join
    && 'bTimestamp >= 'aTimestamp
    && 'bTimestamp <= 'aTimestamp + 30.minutes)
  // DO NOT change order without changing order in later parsing code
  .select('uniqueId, 'aItem, 'bItem, 'bTimestamp, 'aTimestamp.cast(createTypeInformation[Timestamp]))
val outputStream: DataStream[ClassC] = tableEnv
  .toAppendStream[(String, String, String, Timestamp, Timestamp)](result)
  // TODO find better way to map to a POJO
  .map(row => ClassCUtils.toClassC(row))
Answer (score: 0)
Events are processed and joined as soon as they are received, as long as they are within the lateness window (10 minutes for event A, 30 minutes for event B). There is no minimum end-to-end latency imposed by any Flink configuration.

Yes. Events are mapped and filtered as soon as they are received, and are placed into buffers so that the join window can be satisfied.
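To make the lateness window concrete, below is a minimal sketch of what a BoundedOutOfOrdernessTimestampExtractor effectively does (a simplified illustration, not the actual Flink source; the timestampOf accessor is an assumed parameter). The emitted watermark simply trails the highest timestamp seen so far by the configured bound, so an event is only treated as late once the watermark has passed its timestamp.

import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks
import org.apache.flink.streaming.api.watermark.Watermark

// Simplified sketch of a bounded out-of-orderness assigner: the watermark
// trails the highest timestamp seen so far by the configured bound.
class LaggingWatermarkAssigner[T](lagMillis: Long, timestampOf: T => Long)
    extends AssignerWithPeriodicWatermarks[T] {

  // Start low enough that the first watermark does not underflow Long.MinValue.
  private var maxSeenTimestamp: Long = Long.MinValue + lagMillis

  override def extractTimestamp(element: T, previousElementTimestamp: Long): Long = {
    val ts = timestampOf(element)
    maxSeenTimestamp = math.max(maxSeenTimestamp, ts)
    ts
  }

  // Called periodically by Flink; events with timestamps at or below the
  // watermark are considered late by downstream operators.
  override def getCurrentWatermark(): Watermark =
    new Watermark(maxSeenTimestamp - lagMillis)
}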
The table keeps events for up to 30 minutes, until watermarks arrive from both streams. After that, events are cleaned up based on the watermarks.

Yes. The IntervalJoinOperator receives events from the left and right sides of the join, checks whether they fall within the time bounds, and if they do, sends them downstream:
private <THIS, OTHER> void processElement(
        final StreamRecord<THIS> record,
        final MapState<Long, List<IntervalJoinOperator.BufferEntry<THIS>>> ourBuffer,
        final MapState<Long, List<IntervalJoinOperator.BufferEntry<OTHER>>> otherBuffer,
        final long relativeLowerBound,
        final long relativeUpperBound,
        final boolean isLeft) throws Exception {

    final THIS ourValue = record.getValue();
    final long ourTimestamp = record.getTimestamp();

    if (ourTimestamp == Long.MIN_VALUE) {
        throw new FlinkException("Long.MIN_VALUE timestamp: Elements used in " +
                "interval stream joins need to have timestamps meaningful timestamps.");
    }

    // Elements already behind the current watermark are dropped.
    if (isLate(ourTimestamp)) {
        return;
    }

    // Buffer this element so later elements from the other side can join with it.
    addToBuffer(ourBuffer, ourValue, ourTimestamp);

    // Probe the other side's buffer for elements within the join bounds.
    for (Map.Entry<Long, List<BufferEntry<OTHER>>> bucket: otherBuffer.entries()) {
        final long timestamp = bucket.getKey();

        if (timestamp < ourTimestamp + relativeLowerBound ||
                timestamp > ourTimestamp + relativeUpperBound) {
            continue;
        }

        for (BufferEntry<OTHER> entry: bucket.getValue()) {
            if (isLeft) {
                collect((T1) ourValue, (T2) entry.element, ourTimestamp, timestamp);
            } else {
                collect((T1) entry.element, (T2) ourValue, timestamp, ourTimestamp);
            }
        }
    }

    // Register an event-time timer so the element is removed from state once
    // no future element from the other side can match it anymore.
    long cleanupTime = (relativeUpperBound > 0L) ? ourTimestamp + relativeUpperBound : ourTimestamp;
    if (isLeft) {
        internalTimerService.registerEventTimeTimer(CLEANUP_NAMESPACE_LEFT, cleanupTime);
    } else {
        internalTimerService.registerEventTimeTimer(CLEANUP_NAMESPACE_RIGHT, cleanupTime);
    }
}
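For reference, the same time constraint can also be expressed directly with the DataStream interval join that this operator backs. The sketch below is only an illustration: it assumes a Flink release whose Scala KeyedStream exposes intervalJoin (1.7+), reuses the field names from the question's code, and invents a ClassC constructor for the output.

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.util.Collector

// bTimestamp must lie in [aTimestamp, aTimestamp + 30 minutes], mirroring the
// two time clauses of the time-windowed join in the question.
val joined: DataStream[ClassC] = aStream
  .keyBy(_.aUniqueId)
  .intervalJoin(bStream.keyBy(_.uniqueId))
  .between(Time.minutes(0), Time.minutes(30))
  .process(new ProcessJoinFunction[ClassA, ClassB, ClassC] {
    override def processElement(
        a: ClassA,
        b: ClassB,
        ctx: ProcessJoinFunction[ClassA, ClassB, ClassC]#Context,
        out: Collector[ClassC]): Unit = {
      // Assumed ClassC constructor taking the joined fields; adapt to the real type.
      out.collect(ClassC(b.uniqueId, a.aItem, b.bItem, b.timestamp, a.serviceTimestamp))
    }
  })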
The query config in the code below is redundant and not really required.

Yes. withIdleStateRetentionTime is relevant when you use unbounded operators, such as a GROUP BY clause without a window attribute in SQL.
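For contrast, this is the kind of query where the retention setting does matter. The sketch below is illustrative only: it assumes aTable is registered under the name ATable, that aUniqueId is a string, and it uses the same pre-1.11 Table API style as the question's code.

// Register the existing aTable under a name so it can be referenced from SQL.
tableEnv.registerTable("ATable", aTable)

// Unbounded aggregation: per-key state has no window to bound it, so the idle
// state retention configured in queryConfig is what eventually clears out keys
// that stop receiving events.
val counts: Table = tableEnv.sqlQuery(
  "SELECT aUniqueId, COUNT(*) AS cnt FROM ATable GROUP BY aUniqueId")

// Passing the query config here applies the 1 second .. 30 minutes retention.
val countStream: DataStream[(Boolean, (String, Long))] =
  tableEnv.toRetractStream[(String, Long)](counts, queryConfig)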