Given I have a topology like the following.

In short:

- KTable input with `A_Value`, keyed by `A_Id`
  - `KTable<A_Id, A_Value>`
  - `store-A_Value-by-A_Id-ktable`
- stream input with `B_Value`, keyed by `B_Id`
  - `mapValues`: to replace tombstones with a valid `B_Value` that has a `deleted: true` attribute
  - `groupByKey`: `KGroupedStream<B_Id, B_Value>`
  - `reduce`: here, if the new `B_Value.isDeleted() == true`, return the old `B_Value` with `deleted: true`; this is important because later I need the `A_Id` attribute
    - `KTable<B_Id, B_Value>`
    - `store-B_Value-by-B_Id-ktable`
- KTable input `KTable<B_Id, B_Value>`
  - `toStream`
  - `groupBy`: `B_Value.getA_Id()`
  - `reduce`: `KTable<A_Id, B_Value>`
- join `KTable<A_Id, B_Value>` with `KTable<A_Id, A_Value>`: `KTable<A_Id, Pair<B_Value, A_Value>>`
  - `toStream`
  - `map`: if `B_Value.isDeleted() == true`, forward `KeyValue<B_Id, null>` (a tombstone), otherwise forward `KeyValue<B_Id, Pair<B_Value, A_Value>>`
  - `to`: the output topic

Which results in the following printed Kafka Streams topology:
```
Sub-topology: 0
    Source: KSTREAM-SOURCE-0000000000 (topics: [input-A_value-A_key])
      --> KTABLE-SOURCE-0000000001
    Source: KSTREAM-SOURCE-0000000011 (topics: [input-B_value-A_key])
      --> KSTREAM-REDUCE-0000000008
    Processor: KSTREAM-REDUCE-0000000008 (stores: [store-B-by-A_key-ktable])
      --> KTABLE-JOINTHIS-0000000013
      <-- KSTREAM-SOURCE-0000000011
    Processor: KTABLE-SOURCE-0000000001 (stores: [store-A-by-A_key-ktable])
      --> KTABLE-JOINOTHER-0000000014
      <-- KSTREAM-SOURCE-0000000000
    Processor: KTABLE-JOINOTHER-0000000014 (stores: [store-B-by-A_key-ktable])
      --> KTABLE-MERGE-0000000012
      <-- KTABLE-SOURCE-0000000001
    Processor: KTABLE-JOINTHIS-0000000013 (stores: [store-A-by-A_key-ktable])
      --> KTABLE-MERGE-0000000012
      <-- KSTREAM-REDUCE-0000000008
    Processor: KTABLE-MERGE-0000000012 (stores: [])
      --> KTABLE-TOSTREAM-0000000015
      <-- KTABLE-JOINTHIS-0000000013, KTABLE-JOINOTHER-0000000014
    Processor: KTABLE-TOSTREAM-0000000015 (stores: [])
      --> KSTREAM-MAP-0000000016
      <-- KTABLE-MERGE-0000000012
    Processor: KSTREAM-MAP-0000000016 (stores: [])
      --> KSTREAM-SINK-0000000017
      <-- KTABLE-TOSTREAM-0000000015
    Sink: KSTREAM-SINK-0000000017 (topic: output-AB_aggregate-B_key)
      <-- KSTREAM-MAP-0000000016

Sub-topology: 1
    Source: KSTREAM-SOURCE-0000000002 (topics: [input-B_value-B_key])
      --> KSTREAM-MAPVALUES-0000000003
    Processor: KSTREAM-MAPVALUES-0000000003 (stores: [])
      --> KSTREAM-REDUCE-0000000004
      <-- KSTREAM-SOURCE-0000000002
    Processor: KSTREAM-REDUCE-0000000004 (stores: [store-B_value-B_key-ktable])
      --> KTABLE-TOSTREAM-0000000005
      <-- KSTREAM-MAPVALUES-0000000003
    Processor: KTABLE-TOSTREAM-0000000005 (stores: [])
      --> KSTREAM-FILTER-0000000006
      <-- KSTREAM-REDUCE-0000000004
    Processor: KSTREAM-FILTER-0000000006 (stores: [])
      --> KSTREAM-KEY-SELECT-0000000007
      <-- KTABLE-TOSTREAM-0000000005
    Processor: KSTREAM-KEY-SELECT-0000000007 (stores: [])
      --> KSTREAM-FILTER-0000000010
      <-- KSTREAM-FILTER-0000000006
    Processor: KSTREAM-FILTER-0000000010 (stores: [])
      --> KSTREAM-SINK-0000000009
      <-- KSTREAM-KEY-SELECT-0000000007
    Sink: KSTREAM-SINK-0000000009 (topic: input-B_value-A_key)
      <-- KSTREAM-FILTER-0000000010
```
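The B-side delete handling described above (replacing tombstones with a `deleted: true` marker, then preserving `A_Id` in the `reduce` before re-keying) can be sketched in plain Java, independent of the Kafka Streams API. `BValue` and the method names here are hypothetical stand-ins, not the real classes:

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of the B-side semantics; BValue is a hypothetical
// stand-in for the real value class.
record BValue(String aId, String payload, boolean deleted) {}

class BPipelineSketch {

    // mapValues step: a tombstone (null) becomes a valid delete marker,
    // so it is not silently dropped by groupByKey/reduce.
    static BValue mapValue(BValue incoming) {
        return incoming == null ? new BValue(null, null, true) : incoming;
    }

    // reduce step: when the new value is a delete marker, keep the old
    // value's fields (notably A_Id) but flag the result as deleted.
    static BValue reduce(BValue oldB, BValue newB) {
        if (newB.deleted()) {
            return new BValue(oldB.aId(), oldB.payload(), true);
        }
        return newB;
    }

    // re-key step: group the reduced table by A_Id, newest value wins,
    // mirroring groupBy(B_Value.getA_Id()) followed by reduce.
    static Map<String, BValue> rekeyByAId(Map<String, BValue> byBId) {
        Map<String, BValue> byAId = new HashMap<>();
        byBId.values().forEach(b -> byAId.put(b.aId(), b));
        return byAId;
    }
}
```

The point of the `reduce` is visible here: even after a delete, the result still carries the `A_Id` needed for the later re-keying and the join.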
What matters are the constraints that led to this topology: `B` values carry the `A_Id`.

All of the expected behaviour listed above works perfectly well for newly incoming data.
The problem I observe happens during the first bootstrap of the streams application: the expected number of joins (i.e. 1000) actually comes out at around 70. If the streams application then keeps running, no further inconsistency appears.

The number 1000 comes from the unique `B` records that have a matching `A` record in my original topic data. I have also checked that all my changelog stores hold the correct data after the transformations.
What have I tried?
- `WallclockTimestampExtractor`
- a custom `TimestampExtractor` returning `0` for the input topics
- a `ValueTransformerWithKeySupplier` in place of the inner join, punctuating at 1-second intervals (tried both wall-clock and record-time based)

I don't have many more ideas to try. I can still dig deeper, but first I'd like to see whether I'm making some other mistake in my approach. Thanks for your help.
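For reference, the "custom `TimestampExtractor` returning 0" experiment amounts to something like the following. A minimal stand-in interface is declared locally so the sketch is self-contained; the real class would implement `org.apache.kafka.streams.processor.TimestampExtractor` and be registered via the `default.timestamp.extractor` config or per-topic `Consumed.with(...)`:

```java
// Minimal local stand-in for
// org.apache.kafka.streams.processor.TimestampExtractor, declared here
// only so the sketch compiles without the Kafka dependency.
interface TimestampExtractorLike {
    long extract(Object consumerRecord, long partitionTime);
}

// Forces every record to timestamp 0, so that during the initial
// bootstrap no input topic's records sort "ahead" of another topic's.
class ZeroTimestampExtractor implements TimestampExtractorLike {
    @Override
    public long extract(Object consumerRecord, long partitionTime) {
        return 0L;
    }
}
```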