I have a Kafka Streams topology in which I join 5 tables, each built over a topic populated by a Kafka connector that produces KeyValue events. The keys are all generated against the same Avro schema, but when I join the tables in my topology, the keys do not appear to be equal, even though the corresponding Java objects are equal. What is the reason behind this?
The application is integrated with the Confluent Schema Registry.
We attached a debugger and could see that two keys received on different topics but holding the same values compare as equal. Yet when we take a key received on topic A and use it to look up a store built on top of topic B, the lookup matches nothing.
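For reference, the topology is wired to the registry through the serde configuration. Below is a minimal sketch of such a setup, assuming Confluent's SpecificAvroSerde; the application id, bootstrap servers, and registry URL are placeholders, not values from the original setup:

    import java.util.Properties
    import org.apache.kafka.streams.StreamsConfig
    import io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde

    // Minimal sketch: both key and value serdes fetch/register schemas in the
    // Confluent Schema Registry. All concrete values below are placeholders.
    fun streamsProperties(): Properties {
        val props = Properties()
        props[StreamsConfig.APPLICATION_ID_CONFIG] = "sale-products-app"    // placeholder
        props[StreamsConfig.BOOTSTRAP_SERVERS_CONFIG] = "localhost:9092"    // placeholder
        props[StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG] = SpecificAvroSerde::class.java
        props[StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG] = SpecificAvroSerde::class.java
        props["schema.registry.url"] = "http://localhost:8081"              // placeholder
        return props
    }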
    fun streamsBuilder(): StreamsBuilder {
        val streamsBuilder = StreamsBuilder()

        // Source streams; every topic's key is produced against the same Avro schema.
        val productsStream = streamsBuilder.stream<Key, Aggregate>(streamNameRepository.inputWebshopProductsTopic)
        val productPricesStream = streamsBuilder.stream<Key, PriceVariantsHolder>(streamNameRepository.productsPricesStreamTopic)
        val productsRatingsStream = streamsBuilder.stream<Key, Aggregate>(streamNameRepository.inputProductRatingsTopic)
        val inputProductsStockStream = streamsBuilder.stream<Key, Aggregate>(streamNameRepository.inputProductsStockTopic)

        val productsStockStream =
            inputProductsStockStream.map { key, value -> toKeyValue(key, productStockMapper.aStockQuantity(value)) }
        productsStockStream.to(streamNameRepository.productsStockStreamTopic)

        // Global store used for stock lookups while mapping sale products.
        streamsBuilder.globalTable<Key, StockQuantity>(
            streamNameRepository.productsStockStreamTopic,
            Materialized.`as`(streamNameRepository.productsStockGlobalStoreTopic))

        val saleProductsTable = productsStream
            .filter { _, aggregate -> aggregate.payload != null }
            .map { key, aggregate -> toKeyValue(key, saleProductMapper.aSaleProduct(aggregate) { productsStockStore().get(Key(it)) }) }
            .mapValues { saleProduct -> log.debug("received $saleProduct"); saleProduct }
            .groupByKey()
            .reduce({ _, saleProductAvro -> saleProductAvro }, Materialized.`as`(streamNameRepository.saleProductsStoreTopic))

        val productPricesTable = productPricesStream
            .map { key, aggregate -> toKeyValue(key, aggregate) }
            .groupByKey()
            .reduce({ _, price -> price }, Materialized.`as`(streamNameRepository.productsPricesStoreTopic))

        val productsRatingsTable = productsRatingsStream
            .map { key, aggregate -> toKeyValue(key, productRatingMapper.aProductRating(aggregate)) }
            .groupByKey()
            .reduce({ _, aggregate -> aggregate }, Materialized.`as`(streamNameRepository.productsRatingsStoreTopic))

        val productsStockTable = productsStockStream
            .map { key, aggregate -> toKeyValue(key, aggregate) }
            .groupByKey()
            .reduce { _, aggregate -> aggregate }

        val productsInNeedOfVariantStockUpdate = productsInNeedOfVariantStockUpdate(productsStockTable, saleProductsTable)

        // The five-way join whose key comparisons misbehave.
        saleProductsTable
            .outerJoin(productPricesTable, saleProductMapper::aPricedSaleProduct)
            .outerJoin(productsRatingsTable, saleProductMapper::aRatedSaleProduct)
            .outerJoin(productsStockTable, saleProductMapper::aQuantifiedSaleProduct)
            .outerJoin(productsInNeedOfVariantStockUpdate, saleProductMapper::aSaleProductWithUpdatedVariantStock)
            .toStream()
            .filter { _, saleProductAvro -> saleProductAvro.id != null }
            .mapValues { value -> log.debug("publish {}", value); value }
            .to(streamNameRepository.outputSaleProductsTopic)

        return streamsBuilder
    }

    private fun <V> toKeyValue(key: Key, value: V): KeyValue<Key, V> {
        // Rebuild the key so that every key instance in the topology is created
        // from the same generated Avro class (and serialized with one schema).
        return KeyValue(Key.newBuilder(key).build(), value)
    }
Answer 0 (score: 1)
If you integrate with the Confluent Schema Registry, the serialized form of a key can differ per topic: every Avro-encoded key carries a magic byte followed by a 4-byte schema ID, and since each topic registers its key schema under its own subject, the embedded IDs (and therefore the bytes) can differ even for keys that are equal as objects. Joins then do not work as expected, because key comparison happens at the byte level, not on the deserialized objects.
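Concretely, the Confluent wire format is one magic byte (0x0) followed by a big-endian 4-byte schema ID and then the Avro payload. Here is a small sketch for inspecting the prefix of raw key bytes; the helper name is ours, not part of any library:

    import java.nio.ByteBuffer

    // Sketch: decode the Confluent wire-format prefix of a serialized Avro key.
    // Byte 0 is the magic byte (always 0x0); bytes 1..4 are the schema ID the
    // serializer obtained from the registry for this topic's subject.
    fun schemaIdOf(serializedKey: ByteArray): Int {
        require(serializedKey.size >= 5 && serializedKey[0] == 0.toByte()) {
            "Not in Confluent wire format"
        }
        return ByteBuffer.wrap(serializedKey, 1, 4).int
    }

    // Two keys that are equal as Java objects can still serialize to different
    // bytes when their topics resolve to different schema IDs:
    // schemaIdOf(bytesFromTopicA) != schemaIdOf(bytesFromTopicB)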
This is somewhat expected. The problem comes up from time to time and is hard to solve natively (i.e., built-in) within Kafka Streams, because the Confluent Schema Registry is a third-party tool and Kafka Streams is agnostic of it.
There are workarounds, though.
One workaround is to re-map each key received in the topology to a new key, so that from then on all keys in the topology are generated (and serialized) with one and the same Avro schema, i.e., the same schema ID.
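That is essentially what the toKeyValue helper in the question's code already does: rebuilding every key with the application's own generated Key class means each key the topology writes out is serialized against a single schema. A minimal sketch of applying the re-mapping at the edge of the topology (inputTopic is a placeholder):

    // Sketch: normalize keys as soon as records enter the topology, before any
    // groupByKey or join, so every downstream serialization uses the
    // application's own Key schema (one schema ID everywhere).
    val normalizedStream = streamsBuilder.stream<Key, Aggregate>(inputTopic)
        .map { key, value -> KeyValue(Key.newBuilder(key).build(), value) }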
Other alternatives (not really better) would be to "strip out the magic bytes" or to use a different data type for the join keys (e.g., some POJO). In the end, all of these approaches amount to the same idea: make equal keys produce the same serialized form.
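For instance, switching the join keys to a plain String sidesteps the registry entirely, because the default String serde produces identical bytes for identical values. A sketch under assumptions: key.id is a hypothetical accessor for a stable identifier inside the Avro key, and aggregateSerde stands in for the configured value serde:

    import org.apache.kafka.common.serialization.Serdes
    import org.apache.kafka.streams.kstream.Grouped

    // Sketch: re-key on a registry-independent type so the serialized keys
    // carry no schema-registry prefix at all.
    val saleProductsByStringKey = productsStream
        .selectKey { key, _ -> key.id.toString() }   // hypothetical id field
        .groupByKey(Grouped.with(Serdes.String(), aggregateSerde))
        .reduce { _, aggregate -> aggregate }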