Kafka Streams topology: differing keys but the same schema

Asked: 2019-11-28 10:54:35

Tags: apache-kafka apache-kafka-streams

I have a Kafka Streams topology in which I join five tables. Each table is built on a topic populated by a Kafka Connector producing KeyValue events, and every Key is generated against the same Avro schema. Yet when I join the tables in my topology, the keys do not seem to match, even though the events are equal as Java objects. What is the reason behind this?

The application is integrated with the Confluent Schema Registry.

We attached a debugger and saw that two keys received on different topics, but holding the same values, compare as equal. At the same time, if we take a key received on topic A and use it for a lookup against a store built on top of topic B, the lookup matches nothing.

fun streamsBuilder(): StreamsBuilder {
    val streamsBuilder = StreamsBuilder()
    val productsStream = streamsBuilder.stream<Key, Aggregate>(streamNameRepository.inputWebshopProductsTopic)
    val productPricesStream = streamsBuilder.stream<Key, PriceVariantsHolder>(streamNameRepository.productsPricesStreamTopic)
    val productsRatingsStream = streamsBuilder.stream<Key, Aggregate>(streamNameRepository.inputProductRatingsTopic)
    val inputProductsStockStream = streamsBuilder.stream<Key, Aggregate>(streamNameRepository.inputProductsStockTopic)

    val productsStockStream =
            inputProductsStockStream.map { key, value -> toKeyValue(key, productStockMapper.aStockQuantity(value)) }
    productsStockStream.to(streamNameRepository.productsStockStreamTopic)

    streamsBuilder.globalTable<Key, StockQuantity>(streamNameRepository.productsStockStreamTopic,
            Materialized.`as`(streamNameRepository.productsStockGlobalStoreTopic))

    val saleProductsTable = productsStream
            .filter { _, aggregate -> aggregate.payload != null }
            .map { key, aggregate -> toKeyValue(key, saleProductMapper.aSaleProduct(aggregate) { productsStockStore().get(Key(it)) }) }
            .mapValues { saleProduct -> log.debug("received $saleProduct"); saleProduct }
            .groupByKey()
            .reduce({ _, saleProductAvro -> saleProductAvro }, Materialized.`as`(streamNameRepository.saleProductsStoreTopic))

    val productPricesTable = productPricesStream
            .map { key, aggregate -> toKeyValue(key, aggregate) }
            .groupByKey()
            .reduce({ _, price -> price }, Materialized.`as`(streamNameRepository.productsPricesStoreTopic))

    val productsRatingsTable = productsRatingsStream
            .map { key, aggregate -> toKeyValue(key, productRatingMapper.aProductRating(aggregate)) }
            .groupByKey()
            .reduce({ _, aggregate -> aggregate }, Materialized.`as`(streamNameRepository.productsRatingsStoreTopic))

    val productsStockTable = productsStockStream
            .map { key, aggregate -> toKeyValue(key, aggregate) }
            .groupByKey()
            .reduce { _, aggregate -> aggregate }

    val productsInNeedOfVariantStockUpdate = productsInNeedOfVariantStockUpdate(productsStockTable, saleProductsTable)

    saleProductsTable
            .outerJoin(productPricesTable, saleProductMapper::aPricedSaleProduct)
            .outerJoin(productsRatingsTable, saleProductMapper::aRatedSaleProduct)
            .outerJoin(productsStockTable, saleProductMapper::aQuantifiedSaleProduct)
            .outerJoin(productsInNeedOfVariantStockUpdate, saleProductMapper::aSaleProductWithUpdatedVariantStock)
            .toStream()
            .filter { _, saleProductAvro -> saleProductAvro.id != null }
            .mapValues { value -> log.debug("publish {}", value); value }
            .to(streamNameRepository.outputSaleProductsTopic)

    return streamsBuilder
}

private fun <V> toKeyValue(key: Key, value: V): KeyValue<Key, V> {
    return KeyValue(Key.newBuilder(key).build(), value)
}
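
For context, the `productsStockStore()` helper called inside `aSaleProduct` is not shown in the post. Presumably it fetches the global store materialized above from `productsStockStreamTopic`. A minimal, hypothetical sketch of such a helper, assuming a running `kafkaStreams` instance is in scope and the store name matches the `Materialized.as(...)` call (this is an assumption, not the poster's code):

// Hypothetical sketch of the productsStockStore() helper referenced above;
// the original post does not show it. It looks up the queryable global store
// by the name passed to Materialized.as(...).
private fun productsStockStore(): ReadOnlyKeyValueStore<Key, StockQuantity> =
        kafkaStreams.store(
                streamNameRepository.productsStockGlobalStoreTopic,
                QueryableStoreTypes.keyValueStore<Key, StockQuantity>())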

1 Answer:

Answer 0 (score: 1)

If you integrate with the Confluent Schema Registry, every serialized key is prefixed with the "magic bytes" (one magic byte followed by a 4-byte schema ID), and that schema ID can differ from topic to topic. Since key comparison happens at the byte level, the joins won't work as expected.
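
A minimal sketch of why the bytes differ, assuming the standard Confluent wire format (one magic byte, then a 4-byte schema ID, then the Avro payload); the helper below just decodes that prefix, and the example IDs are illustrative:

import java.nio.ByteBuffer

// Confluent wire format: [magic byte 0x00][4-byte schema ID][Avro payload].
// Two keys that are equal as Java objects can still carry different schema
// IDs if each connector registered its own copy of the Key schema, so the
// raw bytes differ and byte-level join/store comparisons miss.
fun schemaId(serializedKey: ByteArray): Int {
    require(serializedKey[0] == 0.toByte()) { "not Confluent-framed" }
    return ByteBuffer.wrap(serializedKey, 1, 4).int
}

// e.g. schemaId(keyBytesFromTopicA) == 21 while schemaId(keyBytesFromTopicB) == 42
// (IDs illustrative), even though the Avro payload bytes after the prefix match.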

This is somewhat expected. The issue comes up from time to time, and it is hard to resolve natively (i.e., built-in) in Kafka Streams, because the Confluent Schema Registry is a third-party tool that Kafka Streams is agnostic to.

There are workarounds, though.

One workaround is to remap every key received in the topology to a new key, so that all keys flowing through the topology are produced with the same Avro schema (and hence the same schema ID). A sketch of this follows below.
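
This is essentially what the `toKeyValue` helper in the question does: `Key.newBuilder(key).build()` rebuilds each incoming key with the application's own generated `Key` class, so downstream (re)serialization goes through a single registered schema. A minimal sketch applied to one input stream (names taken from the question; the repartitioning behavior after `map()` is standard Kafka Streams):

// Sketch: normalize keys before grouping/joining. After map(), Kafka Streams
// marks the stream for repartitioning, so the rebuilt key is re-serialized
// with this application's serde and a single schema ID on internal topics.
val normalized = productsStream
        .map { key, value -> KeyValue(Key.newBuilder(key).build(), value) }
        .groupByKey()
        .reduce { _, latest -> latest }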

Other alternatives (which are not really better) are to "strip out the magic bytes," or to use a different data type for the join key (e.g., some POJO). In the end, all of these approaches are similar: they make the serialized keys byte-compatible.
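
For the last alternative, a sketch of joining on a plain String key instead of the Avro `Key`, so the serialized key carries no schema-registry prefix at all (the `key.id` field and the `aggregateSerde` are assumptions for illustration, not taken from the post):

// Sketch: re-key to a primitive type; Serdes.String() writes raw bytes with
// no magic byte or schema ID, so byte-level comparison is stable across topics.
val byStringKey = productsStream
        .selectKey { key, _ -> key.id.toString() }
        .groupByKey(Grouped.with(Serdes.String(), aggregateSerde))
        .reduce { _, latest -> latest }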