I have a Kafka Streams topology in which I join 5 tables, each built over a topic populated by a Kafka connector that produces KeyValue events. The keys are all generated against the same Avro schema, but when I join the tables in my topology, the keys do not appear to be equal, even though the corresponding Java objects are equal. What is the reason behind this?
The application is integrated with the Confluent Schema Registry.
We attached a debugger and could see that two keys received on different topics but holding the same values compare as equal. Yet when we take a key received on topic A and use it to look up a store built on top of topic B, the lookup matches nothing.
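For reference, the topology is wired to the registry through the serde configuration. Below is a minimal sketch of such a setup, assuming Confluent's SpecificAvroSerde; the application id, bootstrap servers, and registry URL are placeholders, not values from the original setup:

    import java.util.Properties
    import org.apache.kafka.streams.StreamsConfig
    import io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde

    // Minimal sketch: both key and value serdes fetch/register schemas in the
    // Confluent Schema Registry. All concrete values below are placeholders.
    fun streamsProperties(): Properties {
        val props = Properties()
        props[StreamsConfig.APPLICATION_ID_CONFIG] = "sale-products-app"    // placeholder
        props[StreamsConfig.BOOTSTRAP_SERVERS_CONFIG] = "localhost:9092"    // placeholder
        props[StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG] = SpecificAvroSerde::class.java
        props[StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG] = SpecificAvroSerde::class.java
        props["schema.registry.url"] = "http://localhost:8081"              // placeholder
        return props
    }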
    fun streamsBuilder(): StreamsBuilder {
        val streamsBuilder = StreamsBuilder()

        // Source streams; every topic's key is produced against the same Avro schema.
        val productsStream = streamsBuilder.stream<Key, Aggregate>(streamNameRepository.inputWebshopProductsTopic)
        val productPricesStream = streamsBuilder.stream<Key, PriceVariantsHolder>(streamNameRepository.productsPricesStreamTopic)
        val productsRatingsStream = streamsBuilder.stream<Key, Aggregate>(streamNameRepository.inputProductRatingsTopic)
        val inputProductsStockStream = streamsBuilder.stream<Key, Aggregate>(streamNameRepository.inputProductsStockTopic)

        val productsStockStream =
            inputProductsStockStream.map { key, value -> toKeyValue(key, productStockMapper.aStockQuantity(value)) }
        productsStockStream.to(streamNameRepository.productsStockStreamTopic)

        // Global store used for stock lookups while mapping sale products.
        streamsBuilder.globalTable<Key, StockQuantity>(
            streamNameRepository.productsStockStreamTopic,
            Materialized.`as`(streamNameRepository.productsStockGlobalStoreTopic))

        val saleProductsTable = productsStream
            .filter { _, aggregate -> aggregate.payload != null }
            .map { key, aggregate -> toKeyValue(key, saleProductMapper.aSaleProduct(aggregate) { productsStockStore().get(Key(it)) }) }
            .mapValues { saleProduct -> log.debug("received $saleProduct"); saleProduct }
            .groupByKey()
            .reduce({ _, saleProductAvro -> saleProductAvro }, Materialized.`as`(streamNameRepository.saleProductsStoreTopic))

        val productPricesTable = productPricesStream
            .map { key, aggregate -> toKeyValue(key, aggregate) }
            .groupByKey()
            .reduce({ _, price -> price }, Materialized.`as`(streamNameRepository.productsPricesStoreTopic))

        val productsRatingsTable = productsRatingsStream
            .map { key, aggregate -> toKeyValue(key, productRatingMapper.aProductRating(aggregate)) }
            .groupByKey()
            .reduce({ _, aggregate -> aggregate }, Materialized.`as`(streamNameRepository.productsRatingsStoreTopic))

        val productsStockTable = productsStockStream
            .map { key, aggregate -> toKeyValue(key, aggregate) }
            .groupByKey()
            .reduce { _, aggregate -> aggregate }

        val productsInNeedOfVariantStockUpdate = productsInNeedOfVariantStockUpdate(productsStockTable, saleProductsTable)

        // The five-way join whose key comparisons misbehave.
        saleProductsTable
            .outerJoin(productPricesTable, saleProductMapper::aPricedSaleProduct)
            .outerJoin(productsRatingsTable, saleProductMapper::aRatedSaleProduct)
            .outerJoin(productsStockTable, saleProductMapper::aQuantifiedSaleProduct)
            .outerJoin(productsInNeedOfVariantStockUpdate, saleProductMapper::aSaleProductWithUpdatedVariantStock)
            .toStream()
            .filter { _, saleProductAvro -> saleProductAvro.id != null }
            .mapValues { value -> log.debug("publish {}", value); value }
            .to(streamNameRepository.outputSaleProductsTopic)

        return streamsBuilder
    }

    private fun <V> toKeyValue(key: Key, value: V): KeyValue<Key, V> {
        // Rebuild the key so that every key instance in the topology is created
        // from the same generated Avro class (and serialized with one schema).
        return KeyValue(Key.newBuilder(key).build(), value)
    }
Answer 0 (score: 1)
If you integrate with the Confluent Schema Registry, the serialized form of a key can differ per topic: every Avro-encoded key carries a magic byte followed by a 4-byte schema ID, and since each topic registers its key schema under its own subject, the embedded IDs (and therefore the bytes) can differ even for keys that are equal as objects. Joins then do not work as expected, because key comparison happens at the byte level, not on the deserialized objects.
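Concretely, the Confluent wire format is one magic byte (0x0) followed by a big-endian 4-byte schema ID and then the Avro payload. Here is a small sketch for inspecting the prefix of raw key bytes; the helper name is ours, not part of any library:

    import java.nio.ByteBuffer

    // Sketch: decode the Confluent wire-format prefix of a serialized Avro key.
    // Byte 0 is the magic byte (always 0x0); bytes 1..4 are the schema ID the
    // serializer obtained from the registry for this topic's subject.
    fun schemaIdOf(serializedKey: ByteArray): Int {
        require(serializedKey.size >= 5 && serializedKey[0] == 0.toByte()) {
            "Not in Confluent wire format"
        }
        return ByteBuffer.wrap(serializedKey, 1, 4).int
    }

    // Two keys that are equal as Java objects can still serialize to different
    // bytes when their topics resolve to different schema IDs:
    // schemaIdOf(bytesFromTopicA) != schemaIdOf(bytesFromTopicB)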
This is somewhat expected. The problem comes up from time to time and is hard to solve natively (i.e., built-in) within Kafka Streams, because the Confluent Schema Registry is a third-party tool and Kafka Streams is agnostic of it.
There are workarounds, though.
One workaround is to re-map each key received in the topology to a new key, so that from then on all keys in the topology are generated (and serialized) with one and the same Avro schema, i.e., the same schema ID.
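That is essentially what the toKeyValue helper in the question's code already does: rebuilding every key with the application's own generated Key class means each key the topology writes out is serialized against a single schema. A minimal sketch of applying the re-mapping at the edge of the topology (inputTopic is a placeholder):

    // Sketch: normalize keys as soon as records enter the topology, before any
    // groupByKey or join, so every downstream serialization uses the
    // application's own Key schema (one schema ID everywhere).
    val normalizedStream = streamsBuilder.stream<Key, Aggregate>(inputTopic)
        .map { key, value -> KeyValue(Key.newBuilder(key).build(), value) }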
Other alternatives (not really better) would be to "strip out the magic bytes" or to use a different data type for the join keys (e.g., some POJO). In the end, all of these approaches amount to the same idea: make equal keys produce the same serialized form.
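For instance, switching the join keys to a plain String sidesteps the registry entirely, because the default String serde produces identical bytes for identical values. A sketch under assumptions: key.id is a hypothetical accessor for a stable identifier inside the Avro key, and aggregateSerde stands in for the configured value serde:

    import org.apache.kafka.common.serialization.Serdes
    import org.apache.kafka.streams.kstream.Grouped

    // Sketch: re-key on a registry-independent type so the serialized keys
    // carry no schema-registry prefix at all.
    val saleProductsByStringKey = productsStream
        .selectKey { key, _ -> key.id.toString() }   // hypothetical id field
        .groupByKey(Grouped.with(Serdes.String(), aggregateSerde))
        .reduce { _, aggregate -> aggregate }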