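Suppose I have a stream of events:
R1 - {"abc": "value1"}
R2 - {"abc": "value2"}
R3 - {"abc": "value3"}
R4 - {"abc": "value4"}
all in a single partition. From this stream I want to derive a stream of events like
{"abc": ["value1", "value2", "value3", "value4"]}
given that every record has the same key and is already available in the topic.
How can I do this with aggregate and groupByKey in the Kafka Streams API?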
Answer 0 (score: 0)
Here is an example for a stream of JSON events; you can try something like the following:
// Assumes: stream is a KStream<String, JsonNode>, mapper is a Jackson ObjectMapper,
// jsonNodeSerde is a Serde<JsonNode>, and fieldName is the field to collect (e.g. "abc").
// TimeWindows.of(...) applies to Kafka Streams 2.1 to 3.x; newer releases offer TimeWindows.ofSizeWithNoGrace(...).
KTable<Windowed<String>, JsonNode> timeWindowedAggregatedStream = stream
    .groupByKey()
    .windowedBy(TimeWindows.of(Duration.ofMinutes(5)))
    .aggregate(
        () -> mapper.createObjectNode(), /* initializer */
        (aggKey, newValue, aggValue) -> {
            // Copy the values collected so far; LinkedHashSet removes duplicates and keeps arrival order
            final Set<JsonNode> uniqueValues = new LinkedHashSet<>();
            if (aggValue.hasNonNull(fieldName)) {
                aggValue.get(fieldName).forEach(uniqueValues::add);
            }
            // Add the new element only if the field is present and non-null
            if (newValue != null && newValue.hasNonNull(fieldName)) {
                uniqueValues.add(newValue.get(fieldName));
            }
            final ArrayNode uniqueArrayNode = mapper.createArrayNode();
            uniqueArrayNode.addAll(uniqueValues);
            // Return a fresh aggregate instead of mutating the stored one
            final ObjectNode aggregateNode = mapper.createObjectNode();
            aggregateNode.set(fieldName, uniqueArrayNode);
            return aggregateNode;
        }, /* adder */
        Materialized.<String, JsonNode, WindowStore<Bytes, byte[]>>as("time-windowed-aggregated-stream-store") /* state store name */
            .withValueSerde(jsonNodeSerde)); /* serde for aggregate value */
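
To make the initializer/adder contract easier to see, here is a minimal plain-Java sketch with no Kafka or Jackson dependency. The class name ListAggregationSketch, its methods, and the key "k" are hypothetical and used only for illustration; the loop in aggregateByKey merely imitates what groupByKey().aggregate(...) does for records read in offset order from one partition, and is not how Kafka Streams is implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the initializer/adder pattern; not Kafka Streams code.
final class ListAggregationSketch {

    // Initializer: produces the empty aggregate the first time a key is seen.
    static List<String> initialize() {
        return new ArrayList<>();
    }

    // Adder: returns a new aggregate containing the previous values plus the new one.
    // Null values and duplicates are skipped, mirroring the answer's de-duplication step.
    static List<String> add(String key, String newValue, List<String> aggregate) {
        List<String> updated = new ArrayList<>(aggregate);
        if (newValue != null && !updated.contains(newValue)) {
            updated.add(newValue);
        }
        return updated;
    }

    // Imitates grouping by key and aggregating, processing records in the order given.
    static Map<String, List<String>> aggregateByKey(List<Map.Entry<String, String>> records) {
        Map<String, List<String>> table = new LinkedHashMap<>();
        for (Map.Entry<String, String> record : records) {
            List<String> current = table.getOrDefault(record.getKey(), initialize());
            table.put(record.getKey(), add(record.getKey(), record.getValue(), current));
        }
        return table;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> records = List.of(
            Map.entry("k", "value1"),
            Map.entry("k", "value2"),
            Map.entry("k", "value3"),
            Map.entry("k", "value4"));
        // prints {k=[value1, value2, value3, value4]}
        System.out.println(aggregateByKey(records));
    }
}
```

Note that with windowedBy in the topology, each 5-minute window gets its own list. If you want a single list covering all records for the key, as the question seems to ask, leaving out the windowedBy step and materializing into a key-value store is probably closer to what you need.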