How to perform a join on Avro-formatted data in Kafka Streams using Java

Time: 2018-05-21 10:38:29

Tags: apache-kafka avro kafka-consumer-api apache-kafka-streams

STREAM-1:

pivot_table

STREAM-2:

 [KSTREAM-SOURCE-0000000000]: null, {"id": 1, "name": "john", "age": 26, "updated_at": 1525774480752}
 [KSTREAM-SOURCE-0000000000]: null, {"id": 2, "name": "jane", "age": 24, "updated_at": 1525774480784}
 [KSTREAM-SOURCE-0000000000]: null, {"id": 3, "name": "julia", "age": 25, "updated_at": 1525774480827}
 [KSTREAM-SOURCE-0000000000]: null, {"id": 4, "name": "jamie", "age": 22, "updated_at": 1525774480875}
 [KSTREAM-SOURCE-0000000000]: null, {"id": 5, "name": "jenny", "age": 27, "updated_at": 1525774482927}
 [KSTREAM-SOURCE-0000000000]: null, {"id": 6, "name": "kishore", "age": 27, "updated_at": 1525775063908}
 [KSTREAM-SOURCE-0000000000]: null, {"id": 7, "name": "purna", "age": 27, "updated_at": 1525775072006}
 [KSTREAM-SOURCE-0000000000]: null, {"id": 8, "name": "xxx", "age": 10, "updated_at": 1525783464123}
 [KSTREAM-SOURCE-0000000000]: null, {"id": 9, "name": "yyy", "age": 10, "updated_at": 1525783667644}
 [KSTREAM-SOURCE-0000000000]: null, {"id": 10, "name": "zzz", "age": 10, "updated_at": 1525783741814}

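Now I want to perform a JOIN on the two streams and retrieve only the rows of stream-1 that are not present in stream-2. My input stream data is in Avro format.

Expected output: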

[KSTREAM-SOURCE-0000000002]: null, {"id": 1, "name": "d", "age": 67}
[KSTREAM-SOURCE-0000000002]: null, {"id": 2, "name": "e", "age": 78}
[KSTREAM-SOURCE-0000000002]: null, {"id": 12, "name": "d", "age": 67}
[KSTREAM-SOURCE-0000000002]: null, {"id": 21, "name": "e", "age": 78}

So which JOIN operation should I use, and how can I get my expected output? Any help would be appreciated.

1 Answer:

Answer 0: (score: 0)

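If you look at the documentation here: kafka streams join semantics, you can use a left join and simply return null from your value joiner whenever the value from stream2 is set. The records that come out with a non-null value are then the ones that had no match in stream2.

Some pseudocode: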

    stream1.leftJoin(stream2, (value1, value2) -> value2 != null ? null : value1, joinWindow)

Disclaimer: I have not tested this.
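
To make the idea concrete, here is a minimal sketch of the "left join, return null when matched, then drop nulls" logic in plain Java, without any Kafka dependency. It only models the semantics: the class `AntiJoinSketch`, the method `antiJoin`, and the sample rows are hypothetical names and data, not part of the question or of the Kafka Streams API. In Kafka Streams the same joiner would presumably be passed to `KStream#leftJoin`, followed by a `KStream#filter` on non-null values. Since joins there are key-based and the records shown above have null keys, the streams would probably need to be re-keyed by `id` first (for example with `selectKey`).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Plain-Java model of "left join, then drop matched rows"; no Kafka dependency.
class AntiJoinSketch {

    // Mirrors the value joiner from the answer: return null when the
    // right-hand (stream-2) value is set, otherwise keep the left value.
    static final BiFunction<String, String, String> KEEP_UNMATCHED =
            (left, right) -> right != null ? null : left;

    // Emulates a keyed left join followed by a "value != null" filter.
    static Map<Integer, String> antiJoin(Map<Integer, String> stream1,
                                         Map<Integer, String> stream2) {
        Map<Integer, String> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> e : stream1.entrySet()) {
            // Left join: the right value is null when stream2 has no such key.
            String joined = KEEP_UNMATCHED.apply(e.getValue(), stream2.get(e.getKey()));
            if (joined != null) { // the filter step
                result.put(e.getKey(), joined);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows keyed by id (not the asker's data).
        Map<Integer, String> stream1 = new LinkedHashMap<>();
        stream1.put(1, "a");
        stream1.put(2, "b");
        stream1.put(3, "c");
        Map<Integer, String> stream2 = new LinkedHashMap<>();
        stream2.put(2, "x");

        System.out.println(antiJoin(stream1, stream2)); // prints {1=a, 3=c}
    }
}
```

One caveat worth checking against the join semantics documentation: a windowed stream-stream left join may emit a row with a null right side before the matching stream-2 record arrives, so a stream-table left join could be a better fit if stream-2 behaves like a lookup table.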