Using Kafka Streams to build a materialized view

Date: 2017-03-22 14:56:14

Tags: apache-kafka-streams

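I'm trying to produce some kind of materialized view from a stream of database updates (provided, for example, by the DBMS's transaction log with the help of a tool such as maxwell-daemon). The view is materialized as a Kafka compacted topic.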

The view is a simple join and could be expressed as a query like this:

SELECT u.email user_email, t.title todo_title, t.state todo_state
FROM   User u
JOIN   Todo t
ON     t.user_id = u.id

I want the view to be updated (i.e., a message published on the view's Kafka topic) every time a User or a Todo changes.

With Kafka Streams, it seems possible to achieve this by:

  • creating a KTable of User changes
  • creating a KTable of Todo changes
  • joining them
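The three steps above can be modeled without Kafka to show what the view's changelog is expected to contain. The following is a minimal pure-Python sketch, not the Kafka Streams API: the class `JoinedView`, its methods `on_user`/`on_todo` and the row dictionaries are hypothetical stand-ins for the two KTables and the join. It assumes the view is keyed by todo id and that a deleted or unmatched row is published as a tombstone (`None`). It also looks up matches through `user_id` directly; a KTable-KTable join matches records by key, so how this foreign key maps onto the tables' keys in a real topology is not shown here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ViewRow:
    user_email: str
    todo_title: str
    todo_state: str


class JoinedView:
    """Hypothetical in-memory model of an incrementally maintained inner join
    User JOIN Todo ON todo.user_id = user.id (not the Kafka Streams API)."""

    def __init__(self) -> None:
        self.users: Dict[int, dict] = {}  # latest User row per user id ("KTable")
        self.todos: Dict[int, dict] = {}  # latest Todo row per todo id ("KTable")
        # Changelog of the view: (todo id, joined row or None as tombstone)
        self.output: List[Tuple[int, Optional[ViewRow]]] = []

    def _emit(self, todo_id: int) -> None:
        todo = self.todos.get(todo_id)
        user = self.users.get(todo["user_id"]) if todo is not None else None
        if todo is not None and user is not None:
            row = ViewRow(user["email"], todo["title"], todo["state"])
            self.output.append((todo_id, row))
        else:
            # Inner join: no matching pair (any more), so publish a tombstone
            self.output.append((todo_id, None))

    def on_user(self, user_id: int, row: Optional[dict]) -> None:
        if row is None:
            self.users.pop(user_id, None)
        else:
            self.users[user_id] = row
        # A user change affects every todo that references this user
        for todo_id, todo in self.todos.items():
            if todo["user_id"] == user_id:
                self._emit(todo_id)

    def on_todo(self, todo_id: int, row: Optional[dict]) -> None:
        if row is None:
            self.todos.pop(todo_id, None)
        else:
            self.todos[todo_id] = row
        self._emit(todo_id)


if __name__ == "__main__":
    view = JoinedView()
    view.on_user(1, {"email": "a@example.com"})
    view.on_todo(10, {"user_id": 1, "title": "buy milk", "state": "open"})
    view.on_user(1, {"email": "b@example.com"})
    for key, value in view.output:
        print(key, value)
```

In this model every change to a User re-emits one row per todo that references it, so a single upstream update can fan out into several messages on the view topic.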

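However, I'm unsure about a few things:

  • Is this even possible?
  • Will this maintain the original ordering of events? E.g., if a User is changed and then a Todo is changed, am I guaranteed to see these changes in this order in the result of the join?
  • How should transactions be handled? E.g., multiple database changes might be part of the same transaction. How can I make sure that both KTables are updated atomically, and that all join results show only fully applied transactions?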

1 Answer:

Answer 0 (score: 2)

  • Is that even possible?

Yes. The pattern you describe will compute what you want out-of-the-box.

  • Will this maintain the original ordering of events? E.g., if a User is changed and then a Todo is changed, am I guaranteed to see these changes in this order in the result of the join?

Streams processes data according to timestamps (i.e., records with smaller timestamps first). Thus, in general this will work as expected. However, there is no strict guarantee, because in stream processing it is more important to always make progress (and not block). Thus, Streams applies only a "best-effort" approach to processing records in timestamp order. For example, if one changelog does not provide any data, Streams will keep going and process data only from the other changelog (instead of blocking). This can lead to "out-of-order" processing with regard to timestamps across different partitions/topics.
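The best-effort behavior described above can be illustrated with a simplified model, not Streams' actual implementation: at each step the record with the smallest timestamp among the buffers that currently hold data is processed, and empty buffers are skipped rather than waited for. The function name `process_available` and the buffer names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Record = Tuple[int, str]  # (timestamp, payload)


def process_available(buffers: Dict[str, Deque[Record]]) -> List[Tuple[str, Record]]:
    """Drain all currently buffered records, always taking the head with the
    smallest timestamp among the non-empty buffers. Empty buffers are skipped
    instead of waited on (simplified 'best effort' ordering)."""
    processed: List[Tuple[str, Record]] = []
    while True:
        candidates = [(buf[0][0], name) for name, buf in buffers.items() if buf]
        if not candidates:
            return processed
        _, name = min(candidates)  # smallest head timestamp wins
        processed.append((name, buffers[name].popleft()))


if __name__ == "__main__":
    buffers: Dict[str, Deque[Record]] = {
        "users": deque([(5, "user changed")]),
        "todos": deque(),  # no data fetched yet for this changelog
    }
    print(process_available(buffers))  # processes the User change without waiting
    buffers["todos"].append((3, "todo changed"))  # earlier timestamp arrives late
    print(process_available(buffers))
```

In the demo the Todo change with timestamp 3 is processed after the User change with timestamp 5, because its changelog had no data when the User change was picked. When both buffers hold data, the smaller timestamp is processed first.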

  • How should transactions be handled? E.g., multiple database changes might be part of the same transaction. How can I make sure that both KTables are updated atomically, and that all join results show only fully applied transactions?

That's not possible at the moment. Each update is processed individually, and you will see every intermediate (i.e., uncommitted) result. However, Kafka will introduce "transactional processing" in the future, which will make it possible to handle transactions (see https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging and https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics).
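To make the consequence concrete, here is a toy model assuming one User row, one Todo row and a hypothetical database transaction that changes both. Each change is applied and joined individually, as described above, so the first published row combines the new email with the old todo state, a combination that never existed in any committed database state. The function `apply_individually` and all row values are illustrative only.

```python
from typing import Dict, List, Tuple

Change = Tuple[str, str, str]     # (table, column, new value)
JoinedRow = Tuple[str, str, str]  # (user_email, todo_title, todo_state)


def apply_individually(user: Dict[str, str], todo: Dict[str, str],
                       txn: List[Change]) -> List[JoinedRow]:
    """Apply each change of one database transaction as a separate update and
    record the joined row published after every step (no transactional grouping)."""
    tables = {"user": dict(user), "todo": dict(todo)}  # copies; inputs stay untouched
    published: List[JoinedRow] = []
    for table, column, value in txn:
        tables[table][column] = value
        published.append((tables["user"]["email"],
                          tables["todo"]["title"],
                          tables["todo"]["state"]))
    return published


if __name__ == "__main__":
    txn = [("user", "email", "new@example.com"), ("todo", "state", "done")]
    rows = apply_individually({"email": "old@example.com"},
                              {"title": "write report", "state": "open"}, txn)
    for row in rows:
        print(row)  # the first row is an intermediate, never-committed state
```

Only the last published row corresponds to the committed result of the transaction; a consumer of the view topic has no way to tell the intermediate row apart from it.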