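I am trying to generate a kind of materialized view from a stream of database updates (provided, for example, by the DBMS's transaction log, with the help of something like maxwell-daemon). The view is materialized as a Kafka compacted topic.
The view is a simple join and could be expressed as a query like this: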
SELECT u.email user_email, t.title todo_title, t.state todo_state
FROM User u
JOIN Todo t
ON t.user_id = u.id
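I would like the view to be updated (that is, a message to be published on the view's Kafka topic) every time User or Todo changes.
With Kafka Streams, it seems this could be achieved by doing the following:
However, I am not sure about a few things: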
Answer 0 (score: 2)
Yes. The pattern you describe will compute what you want out-of-the-box.
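To make the expected behaviour concrete, here is a minimal, library-free sketch of the join semantics from the question. It is an assumption-laden simulation in plain Python, not the Kafka Streams API: the `JoinView` class, its method names, and the sample data are all hypothetical. It keeps the latest User and Todo rows in memory and "publishes" a view row, keyed by todo id, whenever either side changes.

```python
class JoinView:
    """In-memory stand-in for the User/Todo join (hypothetical, not Kafka Streams)."""

    def __init__(self):
        self.users = {}    # user_id -> email
        self.todos = {}    # todo_id -> (user_id, title, state)
        self.emitted = []  # (todo_id, row) messages "published" to the view topic

    def _emit(self, todo_id):
        user_id, title, state = self.todos[todo_id]
        email = self.users.get(user_id)
        if email is not None:  # inner join: no row until both sides exist
            row = {"user_email": email, "todo_title": title, "todo_state": state}
            self.emitted.append((todo_id, row))

    def on_user(self, user_id, email):
        # A User change re-emits every view row that belongs to that user.
        self.users[user_id] = email
        for todo_id, (uid, _, _) in self.todos.items():
            if uid == user_id:
                self._emit(todo_id)

    def on_todo(self, todo_id, user_id, title, state):
        # A Todo change re-emits only its own view row.
        self.todos[todo_id] = (user_id, title, state)
        self._emit(todo_id)

    def compacted(self):
        """Latest row per key, which is what a compacted topic eventually retains."""
        latest = {}
        for key, row in self.emitted:
            latest[key] = row
        return latest


view = JoinView()
view.on_todo(10, 1, "write docs", "open")  # user 1 unknown yet: nothing emitted
view.on_user(1, "a@example.com")           # emits the row for todo 10
view.on_todo(10, 1, "write docs", "done")  # emits the updated row
view.on_user(1, "b@example.com")           # emits the row with the new email
```

Every change on either side produces a new message for the affected key, and the compacted view keeps only the latest one per todo.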
Streams processes data according to timestamps (i.e., records with smaller timestamps first), so in general this will work as expected. However, there is no strict guarantee, because in stream processing it is more important to keep making progress than to block. Streams therefore only makes a best-effort attempt to process records in timestamp order. For example, if one changelog provides no data, Streams keeps going and processes only the data from the other changelog instead of blocking. This can lead to out-of-order processing with regard to timestamps across different partitions or topics.
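The best-effort ordering can be illustrated with a small simulation. This is a sketch under assumptions, not how Kafka Streams is actually implemented: `pick_next` and `drain` are hypothetical helpers, and each record is a `(timestamp, value)` tuple. The processor always takes the smallest timestamp among the buffers that currently hold data, and never waits for an empty buffer.

```python
from collections import deque


def pick_next(buffers):
    """Pop the record with the smallest timestamp among non-empty buffers.

    Returns (buffer_name, record), or None when every buffer is empty.
    """
    candidates = [(buf[0][0], name) for name, buf in buffers.items() if buf]
    if not candidates:
        return None
    _, name = min(candidates)
    return name, buffers[name].popleft()


def drain(buffers):
    """Process everything currently buffered; return the timestamps in processing order."""
    order = []
    while True:
        nxt = pick_next(buffers)
        if nxt is None:
            return order
        order.append(nxt[1][0])


# Case 1: both changelogs have data up front, so timestamp order is respected.
on_time = drain({
    "users": deque([(1, "user v1"), (4, "user v2")]),
    "todos": deque([(2, "todo v1")]),
})

# Case 2: the todos changelog is empty at first; the processor does not block.
buffers = {"users": deque([(1, "user v1"), (4, "user v2")]), "todos": deque()}
late = drain(buffers)
buffers["todos"].append((2, "todo v1"))  # arrives after users were already processed
late += drain(buffers)
```

In the first case the timestamps are processed as 1, 2, 4. In the second case the order is 1, 4, 2, which is the kind of cross-topic out-of-order processing described above.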
That's not possible at the moment. Each update is processed individually, and you will see every intermediate (i.e., not committed) result. However, Kafka will introduce "transactional processing" in the future, which will make it possible to handle transactions (see https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging and https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics).
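As a rough illustration of what "intermediate result" means here, suppose a single database transaction changes both the user's email and the todo's state. The sketch below is a plain Python simulation with hypothetical names (`apply_updates`, the sample row, and the update list are assumptions, not Kafka code). Because each change record is applied on its own, the view passes through a state that the database never exposed as committed.

```python
def apply_updates(view_row, updates):
    """Apply each (field, value) update individually and record every resulting row."""
    seen = []
    row = dict(view_row)
    for field, value in updates:
        row[field] = value
        seen.append(dict(row))  # snapshot after every single update
    return seen


start = {"user_email": "a@example.com", "todo_state": "open"}
# Two changes that belonged to ONE database transaction:
tx = [("user_email", "b@example.com"), ("todo_state", "done")]
states = apply_updates(start, tx)
# states[0] mixes the new email with the old todo state (intermediate);
# states[1] is the row that matches the committed transaction.
```

A consumer of the view topic could observe `states[0]` before `states[1]` arrives, even though the source database committed both changes atomically.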