I'm planning to use DynamoDB, and its data needs to be synced to CloudSearch. I understand Lambda can be used for this, but I want to use Kinesis instead. So the producer would be DynamoDB, and it would generate data for the stream on each PUT/DELETE in the table.
My design is very straightforward (assuming the consumer receives records in order):
I'm having trouble figuring out how KCL ensures ordered delivery of records on the consumer end when there are multiple shards. From the API documentation, here's what I understand:
However, if I want to sync the data from DynamoDB to CloudSearch, I need to make sure that all records are synced in exactly the same order. Here's where I'm getting confused:
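Answer 0 (score: 1)
If my thinking is correct, then how can I achieve ordered receiving with two shards?
You don't do the synchronization yourself. Instead, you need to think carefully and choose a partition key such that the resulting partitions can be processed independently of each other.
E.g. you are indexing records, and the records have an id field. If records with different IDs can be updated in the search index concurrently, then the record ID is a suitable field to use as the partition key.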
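To make the partition key idea concrete, here is a rough sketch of how a key ends up on a shard. Kinesis hashes the partition key with MD5 into a 128-bit integer and assigns the record to the shard whose hash key range contains that value. The helper `shard_for_key`, the even split of the hash space, and the sample `doc-1`/`doc-2` IDs are assumptions for illustration only; real shards can have uneven ranges after splits and merges.

```python
import hashlib


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Hypothetical helper: index of the shard that would own this key,
    assuming the 128-bit hash space is split evenly across shards."""
    hash_value = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = 2**128 // shard_count
    # Clamp so the top of the hash space falls into the last shard.
    return min(hash_value // range_size, shard_count - 1)


# Table changes in the order they happened; the record id is the partition key.
events = [("doc-1", "PUT"), ("doc-2", "PUT"), ("doc-1", "DELETE")]

shards = {0: [], 1: []}
for record_id, op in events:
    shards[shard_for_key(record_id, 2)].append((record_id, op))

# Every change to doc-1 lands on one shard, in its original order.
doc1_shard = shards[shard_for_key("doc-1", 2)]
doc1_ops = [op for record_id, op in doc1_shard if record_id == "doc-1"]
```

Whichever shard `doc-1` hashes to, its PUT and DELETE stay together and in sequence, which is the only ordering the index update actually needs.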
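With KCL:
It provides ordering of records, as well as the ability to read and/or replay records in the same order to multiple Amazon Kinesis applications. The Amazon Kinesis Client Library (KCL) delivers all records for a given partition key to the same record processor, making it easier to build multiple applications that read from the same Amazon Kinesis stream (for example, to perform counting, aggregation, and filtering).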
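As a rough illustration of why this is enough, the sketch below simulates two shards whose records are applied to a search index. It is not KCL code: `apply_shard` is a hypothetical stand-in for a record processor, and a plain dict stands in for the CloudSearch domain. Because each ID lives on exactly one shard, the final index is the same no matter which shard is drained first.

```python
from itertools import permutations


def apply_shard(index: dict, records: list) -> None:
    """Hypothetical record processor: apply one shard's records in order."""
    for record_id, op, body in records:
        if op == "PUT":
            index[record_id] = body
        elif op == "DELETE":
            index.pop(record_id, None)


# Each id appears on exactly one shard, as a per-id partition key guarantees.
shard_a = [("doc-1", "PUT", "v1"), ("doc-1", "PUT", "v2")]
shard_b = [("doc-2", "PUT", "v1"), ("doc-2", "DELETE", None)]

results = []
for order in permutations([shard_a, shard_b]):
    index = {}  # stand-in for the search index
    for shard in order:
        apply_shard(index, shard)
    results.append(index)
```

Both entries in `results` end up identical, so a global order across shards is not required as long as the order within each partition key is kept.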