AWS Kinesis: how does it ensure ordered delivery of messages when multiple shards are used?

Time: 2017-08-04 13:06:40

Tags: amazon-kinesis

I'm planning to use DynamoDB, whose data needs to be synced to CloudSearch. I understand Lambda can be used, but I want to use Kinesis for that. So the producer would be DynamoDB, and it would write a record to the stream for each PUT/DELETE on the table.

My design is very straightforward (assuming the consumer receives records in order):

  • Receive the record
  • Sync to CloudSearch
  • (Repeat)

I'm having trouble figuring out how KCL would ensure ordered delivery of records on the consumer end when there are multiple shards. From the API documentation, here's what I understand:

  1. We need to create an iterator per shard, using GetShardIterator.
  2. With that shard iterator, I can get all the items for that shard in a particular sequence.
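To make the two steps above concrete, here is a minimal sketch of a per-shard read loop. It assumes `client` is a boto3 Kinesis client (for example `boto3.client("kinesis")`) and uses its `get_shard_iterator` and `get_records` calls; the function name `read_shard` and the `max_calls` cap are illustrative choices, not part of any AWS API.

```python
def read_shard(client, stream_name, shard_id, limit=100, max_calls=10):
    """Yield the records of ONE shard in the order Kinesis returns them.

    `client` is assumed to behave like a boto3 Kinesis client.
    `max_calls` is a hypothetical safety cap so the sketch terminates
    even on an open shard, which always returns a next iterator.
    """
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start at the oldest available record
    )["ShardIterator"]

    for _ in range(max_calls):
        if iterator is None:  # the shard was closed and has been fully read
            break
        response = client.get_records(ShardIterator=iterator, Limit=limit)
        yield from response["Records"]
        iterator = response.get("NextShardIterator")
```

Each call to `read_shard` gives an ordering only within the shard it reads; nothing in this loop relates the order of records in one shard to the order in another.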

However, if I want to sync the data from DynamoDB to CloudSearch, then I need to make sure that all records are synced in exactly the same order. Here's where I'm getting confused:

  1. Can items be put into the different shards at the same time?
  2. (If 1 is true) Then if I have two shards, I'll need a ShardIterator for each shard, right?
  3. (If 1 and 2 are true) If I need to ensure all the records are synced in order, then I need exactly one thread that gets the records in the correct order, don't I?
  4. If my thinking is correct, then how can I ever achieve ordered receive with two shards?

1 answer:

Answer 0 (score: 1)

  

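"If my thinking is correct, then how can I ever achieve ordered receive with two shards?"

You don't do the synchronization yourself. Instead, you need to think carefully about the partition key and choose one such that the resulting partitions can be processed independently.

E.g., you are indexing records, and the records have an id field. If records with different IDs can be updated in the search index concurrently, then the record ID would be a suitable field to use as the partition key.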
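As a rough illustration of why the record ID works as a partition key: Kinesis maps each partition key to a 128-bit integer with MD5 and places the record in the shard whose hash key range contains that value. The sketch below simulates this locally. It assumes the shards split the hash space evenly, and the names `shard_for_key` and `route` and the record layout are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # an MD5 digest read as an integer is below 2**128


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose hash range contains the key.

    Assumption: the shards divide the hash space into equal ranges.
    """
    hashed = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return hashed * shard_count // HASH_SPACE


def route(records, shard_count):
    """Append each record to its shard, using the record id as partition key."""
    shards = [[] for _ in range(shard_count)]
    for record in records:
        shards[shard_for_key(record["id"], shard_count)].append(record)
    return shards


events = [
    {"id": "doc-1", "op": "PUT"},
    {"id": "doc-2", "op": "PUT"},
    {"id": "doc-1", "op": "DELETE"},
]
shards = route(events, shard_count=2)
```

Because the same ID always hashes to the same shard, the PUT and the DELETE for `doc-1` land in one shard in the order they were written, regardless of where `doc-2` ends up.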

Using KCL:

  

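"It provides ordering of records, as well as the ability to read and/or replay records in the same order to multiple Amazon Kinesis Applications. The Amazon Kinesis Client Library (KCL) delivers all records for a given partition key to the same record processor, making it easier to build multiple applications reading from the same Amazon Kinesis stream (for example, to perform counting, aggregation, and filtering)."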

https://aws.amazon.com/kinesis/streams/
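The consequence for the consumer can be sketched as follows: with one processor per shard, shards can be handled concurrently, and the result still matches a strictly ordered replay for every ID. This is a simulation only. A plain dict stands in for the CloudSearch index, the record layout is hypothetical, and the threads merely imitate KCL record processors rather than using KCL itself.

```python
from concurrent.futures import ThreadPoolExecutor


def process_shard(index, shard_records):
    """Apply one shard's records, in shard order, to the stand-in index."""
    for record in shard_records:
        if record["op"] == "PUT":
            index[record["id"]] = record["doc"]
        else:  # DELETE
            index.pop(record["id"], None)


def sync_all(shards):
    """Run one worker per shard; no ordering is enforced across shards."""
    index = {}
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        # list() waits for every worker and re-raises any worker exception
        list(pool.map(lambda records: process_shard(index, records), shards))
    return index


shards = [
    [{"id": "a", "op": "PUT", "doc": "v1"}, {"id": "a", "op": "DELETE"}],
    [{"id": "b", "op": "PUT", "doc": "v1"}, {"id": "b", "op": "PUT", "doc": "v2"}],
]
final_index = sync_all(shards)
```

The outcome does not depend on how the two workers interleave, because no ID appears in more than one shard. That is the property the partition key choice is meant to guarantee.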