
I'm planning to use DynamoDB, whose data needs to be synced to CloudSearch. I understand Lambda can be used, but I want to use Kinesis for that. So the producer would be DynamoDB, and it would generate a stream record for each PUT/DELETE on the table.

My design is very straightforward (assuming the consumer receives records in order):

Receive the record
Sync to CloudSearch
(Repeat)

I'm having trouble figuring out how KCL ensures ordered delivery of records on the consumer end when there are multiple shards. From the API documentation, here's what I understand:

We need to create a per shard iterator, using GetShardIterator
With that Shard Iterator, I can get all the items for that shard in a particular sequence.
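
For illustration, reading a single shard in order might look roughly like the sketch below. It assumes `client` is a Kinesis client such as the one returned by `boto3.client("kinesis")`; the helper name `read_shard` and the `max_batches` cap are made up for this example. Ordering here is only guaranteed within the one shard being read.

```python
def read_shard(client, stream_name, shard_id, max_batches=10):
    """Yield records from one shard, in the order the shard stores them.

    `client` is assumed to expose the Kinesis GetShardIterator / GetRecords
    calls (e.g. boto3.client("kinesis")). `max_batches` is a hypothetical
    safety cap so the sketch does not poll forever.
    """
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start from the oldest available record
    )["ShardIterator"]

    for _ in range(max_batches):
        if iterator is None:  # a closed shard that has been fully read
            break
        response = client.get_records(ShardIterator=iterator, Limit=100)
        for record in response["Records"]:
            yield record
        # Each response hands back the iterator to use for the next call.
        iterator = response.get("NextShardIterator")
```

With two shards, two such iterators are needed, and nothing in this loop relates the order of records in one shard to the order in the other.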

However, if I want to sync the data from DynamoDB to CloudSearch, then I need to make sure that all records are synced in exactly the same order. Here's where I'm getting confused:

1. Can items be put into different shards at the same time?
2. (If 1 is true) Then if I have two shards, I'll need a ShardIterator for each shard, right?
3. (If 1 and 2 are true) If I need to ensure all the records are synced in order, then I need exactly one thread that gets the records in the correct order, isn't that so?
4. If my thinking is correct, then how can I ever achieve ordered receipt with two shards?


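You don't sync it yourself. Instead, you need to think carefully and choose a partition key so that the resulting partitions can be processed independently.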

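E.g. you are indexing records, and the records have an id field. If you can update records with different ids in the search index concurrently, then the record id would be a suitable field to use as the partition key.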
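
To make the idea concrete, here is a small simulation, not real Kinesis code. Kinesis maps each partition key to a 128-bit integer with MD5 and assigns the record to the shard whose hash key range contains that value. The sketch assumes the shards split the hash space evenly and that the digest is read as a big-endian integer; the names `shard_for_key` and `route` are invented for the example.

```python
import hashlib
from collections import defaultdict

HASH_SPACE = 2 ** 128  # MD5 digests are 128 bits wide


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard that would own this partition key.

    Assumption: shards cover equal, contiguous slices of the hash space.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    return hash_value * shard_count // HASH_SPACE


def route(events, shard_count):
    """Append each (record_id, action) event to its shard, preserving arrival order."""
    shards = defaultdict(list)
    for record_id, action in events:
        shards[shard_for_key(record_id, shard_count)].append((record_id, action))
    return shards


if __name__ == "__main__":
    events = [("42", "PUT"), ("7", "PUT"), ("42", "DELETE"), ("7", "PUT"), ("42", "PUT")]
    for shard, records in sorted(route(events, shard_count=2).items()):
        print(shard, records)
```

Because the same id always hashes to the same shard, all changes to one record stay in their original relative order, even though records with different ids may be processed in parallel by different shard consumers.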

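On the KCL:

It provides ordering of records, as well as the ability to read and/or replay records in the same order to multiple Amazon Kinesis Applications. The Amazon Kinesis Client Library (KCL) delivers all records for a given partition key to the same record processor, making it easier to build multiple applications reading from the same Amazon Kinesis stream (for example, to perform counting, aggregation, and filtering).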

https://aws.amazon.com/kinesis/streams/

AWS Kinesis: how does it ensure ordered delivery of messages if multiple shards are used?
