How can Lambda be used to keep DynamoDB and CloudSearch in sync?

Time: 2017-08-05 11:39:40

Tags: amazon-dynamodb aws-lambda amazon-kinesis amazon-cloudsearch

Assume we're using DynamoDB triggers on a table, and the trigger runs a Lambda function whose job is to update the corresponding entry in CloudSearch (to keep DynamoDB and CloudSearch in sync).

I'm not clear on how Lambda would always keep CloudSearch in sync with the data in DynamoDB. Consider the following flow:

  1. The application updates Record A in a DynamoDB table (say to A1)
  2. Very shortly afterwards, the application updates the same Record A in the same table (to A2)
  3. The trigger for step 1 causes a Lambda invocation to start executing
  4. The trigger for step 2 causes a second Lambda invocation to start executing
  5. Step 4 completes first, so CloudSearch sees A2
  6. Now step 3 completes, so CloudSearch sees A1

Lambda triggers are not guaranteed to start ONLY after the previous invocation is complete (correct me if I'm wrong, and please provide a link).

As we can see, CloudSearch ends up out of sync with DynamoDB.
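The flow above can be reproduced with a small simulation. This is only a sketch: the dictionary stands in for the CloudSearch index, and the name `apply_updates` is hypothetical.

```python
def apply_updates(completion_order):
    """Apply index writes in the order the invocations finish; the last write wins."""
    search_index = {}
    for record_id, value in completion_order:
        search_index[record_id] = value
    return search_index


# The table ends at A2 because that update was written last.
table_state = {"A": "A2"}

# Invocations finish in write order: the index matches the table.
in_order = apply_updates([("A", "A1"), ("A", "A2")])

# The invocation for A2 finishes first, then the one for A1: the index is stale.
out_of_order = apply_updates([("A", "A2"), ("A", "A1")])

print(in_order == table_state)
print(out_of_order == table_state)
```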

The closest thing I can think of that would work is AWS Kinesis Streams, but only with a single shard (1 MB/s ingestion limit). If that restriction is acceptable, the consumer application can be written to process records sequentially, i.e. the next record is processed only after the previous record has been put into CloudSearch. Assuming that is true, how can the sync be guaranteed when so much data is ingested into DynamoDB that more than one shard is needed in Kinesis?
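One possible way to think about the multi-shard case, sketched under assumptions that are not part of the question: if the producer uses the item's primary key as the Kinesis partition key, every update to that item is routed to the same shard, because Kinesis chooses the shard from an MD5 hash of the partition key. The evenly split hash ranges and the name `shard_for_key` below are illustrative only; real shards carry their own hash key ranges.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")
    range_size = HASH_SPACE // shard_count
    # Clamp so the very top of the hash space still falls in the last shard.
    return min(hashed // range_size, shard_count - 1)


# Both updates to record "A" are routed to the same shard,
# so a per-shard sequential consumer sees A1 before A2.
first_update_shard = shard_for_key("A", shard_count=4)
second_update_shard = shard_for_key("A", shard_count=4)
print(first_update_shard == second_update_shard)
```

Ordering then only needs to hold within a shard, not across the whole stream.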

2 answers:

Answer 0 (score: 0)

You can achieve this with DynamoDB Streams:

DynamoDB Streams

"A DynamoDB stream is an ordered flow of information about changes to items in an Amazon DynamoDB table."

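DynamoDB Streams guarantees the following:

  • Each stream record appears exactly once in the stream.
  • For each item that is modified in a DynamoDB table, the stream records appear in the same sequence as the actual modifications to the item.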

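Another nice thing about DynamoDB Streams: if your Lambda fails to process the stream (for example, because of an error while indexing into CloudSearch), the same event keeps being retried, and the remaining stream records wait until your function succeeds.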
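A minimal sketch of such a stream-triggered function, under several assumptions: the table has a string hash key named `id`, the stream view type includes new images, all attributes are scalar, and the `upload` callable is a hypothetical stand-in for whatever sends the batch to CloudSearch. The `add`/`delete` operation shape follows the CloudSearch JSON document batch format.

```python
import json


def to_cloudsearch_batch(event):
    """Turn a DynamoDB Streams event into ordered CloudSearch add/delete operations."""
    batch = []
    for record in event["Records"]:
        # Assumption: the table's hash key is a string attribute named "id".
        doc_id = record["dynamodb"]["Keys"]["id"]["S"]
        if record["eventName"] == "REMOVE":
            batch.append({"type": "delete", "id": doc_id})
        else:  # INSERT or MODIFY
            image = record["dynamodb"]["NewImage"]
            # Flatten {"attr": {"S": "value"}} to {"attr": "value"} (scalar attributes only).
            fields = {name: next(iter(typed.values())) for name, typed in image.items()}
            batch.append({"type": "add", "id": doc_id, "fields": fields})
    return batch


def make_handler(upload):
    """Build a Lambda handler around a hypothetical upload(json_string) callable."""
    def handler(event, context):
        batch = to_cloudsearch_batch(event)
        # Letting an exception escape fails the invocation, so the same records are retried.
        upload(json.dumps(batch))
        return len(batch)
    return handler
```

In a real deployment, `upload` could wrap the boto3 `cloudsearchdomain` client's `upload_documents` call. The important design choice is not to swallow indexing errors: the retry behaviour described above depends on the invocation failing.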

We use Streams to keep our Elasticsearch indexes in sync with our DynamoDB tables.

Answer 1 (score: 0)

AWS Lambda FAQ link

  

Q: How does AWS Lambda process data from Amazon Kinesis streams and Amazon DynamoDB Streams?

     

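The Amazon Kinesis and DynamoDB Streams records sent to your AWS Lambda function are strictly serialized, per shard. This means that if you put two records in the same shard, Lambda guarantees that your Lambda function will be successfully invoked with the first record before it is invoked with the second record. If the invocation for one record times out, is throttled, or encounters any other error, Lambda will retry until it succeeds (or the record reaches its 24-hour expiration) before moving on to the next record. The ordering of records across different shards is not guaranteed, and processing of each shard happens in parallel.

This means Lambda picks up the records in a shard one by one, in the order they appear in the shard, and does not run on a new record until the previous record has been processed!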
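The per-shard behaviour described in the FAQ can be modelled roughly as follows. This is a simulation, not Lambda's actual implementation: `process_shard` and `flaky_index_write` are hypothetical names, and `max_attempts` stands in for the 24-hour record expiration.

```python
def process_shard(records, invoke, max_attempts=5):
    """Process one shard's records strictly in order, retrying a failed record before moving on."""
    for record in records:
        for attempt in range(1, max_attempts + 1):
            try:
                invoke(record)
                break  # success: move on to the next record
            except Exception:
                if attempt == max_attempts:
                    raise  # stand-in for the record expiring


index = {}
failed_once = set()


def flaky_index_write(record):
    """Fail the first attempt for A1 to simulate a transient CloudSearch error."""
    key, value = record
    if value == "A1" and value not in failed_once:
        failed_once.add(value)
        raise RuntimeError("simulated CloudSearch error")
    index[key] = value


process_shard([("A", "A1"), ("A", "A2")], flaky_index_write)
print(index)  # {'A': 'A2'}
```

Even though the write for A1 fails once, A2 is not attempted until A1 has succeeded, so the index ends at A2.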

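However, one question remains: what if entries for the same record end up in different shards? Thankfully, AWS DynamoDB Streams ensures that a given primary key always resides in only one particular shard. (Essentially, I think the primary key is hashed to find the shard it maps to.) AWS Slide Link. See more from the AWS Blog below: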

  

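The relative ordering of a sequence of changes made to a single primary key will be preserved within a shard. Further, a given key will be present in at most one of a set of sibling shards that are active at a given point in time. As a result, your code can simply process the stream records within a shard in order to accurately track changes to an item.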