Question

Background: I have a data array of more than 10,000 URLs, and I'm writing a script in Node to do the following:

A - Make a request to a URL, then extract an address from part of the response body;
B - Take the postcode from that address and, based on it, make a call to an API;
C - Take the API response and do some processing based on it;
D - PUT the resulting processed data into DynamoDB.
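A minimal sketch of steps A to D for a single URL. It assumes hypothetical step functions (`extractAddress`, `postcodeOf`, `lookupByPostcode`, `processResponse`, `saveItem`) that you would write yourself. None of them come from a real library. Each step awaits the previous one, so the chain runs strictly in order, and a rejection in any step skips the remaining steps:

```typescript
// Hypothetical step functions; none of these names come from a real library.
interface Steps {
  extractAddress: (url: string) => Promise<string>;                    // A: fetch the page, pull the address out of the body
  postcodeOf: (address: string) => string;                             // B (part 1): parse the postcode from the address
  lookupByPostcode: (postcode: string) => Promise<unknown>;            // B (part 2): call the postcode API
  processResponse: (apiResponse: unknown) => Record<string, unknown>;  // C: derive the record to store
  saveItem: (item: Record<string, unknown>) => Promise<void>;          // D: e.g. a DynamoDB put
}

// Runs A -> B -> C -> D in order for one URL. If any step throws or rejects,
// the returned promise rejects and the later steps are not run.
async function processUrl(url: string, steps: Steps): Promise<Record<string, unknown>> {
  const address = await steps.extractAddress(url);
  const postcode = steps.postcodeOf(address);
  const apiResponse = await steps.lookupByPostcode(postcode);
  // Keep the source URL on the stored item so failures and successes can be traced back.
  const item = { ...steps.processResponse(apiResponse), url };
  await steps.saveItem(item);
  return item;
}
```

Passing the steps in as parameters keeps the ordering logic separate from the I/O, so the chain can be exercised with fakes before wiring in real HTTP and DynamoDB calls.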

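Perhaps because my data array is so long, I'm confused about the best way to structure the code/application. I know that for each URL I need to run steps A, B, C in order: if they succeed, update the database; if any of them fails, record the URL associated with the failed attempt.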
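A minimal sketch of the succeed-or-record-failure behaviour described above. It assumes `work` is a hypothetical function that runs the whole A to D chain for one URL and rejects if any step fails. Each attempt is wrapped in `try`/`catch`, so one bad URL is logged and collected instead of stopping the whole run:

```typescript
type Outcome =
  | { url: string; ok: true }
  | { url: string; ok: false; error: string };

// Run the full chain for one URL and turn any exception into a failure record.
async function attempt(url: string, work: (url: string) => Promise<void>): Promise<Outcome> {
  try {
    await work(url);
    return { url, ok: true };
  } catch (err) {
    return { url, ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// Process URLs one at a time and return the URLs whose attempt failed.
async function collectFailures(urls: string[], work: (url: string) => Promise<void>): Promise<string[]> {
  const failed: string[] = [];
  for (const url of urls) {
    const outcome = await attempt(url, work);
    if (!outcome.ok) {
      console.error(`failed: ${outcome.url}: ${outcome.error}`);
      failed.push(outcome.url);
    }
  }
  return failed;
}
```

The returned list of failed URLs can be written somewhere durable (a file, a table, a queue) and fed back in for a retry pass.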

I want to do this with AWS Lambda and DynamoDB.

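Since I'm new to this, how should I go about it in a scalable way?

a) Writing a function that loops over the 10,000-element array, running A, B, C, D for each item until it reaches the end of the list, with a long timer between items, seems a bit silly.

b) Another option I thought of is to split the array into N batches, say 10, and use async waterfall (https://caolan.github.io/async/docs.html#waterfall) or promise chaining to run A, B, C, D, trying to process the items in each batch concurrently.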
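A minimal sketch of option (b), assuming N is the batch size and `work` is a hypothetical function that runs A to D for one URL and rejects on failure. URLs inside a batch run concurrently via `Promise.allSettled`, batches run one after another, and an optional pause between batches plays the role of the timer from option (a):

```typescript
// Split `items` into consecutive chunks of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new RangeError("size must be >= 1");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Option (b): URLs within a batch run concurrently; batches run sequentially.
// Returns the URLs whose `work` call rejected.
async function processInBatches(
  urls: string[],
  work: (url: string) => Promise<void>, // hypothetical: the A -> D chain for one URL
  batchSize = 10,
  pauseMs = 0, // optional delay between batches, e.g. to respect an API rate limit
): Promise<string[]> {
  const failed: string[] = [];
  for (const batch of chunk(urls, batchSize)) {
    // allSettled waits for every URL in the batch, even if some reject.
    const results = await Promise.allSettled(batch.map((url) => work(url)));
    results.forEach((result, i) => {
      if (result.status === "rejected") failed.push(batch[i]);
    });
    if (pauseMs > 0) await sleep(pauseMs);
  }
  return failed;
}
```

With 10,000 URLs and a batch size of 10 this makes 1,000 rounds. One trade-off of fixed batches is that each round waits for its slowest URL before the next one starts; a worker pool that keeps N requests in flight at all times avoids that idle time. Also keep in mind that a single Lambda invocation can run for at most 15 minutes, so the full list may need to be spread across several invocations.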

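My programming questions are: (1) What is the name of the problem/approach I've described? I'm not sure where to look for documentation on how to solve it. (2) How realistic are the approaches (a and b) I outlined above? Thanks for any pointers.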

I found this question helpful: "How do you structure sequential AWS service calls within lambda given all the calls are asynchronous?", but it doesn't address how to handle a large dataset.

Concurrency, batching and processing approaches for scraping a large dataset with Node
