Question

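I am running a data-intensive operation on EC2. I have a table with 50 million rows, and I am analyzing the relationships between those rows to build clusters of similar data.

At my current rate, the job should finish in about 40 hours. To reach that rate, I have split the work across separate processes (using pm2 in Node).

Are these processes truly independent of one another? When I read through the logs, I often find that when one process is doing something especially CPU-heavy, the other processes seem to wait until the busy process has finished.

When the operations are fast, the logs look like this: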

Process 1 | started task
Process 2 | started task
Process 1 | completed task
Process 2 | completed task

But when the task on Process 1 is heavy, the logs look more like this:

Process 1 | started task (waiting a long time...)
Process 1 | completed task
Process 2 | started task
Process 2 | completed task

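It is as if the especially CPU-heavy operation steals mojo from the other processes, which makes me think the processes are not independent of each other after all.

The codebase itself is very large, but here is a simplified version of it (as requested in the comments):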

// pm2 launches each process with this function
async function eachProcessStartsWithThisMethod () {

  // set up Redis listener using Bull (queue is a Bull queue created elsewhere)
  queue.process(async job => {
    await cpuHeavyMethod(job.data.id)
  })

}

async function cpuHeavyMethod (jobId) {
  // pull data from sources including MySQL, Redis and Elasticache
  // analyze the data using nested for-loops
  // bulk update/insert into MySQL based on responses
}
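One way to tell whether a slow task is actually burning CPU or mostly waiting (on MySQL, Redis, or for a turn on a core) is to compare wall-clock time with the CPU time the process consumed. The sketch below is only an illustration: `measureTask` is a hypothetical helper that is not part of the original code, and it relies on Node's built-in `process.hrtime.bigint()` and `process.cpuUsage()`.

```javascript
// Hypothetical helper (not in the original codebase): runs a task and reports
// wall-clock time versus CPU time used by this process while the task ran.
async function measureTask (label, task) {
  const wallStart = process.hrtime.bigint() // nanoseconds
  const cpuStart = process.cpuUsage()       // { user, system } in microseconds

  const result = await task()

  const wallMs = Number(process.hrtime.bigint() - wallStart) / 1e6
  const cpu = process.cpuUsage(cpuStart)    // delta since cpuStart
  const cpuMs = (cpu.user + cpu.system) / 1000

  // CPU time covers all threads of the process, so it can exceed wall time;
  // clamp the difference at zero.
  const stats = { label, wallMs, cpuMs, waitingMs: Math.max(0, wallMs - cpuMs) }
  console.log(JSON.stringify(stats))
  return { result, stats }
}

// Possible use inside the Bull handler (assumes queue and cpuHeavyMethod exist):
// queue.process(async job => {
//   await measureTask(`job ${job.data.id}`, () => cpuHeavyMethod(job.data.id))
// })
```

If `waitingMs` is large for the tasks that appear to stall, the process was probably blocked on a shared resource or descheduled rather than computing. The numbers are per process, so other jobs running concurrently in the same process would also be counted.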

Here is a sample of CPU usage over the past 3 hours (on an instance running 13 processes):

Here is the associated pm2 list:

And here is a snapshot from htop:
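As a quick sanity check alongside the htop view, the number of worker processes can be compared with the number of cores the instance exposes. Separate OS processes have their own memory, but they still share the same cores. The following is a minimal sketch using Node's built-in `os` module; `describeCpuPressure` is a hypothetical name, and 13 is simply the process count mentioned above.

```javascript
const os = require('os')

// Hypothetical helper: summarises how many worker processes compete per core.
function describeCpuPressure (workerCount) {
  const cores = Math.max(1, os.cpus().length) // logical cores visible to Node
  const [load1] = os.loadavg()                // 1-minute load average (0 on Windows)
  return {
    cores,
    workerCount,
    workersPerCore: workerCount / cores,
    load1,
    oversubscribed: workerCount > cores
  }
}

console.log(describeCpuPressure(13))
```

If `oversubscribed` is true, CPU-bound workers have to take turns on the available cores, which would be one possible explanation for tasks that appear to wait on each other.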

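Are separate processes on EC2 just an illusion? What is wrong with my picture?

How do processes share CPU during data-intensive operations on EC2?

0 answers: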