NodeJS: breaking a large amount of synchronous work into asynchronous tasks

Date: 2014-02-15 09:01:19

Tags: node.js asynchronous

I am processing a large number of jobs and writing the results to a database. The workflow is:

  1. Read 100 MB of data into a buffer
  2. Loop over the data, processing each item (synchronous work) and writing the result to disk (asynchronous work)
  3. The problem I'm running into is that the loop iterates over all 100 MB of data first, queueing every disk write behind it on the event loop. So it runs through all of the synchronous work before any of the asynchronous jobs run.

I want to break up the synchronous task of iterating the data so that each iteration is queued on the event loop.

    var lotsOfWorkToBeDone = ['tens of thousands of job', 'tens of thousands of job', 'tens of thousands of job', 'tens of thousands of job', 'tens of thousands of job', 'tens of thousands of job', 'tens of thousands of job']
    
    while (true) {
      var job = lotsOfWorkToBeDone.pop()
      if (!job) {
        break
      }
      var syncResult = syncWork(job)
      asyncWork(syncResult)
    }
    
    function syncWork(job) {
      console.log('sync work:', job)
      return 'Sync Result of ' + job
    }
    
    function asyncWork(syncResult) {
      setTimeout(function() {
        console.log('async work: ', syncResult)
      }, 0)
    }
    
    
    // Desired Outcome
    // sync work: tens of thousands of job
    // async work: Sync Result of tens of thousands of job
    // sync work: tens of thousands of job
    // async work: Sync Result of tens of thousands of job
    // sync work: tens of thousands of job
    // async work: Sync Result of tens of thousands of job
    // sync work: tens of thousands of job
    // async work: Sync Result of tens of thousands of job
    // sync work: tens of thousands of job
    // async work: Sync Result of tens of thousands of job
    // sync work: tens of thousands of job
    // async work: Sync Result of tens of thousands of job
    // sync work: tens of thousands of job
    // async work: Sync Result of tens of thousands of job
    
    // Actual Outcome
    // sync work: tens of thousands of job
    // sync work: tens of thousands of job
    // sync work: tens of thousands of job
    // sync work: tens of thousands of job
    // sync work: tens of thousands of job
    // sync work: tens of thousands of job
    // sync work: tens of thousands of job
    // async work: Sync Result of tens of thousands of job
    // async work: Sync Result of tens of thousands of job
    // async work: Sync Result of tens of thousands of job
    // async work: Sync Result of tens of thousands of job
    // async work: Sync Result of tens of thousands of job
    // async work: Sync Result of tens of thousands of job
    // async work: Sync Result of tens of thousands of job
    

    Note: this example is a simplified version of reality. I don't actually have an array to iterate over; I have a large buffer that I process until EOF (hence the while loop).
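
The interleaving asked for above can also be sketched without any library by scheduling each iteration with `setImmediate`. A minimal sketch, assuming the `syncWork`/`asyncWork` shapes from the question; `processNext`, `jobs`, and the `order` array are illustrative names, not part of the original code:

```javascript
// Minimal sketch: yield to the event loop between iterations by
// scheduling the next one with setImmediate (illustrative names).
var jobs = ['job 1', 'job 2', 'job 3']
var order = [] // records the interleaving, for demonstration only

function processNext() {
  var job = jobs.pop()
  if (!job) return // done: work list exhausted
  var syncResult = syncWork(job)
  asyncWork(syncResult, function () {
    // queue the next iteration behind any pending I/O callbacks
    setImmediate(processNext)
  })
}

function syncWork(job) {
  order.push('sync work: ' + job)
  console.log('sync work:', job)
  return 'Sync Result of ' + job
}

function asyncWork(syncResult, callback) {
  setTimeout(function () {
    order.push('async work: ' + syncResult)
    console.log('async work:', syncResult)
    callback()
  }, 0)
}

processNext()
```

Each sync/async pair now completes before the next job starts, at the cost of serializing the writes; this is essentially the same control flow that the `async.whilst` answer below provides.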

2 Answers:

Answer 0 (score: 2)

Using async.whilst seems to achieve the desired result.

I'm not accepting my own answer for now because I'm interested in comments on this solution. There may be a better one.

var async = require('async')
var lotsOfWorkToBeDone = ['tens of thousands of job', 'tens of thousands of job', 'tens of thousands of job', 'tens of thousands of job', 'tens of thousands of job', 'tens of thousands of job', 'tens of thousands of job']

var job;
// whilst() keeps calling the worker while the test returns truthy,
// waiting for each iteration's callback before testing again
async.whilst(function() {
  job = lotsOfWorkToBeDone.pop()
  return !!job
}, function(callback) {
  var syncResult = syncWork(job)
  asyncWork(syncResult, callback)
}, function(err) {
  if (err) console.log('error: ', err)
})

function syncWork(job) {
  console.log('sync work:', job)
  return 'Sync Result of ' + job
}

function asyncWork(syncResult, callback) {
  setTimeout(function() {
    console.log('async work: ', syncResult)
    callback()
  }, 0)
}

// sync work: tens of thousands of job
// async work:  Sync Result of tens of thousands of job
// sync work: tens of thousands of job
// async work:  Sync Result of tens of thousands of job
// sync work: tens of thousands of job
// async work:  Sync Result of tens of thousands of job
// sync work: tens of thousands of job
// async work:  Sync Result of tens of thousands of job
// sync work: tens of thousands of job
// async work:  Sync Result of tens of thousands of job
// sync work: tens of thousands of job
// async work:  Sync Result of tens of thousands of job
// sync work: tens of thousands of job
// async work:  Sync Result of tens of thousands of job

Answer 1 (score: 0)

Consider using streams. Buffering 100 MB sounds like a bad idea. The final code would look something like this:

inputStream.pipe(yourTransformStream).pipe(outputStream);

All of the logic would be implemented as a Transform stream.