Question

I have a list of 50k entries that I am inserting into my database.

var tickets = [new Ticket(), new Ticket(), ...]; // 50k of them
tickets.forEach(function (t, ind){
    console.log((ind + 1) + ' / ' + tickets.length);
    Ticket.findOneAndUpdate({id: t.id}, t, {upsert: true}, function (err, doc){
        if (err){
            console.log(err);
        } else {
            console.log('inserted');
        }
    });
});

Instead of the expected interleaving

1 / 50000
           inserted
2 / 50000
           inserted

I get all of the indices first, followed by all of the "inserted" confirmations:

1 / 50000
2 / 50000
...
50000 / 50000
inserted
inserted
...
inserted

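I think something is going on with process.nextTick. There is also a noticeable slowdown after a few thousand records.

Does anyone know how to get efficient interleaving?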

Answer 1

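You are experiencing the wonders of Node's asynchrony. It sends the upsert request off into the ether and then moves on to the next record without waiting for the response. Does that matter, though? It is only an informational message, and it isn't synchronized with the upserts. If you need to make sure they complete in order, you may want to use the async library to iterate over the array.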
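The async library itself is not shown here. As a minimal sketch of the same idea using only built-in promises, the loop below awaits each upsert before starting the next one, so each progress line is followed by its confirmation. `fakeUpsert` is a hypothetical stand-in for `Ticket.findOneAndUpdate` that simulates network latency with `setTimeout`; in real code you would await the actual query instead (for example via `.exec()`, assuming a Mongoose version whose queries return promises).

```javascript
// Hypothetical stand-in for Ticket.findOneAndUpdate: resolves after a short
// simulated network delay. This is NOT a real Mongoose API.
function fakeUpsert(ticket) {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve(ticket); }, 1);
  });
}

// Upsert tickets one at a time, waiting for each response before moving on.
// The `log` callback receives every output line so the order can be checked.
async function upsertSequentially(tickets, log) {
  for (let ind = 0; ind < tickets.length; ind++) {
    log((ind + 1) + ' / ' + tickets.length);
    try {
      await fakeUpsert(tickets[ind]);
      log('inserted');
    } catch (err) {
      log(String(err));
    }
  }
}

// Usage: three dummy tickets, logged to the console and collected in `lines`.
const lines = [];
const done = upsertSequentially([{ id: 1 }, { id: 2 }, { id: 3 }], function (msg) {
  lines.push(msg);
  console.log(msg);
});
```

The trade-off is throughput: only one request is in flight at a time, so 50k sequential round trips will generally take longer overall than issuing them concurrently.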

Answer 2

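Instead of the expected interleaving

That would only be the expected behavior for synchronous I/O.

Remember that these operations are all asynchronous, which is a key concept in node.js. What your code does is: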

for each item in the list, 
  'start a function' // <-- this will immediately look at the next item
    output a number (happens immediately)
      do some long-running operation over the network with connection pooling 
      and batching. When done, 
         call a callback that says 'inserted'

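So the code starts a huge number of these functions, each of which sends a request to the database. All of this happens long before the first request even reaches the database. Most likely, the OS won't even send the first TCP packet before you have reached, say, item 5 or 10.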

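To answer the question in your comment: no, the requests are sent relatively quickly (when exactly is up to the OS), but the results cannot reach your single-threaded JavaScript code until the loop has finished queuing all 50k entries. This is because forEach is the piece of code that is currently running, and any events that arrive while it runs are only processed after it completes. You would observe the same thing if you used setTimeout(function() { console.log("inserted... not") }, 0) instead of the actual database call, because setTimeout is also an asynchronous event.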
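A small sketch of that setTimeout experiment: the loop stands in for the forEach over the tickets, and every timer callback is queued while the loop is still running, so all of the progress lines are recorded before any "inserted... not" line. The `order` array and the count of 5 are illustrative choices, not part of the original code.

```javascript
const order = [];
const total = 5; // small stand-in for the 50k tickets

[1, 2, 3, 4, 5].forEach(function (n) {
  order.push(n + ' / ' + total);
  // Queued as a timer; it cannot run until the current forEach has finished.
  setTimeout(function () {
    order.push('inserted... not');
  }, 0);
});

// Queued after the five timers above with the same delay, so it runs last.
setTimeout(function () {
  console.log(order.join('\n'));
}, 0);
```

Swapping the timer for a real database call changes only how long each callback waits, not the fact that none of them can run before the loop ends.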

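To make the code fully asynchronous, your data source should be some kind of (asynchronous) iterator that supplies the data, rather than a huge array of items.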
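One possible reading of this suggestion, sketched with an async generator and `for await...of` in modern Node.js (the answer itself does not name a specific API): the generator hands out one ticket at a time, and the consumer awaits each upsert before pulling the next. `ticketSource` and `fakeUpsert` are hypothetical; in real code the source might be a database cursor or a file stream, and the upsert would be the real Mongoose call.

```javascript
// Hypothetical stand-in for the real upsert; simulates network latency.
function fakeUpsert(ticket) {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve(ticket); }, 1);
  });
}

// Hypothetical asynchronous data source: yields tickets one at a time instead
// of building a 50k-element array up front.
async function* ticketSource(count) {
  for (let id = 1; id <= count; id++) {
    yield { id: id };
  }
}

// Pull one ticket, upsert it, wait for the result, then pull the next.
// Returns the number of tickets processed.
async function importTickets(source, log) {
  let n = 0;
  for await (const ticket of source) {
    n += 1;
    log('ticket ' + n);
    await fakeUpsert(ticket);
    log('inserted');
  }
  return n;
}

// Usage: three tickets, logged to the console and collected in `events`.
const events = [];
const finished = importTickets(ticketSource(3), function (msg) {
  events.push(msg);
  console.log(msg);
});
```

Because the consumer decides when to ask for the next item, the source never races ahead of the database, which also avoids piling up thousands of pending requests.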

mongoDB insert and process.nextTick
