node.js process out of memory in http.request loop

Question

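I can't figure out why my node.js server runs out of memory. The server makes one remote HTTP request for every HTTP request it receives, so I tried to reproduce the problem with the sample script below, which also runs out of memory.

This only happens when the number of iterations in the for loop is very high.

From my point of view, the problem is related to the fact that node.js queues the remote HTTP requests. How can I avoid this?

This is the sample script: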

(function() {
  var http, i, mypost, post_data;
  http = require('http');
  post_data = 'signature=XXX%7CPSFA%7Cxxxxx_value%7CMyclass%7CMysubclass%7CMxxxxx&schedule=schedule_name_6569&company=XXXX';
  mypost = function(post_data, cb) {
    var post_options, req;
    post_options = {
      host: 'myhost.com',
      port: 8000,
      path: '/set_xxxx',
      method: 'POST',
      headers: {
        'Content-Length': post_data.length
      }
    };
    req = http.request(post_options, function(res) {
      var res_data;
      res.setEncoding('utf-8');
      res_data = '';
      res.on('data', function(chunk) {
        return res_data += chunk;
      });
      return res.on('end', function() {
        return cb();
      });
    });
    req.on('error', function(e) {
      return console.debug('TM problem with request: ' + e.message);
    });
    req.write(post_data);
    return req.end(); // end() has to be called, otherwise the request is never sent
  };
  for (i = 1; i <= 1000000; i++) {
    mypost(post_data, function() {});
  }
}).call(this);


$ node -v
v0.4.9
$ node sample.js
FATAL ERROR: CALL_AND_RETRY_2 Allocation failed - process out of memory

Thanks in advance

gulden PT

Answer 1

Constraining the flow of requests into the server

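It is possible to prevent overload of the built-in Server and its HTTP/HTTPS variants by setting the maxConnections property on the instance. Setting this property causes node to stop accept()ing connections and forces the operating system to drop requests once the listen() backlog is full and the application is already handling maxConnections requests.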
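
As a minimal sketch of the setting described above: maxConnections is a property that http.Server inherits from net.Server, while the helper name createCappedServer, the limit of 100, and the port in the comment are placeholders chosen for illustration only.

```javascript
var http = require('http')

// Hypothetical helper: builds an HTTP server that will not accept more
// than `limit` simultaneous connections. It does not start listening.
function createCappedServer(limit) {
  var server = http.createServer(function (req, res) {
    res.writeHead(200, { 'Content-Type': 'text/plain' })
    res.end('ok\n')
  })
  // Inherited from net.Server: connections beyond this count are rejected.
  server.maxConnections = limit
  return server
}

var server = createCappedServer(100)
// server.listen(8000)  // placeholder port; uncomment to start serving
```

The right limit depends on how much work each request triggers, so treat 100 as an arbitrary starting point rather than a recommendation.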

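Throttling outgoing requests

Sometimes it is necessary to throttle outgoing requests, as in the example script from the question.

Using node directly or using a generic pool

As the question demonstrates, unchecked use of the node network subsystem can result in out-of-memory errors. Something like node-pool makes active pool management attractive, but it does not solve the fundamental issue of unconstrained queuing. The reason is that node-pool does not provide any feedback about the state of the client pool.

UPDATE: As of v1.0.7, node-pool includes a patch inspired by this post that adds a boolean return value to acquire(). The code in the following section is no longer necessary, and the example with the streams pattern is working code with node-pool.

Cracking open the abstraction

As demonstrated by Andrey Sidorov, a solution can be reached by tracking the queue size explicitly and mingling the queuing code with the requesting code: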

var useExplicitThrottling = function () {
  var active = 0
  var remaining = 10
  var queueRequests = function () {
    while(active < 2 && --remaining >= 0) {
      active++;
      pool.acquire(function (err, client) {
        if (err) {
          console.log("Error acquiring from pool")
          if (--active < 2) queueRequests()
          return
        }
        console.log("Handling request with client " + client)
        setTimeout(function () {
          pool.release(client)
          if(--active < 2) {
            queueRequests()
          }
        }, 1000)
      })
    }
  }
  queueRequests(10)
  console.log("Finished!")
}

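Borrowing the streams pattern

The streams pattern is the idiomatic solution in node. Streams have a write operation that returns false when the stream cannot buffer more data. The same pattern can be applied to a pool object, with acquire() returning false once the maximum number of clients has been acquired. A drain event is emitted when the number of active clients drops below the maximum. The pool abstraction is closed again, and explicit references to the pool size can be omitted.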

var useStreams = function () {
  var queueRequests = function (remaining) {
    var full = false
    pool.once('drain', function() {
        // remaining ends at -1 once the loop is exhausted, so test for > 0
        if (remaining > 0) queueRequests(remaining)
    })

    while(!full && --remaining >= 0) {
      console.log("Sending request...")
      full = !pool.acquire(function (err, client) {
        if (err) {
          console.log("Error acquiring from pool")
          return
        }
        console.log("Handling request with client " + client)
        setTimeout(pool.release, 1000, client)
      })
    }
  }
  queueRequests(10)
  console.log("Finished!")
}
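
The example above relies on a pool whose acquire() returns a boolean and which emits a drain event. To make that contract concrete, here is a minimal self-contained sketch of such a pool. It is not node-pool's implementation; createTinyPool and the shape of the client objects are hypothetical names introduced for illustration.

```javascript
var EventEmitter = require('events').EventEmitter

// Hypothetical stand-in for a pool with stream-like back-pressure.
// Methods are closures, so pool.release can be passed around unbound.
function createTinyPool(max) {
  var pool = new EventEmitter()
  var active = 0      // clients currently handed out
  var nextId = 0      // only used to label the fake clients
  var waiting = []    // callbacks queued while the pool is saturated

  // Gives cb a client if one is free, otherwise queues cb.
  // Returns false once the maximum number of clients is in use.
  pool.acquire = function (cb) {
    if (active < max) {
      active++
      cb(null, { id: ++nextId })
    } else {
      waiting.push(cb)
    }
    return active < max
  }

  // Hands the client to the oldest waiter, or frees the slot and emits
  // 'drain' when the pool drops from full to below the maximum.
  pool.release = function (client) {
    var next = waiting.shift()
    if (next) return next(null, client)
    if (active-- === max) pool.emit('drain')
  }

  return pool
}

var pool = createTinyPool(2)
var clients = []
function remember(err, client) { clients.push(client) }
console.log(pool.acquire(remember)) // true: one slot is still free
console.log(pool.acquire(remember)) // false: the pool is now full
```

A caller that stops issuing requests when acquire() returns false and resumes on drain never builds up an unbounded queue, which is the same contract that write() and drain provide for streams.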

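Fibers

An alternative solution can be obtained by providing a blocking abstraction on top of the queue. The fibers module exposes coroutines that are implemented in C++. By using fibers, it is possible to block an execution context without blocking the node event loop. While I find this approach quite elegant, it is often overlooked in the node community because of a curious aversion to all things synchronous-looking. Notice that, excluding the callcc utility, the actual loop logic is wonderfully succinct.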

var Fiber = require('fibers')

/* This is the call-with-current-continuation found in Scheme and other
 * Lisps. It captures the current call context and passes a callback to
 * resume it as an argument to the function. Here, I've modified it to fit
 * JavaScript and node.js paradigms by making it a method on Function
 * objects and using function (err, result) style callbacks.
 */
Function.prototype.callcc = function(context  /* args... */) {
  // Capture callcc's own arguments here; inside the fiber body,
  // `arguments` would refer to the fiber function's arguments instead.
  var that = this,
      args = Array.prototype.slice.call(arguments, 1),
      caller = Fiber.current,
      fiber = Fiber(function () {
        that.apply(context, args.concat(
          function (err, result) {
            if (err)
              caller.throwInto(err)
            else
              caller.run(result)
          }
        ))
      })
  process.nextTick(fiber.run.bind(fiber))
  return Fiber.yield()
}

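// Must be called from inside a fiber, e.g. Fiber(useFibers).run(),
// because callcc yields the current fiber.
var useFibers = function () {
  var remaining = 10
  while(--remaining >= 0) {
    console.log("Sending request...")
    try {
      // Pass the pool as the context so acquire() runs with the right `this`.
      var client = pool.acquire.callcc(pool)
      console.log("Handling request with client " + client);
      setTimeout(pool.release, 1000, client)
    } catch (x) {
      console.log("Error acquiring from pool")
    }
  }
  console.log("Finished!")
}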

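Conclusion

There are a number of correct ways to approach this problem. However, for library authors, or for applications that need a single pool to be shared across many contexts, it is best to encapsulate the pool properly. Doing so helps prevent errors and produces cleaner, more modular code. Preventing unconstrained queuing then becomes an evented dance or a coroutine pattern. I hope this answer dispels a lot of the FUD and confusion around blocking-style code and asynchronous behavior, and encourages you to write code that makes you happy.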

Answer 2

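Yes, you are trying to queue 1,000,000 requests before any of them has even started. This version keeps a limited number of requests (100) in flight: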

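  function do_1000000_req( cb )
  {
      // Declared with var so they do not leak into the global scope.
      var num_active = 0;
      var num_finished = 0;
      var num_scheduled = 0;

      function schedule()
      {
         while (num_active < 100 && num_scheduled < 1000000) {
            num_active++;
            num_scheduled++;
            // mypost and post_data are the ones defined in the question's script
            mypost(post_data, function() {
               num_active--;
               num_finished++;
               if (num_finished == 1000000)
               {
                  cb();
                  return;
               } else if (num_scheduled < 1000000)
                  schedule();
           });
         }
      }

      schedule(); // start the first batch of 100 requests
  }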

  do_1000000_req( function() {
      console.log('done!');
  });
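
The same idea can be sketched in a self-contained form that does not depend on a real HTTP endpoint. The names runLimited and fakeRequest, and their parameters, are hypothetical and introduced only for illustration; fakeRequest stands in for mypost from the question and is assumed to complete asynchronously.

```javascript
// Runs `total` tasks, never more than `limit` at once, then calls `done`.
// `task(index, cb)` must call cb exactly once, asynchronously, when finished.
function runLimited(total, limit, task, done) {
  var active = 0, started = 0, finished = 0

  function schedule() {
    while (active < limit && started < total) {
      active++
      task(started++, function () {
        active--
        finished++
        if (finished === total) return done()
        schedule() // a slot was freed, so top the batch up again
      })
    }
  }

  if (total === 0) return done()
  schedule()
}

// Stand-in for mypost: finishes after a short delay and records
// the highest number of requests that were in flight at once.
var inFlight = 0, peak = 0
function fakeRequest(index, cb) {
  inFlight++
  if (inFlight > peak) peak = inFlight
  setTimeout(function () { inFlight--; cb() }, 1)
}

runLimited(20, 3, fakeRequest, function () {
  console.log('done, peak concurrency: ' + peak)
})
```

Because a new task is started only from inside a completion callback, memory use is bounded by the limit rather than by the total number of requests.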
