Question

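I'm using node to recursively walk the file system and make a system call on each file with child.exec. It works fine when I test it on a small structure with a few folders and files, but when I run it on my whole home directory it crashes after a while: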

    child_process.js:945
        throw errnoException(process._errno, 'spawn');
              ^
    Error: spawn Unknown system errno 23
        at errnoException (child_process.js:998:11)
        at ChildProcess.spawn (child_process.js:945:11)
        at exports.spawn (child_process.js:733:9)
        at Object.exports.execFile (child_process.js:617:15)
        at exports.exec (child_process.js:588:18)

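Is this happening because I'm exhausting all the resources? How can I avoid it?

Edit: suggestions for code improvements and best practices are always welcome :)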

    function processDir(dir, callback) {
        fs.readdir(dir, function (err, files) {
            if (err) {...}
            if (files) {
                async.each(files, function (file, cb) {
                        var filePath = dir + "/" + file;
                        var stats = fs.statSync(filePath);
                        if (stats) {
                            if (stats.isFile()) {
                                processFile(dir, file, function (err) {
                                    if (err) {...}
                                    cb();
                                });
                            } else if (stats.isDirectory()) {
                                processDir(filePath, function (err) {
                                    if (err) {...}
                                    cb();
                                });
                            }
                        }
                    }, function (err) {
                        if (err) {...}
                        callback();
                    }
                );
            }
        });
    }

Answer 1

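The problem is probably that too many files are open at the same time (on Linux, errno 23 is ENFILE, "Too many open files in system").

Consider using the async module's eachLimit to limit that: https://github.com/caolan/async#eachLimit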

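    var async = require("async");

    async.eachLimit(
      files,
      20,                         // at most 20 iterations in flight at once
      function (file, callback) {
        // process the file here, then call callback() (or callback(err) on failure)
      },
      function (err) {
        // called once every file is done, or on the first error
      }
    );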

In this example, at most 20 files are processed at a time.
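One caveat: if eachLimit simply replaces async.each inside the recursive processDir from the question, every call gets its own limit of 20, so the cap applies per directory rather than to the whole tree (a subdirectory occupies one of its parent's slots while its own 20 run). A minimal sketch of a single limiter shared by all recursion levels follows, assuming a promise-based rewrite. createLimiter, limitExec and processFileLimited are hypothetical names, not part of the async library, and "wc -c" is only a placeholder command.

```javascript
const { execFile } = require("child_process");

// Hypothetical helper (not part of the async library): one limiter object that
// every recursion level shares, so `max` bounds the total work in flight for the whole tree.
function createLimiter(max) {
  let active = 0;     // tasks currently running
  const queue = [];   // tasks waiting for a free slot

  function next() {
    if (active >= max || queue.length === 0) return;
    const { taskFn, resolve, reject } = queue.shift();
    active++;
    Promise.resolve()
      .then(taskFn)              // also turns a synchronous throw into a rejection
      .then(resolve, reject)
      .finally(() => {
        active--;
        next();                  // hand the freed slot to the next queued task
      });
  }

  // Queue taskFn (a function returning a promise); settles with taskFn's result.
  return function run(taskFn) {
    return new Promise((resolve, reject) => {
      queue.push({ taskFn, resolve, reject });
      next();
    });
  };
}

// Hypothetical usage: create ONE limiter at module level and route every child
// process through it, whichever directory level the file was found in.
const limitExec = createLimiter(20);

function processFileLimited(filePath) {
  return limitExec(() => new Promise((resolve, reject) => {
    // placeholder command for illustration only
    execFile("wc", ["-c", filePath], (err, stdout) => (err ? reject(err) : resolve(stdout)));
  }));
}
```

Because the queue lives in one closure, a file found ten directories deep waits for the same 20 slots as a file in the top directory.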

Answer 2

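Well, I don't know the reason for the failure, but if it's what you expect (exhausting all the resources) or what others said (too many open files), you could try handling it with multitasking. JXcore (a fork of Node.JS) offers something like this: it lets you run tasks in separate instances, while still staying inside a single process.

A Node.JS application, being a single process, has its limits, and JXcore with its sub-instances multiplies those limits: a single process with even one extra instance (or task, or, well, we could call it a sub-thread) doubles them!

So, let's say you run each spawn() in a separate task. Or, since tasks no longer run in the main thread, you can even use the synchronous method that jxcore provides: cmdSync().

These few lines of code probably illustrate it best: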

    jxcore.tasks.setThreadCount(4);

    var task = function(file) {
      var your_cmd = "do something with " + file;
      return jxcore.utils.cmdSync(your_cmd);
    };

    jxcore.tasks.addTask(task, "file1.txt", function(ret) {
      console.log("the exit code:", ret.exitCode);
      console.log("output:", ret.out);
    });

Let me say it again: the task does not block the main thread, because it runs in a separate instance!

The multitasking API is documented here: Multitasking.

Answer 3

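As suggested in the comments, you are probably running out of file handles because you are running too many concurrent operations on files. So the solution is to limit how many operations run at once, so that not too many files are in use at the same time.

Here is a somewhat different implementation that uses Bluebird promises to control both the asynchronous side and the concurrency side of the operation.

To make the concurrency easier to manage, it first collects the entire list of files into an array and then processes that array of filenames, instead of processing files as it finds them. This makes it much easier to use the built-in concurrency feature of Bluebird's .map() (which works on a single array), so we don't have to write that code ourselves: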

    var Promise = require("bluebird");
    var fs = Promise.promisifyAll(require("fs"));
    var path = require("path");

    // Recurse a directory and call fileCallback (which returns a promise) on each file.
    // At most numConcurrent callbacks run at once.
    // Returns a promise that resolves when all work is done.
    function processDir(dir, numConcurrent, fileCallback) {
        var allFiles = [];

        // Phase 1: walk the tree and push every file path into allFiles
        function listDir(dir) {
            return fs.readdirAsync(dir).map(function(file) {
                var filePath = path.join(dir, file);
                return fs.statAsync(filePath).then(function(stats) {
                    if (stats.isFile()) {
                        allFiles.push(filePath);
                    } else if (stats.isDirectory()) {
                        return listDir(filePath);
                    }
                }).catch(function() {
                    // ignore errors on .stat: the file could just be gone now
                });
            });
        }

        // Phase 2: process the collected list with bounded concurrency
        return listDir(dir).then(function() {
            return Promise.map(allFiles, function(filename) {
                return fileCallback(filename);
            }, {concurrency: numConcurrent});
        });
    }

    // Example usage:

    // pass the initial directory,
    // the number of concurrent operations allowed at once,
    // and a callback function (that returns a promise) to process each file
    processDir(process.cwd(), 5, function(file) {
        // put your own code here to process each file;
        // for testing purposes, this makes each callback take a random amount of time
        var rand = Math.floor(Math.random() * 500) + 500;
        return Promise.delay(rand).then(function() {
            console.log(file);
        });
    }).catch(function(e) {
        // handle errors here
        console.error(e);
    }).finally(function() {
        console.log("done");
    });

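FYI, I think you'll find that proper error propagation and correct error handling across many asynchronous operations are much easier to get right with promises than with plain callbacks.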
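On a current Node version, roughly the same two-phase approach (collect every path first, then process the list with bounded concurrency) can be sketched with native async/await and fs.promises instead of Bluebird. This is a sketch under stated assumptions, not the answer's code: it needs Node 10.10+ for readdir's withFileTypes option, listFiles is a hypothetical helper name, and a fixed pool of worker loops stands in for Bluebird's {concurrency} option. Unlike the version above, it walks directories one at a time and skips symbolic links, because a Dirent for a symlink reports neither isFile() nor isDirectory().

```javascript
const fsp = require("fs").promises;
const path = require("path");

// Phase 1 (hypothetical helper): recursively collect every regular file under dir.
// Dirent entries tell us the type directly, so no separate stat call is needed.
async function listFiles(dir, out = []) {
  let entries;
  try {
    entries = await fsp.readdir(dir, { withFileTypes: true });
  } catch (err) {
    return out; // unreadable or vanished directory: skip it
  }
  for (const entry of entries) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      await listFiles(fullPath, out); // sequential walk: one readdir in flight at a time
    } else if (entry.isFile()) {
      out.push(fullPath);
    }
  }
  return out;
}

// Phase 2: call fileCallback on every file with at most numConcurrent in flight.
// Each worker loop pulls the next unclaimed index until the list is exhausted.
async function processDir(dir, numConcurrent, fileCallback) {
  const files = await listFiles(dir);
  const results = new Array(files.length);
  let nextIndex = 0;

  async function worker() {
    while (nextIndex < files.length) {
      const i = nextIndex++; // claimed synchronously, so no two workers share an index
      results[i] = await fileCallback(files[i]);
    }
  }

  const poolSize = Math.max(1, Math.min(numConcurrent, files.length));
  const workers = [];
  for (let w = 0; w < poolSize; w++) workers.push(worker());
  await Promise.all(workers);
  return results; // same order as the collected file list
}
```

Usage mirrors the Bluebird version: processDir(process.cwd(), 5, file => somePromise), where the callback is where the child.exec call would go.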

Too many child processes in nodeJS?