Question

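I'm learning Node.js by rewriting, for fun, some utility tools I previously wrote in C#. Either I've found something that isn't a good idea to write in Node.js, or I'm completely missing a concept that would make it work.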

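The program's goal: search a directory of files for a file whose data matches some criteria. The files are gzipped XML, and for the moment I'm just looking for one tag. Here's what I tried (files is an array of file names):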

while (files.length > 0) {
    var currentPath = rootDir + "\\" + files.pop();
    var fileContents = fs.readFileSync(currentPath);
    zlib.gunzip(fileContents, function(err, buff) {
        if (buff.toString().indexOf("position") !== -1) {
            console.log("The file '%s' has an odometer reading.", currentPath);
            return;
        }
    });     

    if (files.length % 1000 === 0) {
        console.log("%d files remain...", files.length);
    }
}

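I was nervous about this when I wrote it. It's clear from the console output that all of the gunzip operations are asynchronous and that none of their callbacks run until the while loop has completed. That means that when I finally do get some output, currentPath no longer has the value it had when the file was read, so the program is useless. I don't see a synchronous way to decompress the data with the zlib module. I don't see a way to store the context (currentPath would do) so that the callback has the right value. I originally tried streams, piping a file stream into a gunzip stream, but I had a similar problem: all of my callbacks happened after the loop had completed, and I'd lost the useful context.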

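It's been a long day and I'm out of ideas for how to structure this. The loop is a synchronous thing, and my asynchronous stuff depends on its state. That is bad. What am I missing? If the files weren't gzipped, this would be easy because of readFileSync().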

Answer 1

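Wow. I really didn't expect no answers at all. I got caught in a time crunch, but I spent the last couple of days looking over Node.js, hypothesizing about why certain things work the way they do, and learning about control flow.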

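So the code as-is doesn't work because I need a closure to capture the value of currentPath. Boy, does Node.js like closures and callbacks. So a better structure for the application would look like this: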

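function checkFile(currentPath) {
    // currentPath is a parameter, so each call keeps its own value for the callback.
    var fileContents = fs.readFileSync(currentPath);
    zlib.gunzip(fileContents, function(err, buff) {
        if (err) {
            // buff is undefined when decompression fails, so bail out early.
            console.error("Could not gunzip '%s': %s", currentPath, err.message);
            return;
        }
        if (buff.toString().indexOf("position") !== -1) {
            console.log("The file '%s' has an odometer reading.", currentPath);
        }
    });
}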

while (files.length > 0) {
    var currentPath = rootDir + "\\" + files.shift();
    checkFile(currentPath);

}
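
To see why moving the work into a function fixes the stale currentPath, here is a minimal sketch that leaves out the file system and zlib. The function names and file names are made up for illustration. The callbacks are collected first and invoked only after the loop has finished, which mimics when the gunzip callbacks actually run.

```javascript
// Version 1: every callback closes over the same function-scoped variable.
function collectWithSharedVar(names) {
    var pending = [];
    for (var i = 0; i < names.length; i++) {
        var current = names[i]; // one variable, overwritten on each iteration
        pending.push(function () { return current; });
    }
    // The loop is done before any callback runs, as with zlib.gunzip.
    return pending.map(function (cb) { return cb(); });
}

// Version 2: a function call gives each callback its own copy of the value.
function collectWithClosure(names) {
    var pending = [];
    function capture(current) { // `current` is a fresh binding per call
        pending.push(function () { return current; });
    }
    for (var i = 0; i < names.length; i++) {
        capture(names[i]);
    }
    return pending.map(function (cb) { return cb(); });
}

console.log(collectWithSharedVar(["a.gz", "b.gz", "c.gz"])); // [ 'c.gz', 'c.gz', 'c.gz' ]
console.log(collectWithClosure(["a.gz", "b.gz", "c.gz"]));   // [ 'a.gz', 'b.gz', 'c.gz' ]
```

The first version reports the last file name three times, which is the symptom described in the question. The second behaves the way checkFile does.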

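But it turns out that's not very Node, because there's a lot of synchronous code. To do it asynchronously, I need to lean on more callbacks. The program turned out longer than I expected, so I'll only post part of it for brevity, but the first bits of it look like this: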

function checkForOdometer(currentPath, callback) {
    fs.readFile(currentPath, function(err, data) {
        unzipFile(data, function(hasReading) {
            callback(currentPath, hasReading);
        });
    });
}

function scheduleCheck(filePath, callback) {
    process.nextTick(function() {
        checkForOdometer(filePath, callback);
    });
}

var withReading = 0;
var totalFiles = 0;
function series(nextPath) {
    if (nextPath) {
        var fullPath = rootDir + nextPath;
        totalFiles++;
        scheduleCheck(fullPath, function(currentPath, hasReading) {
            if (hasReading) {
                withReading++;
                console.log("%s has a reading.", currentPath);
            }

            series(files.shift());
        });
    } else {
        console.log("%d files searched.", totalFiles);
        console.log("%d had a reading.", withReading);
    }
}

series(files.shift());
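
The listing above calls unzipFile without showing it. A minimal sketch of what such a helper might look like follows. This is an assumption, not the code from the original program: it reuses the "position" search from the question and treats any decompression error as "no reading".

```javascript
var zlib = require("zlib");

// Hypothetical helper: decompress `data` and report via callback(hasReading)
// whether the decompressed text contains the "position" tag.
function unzipFile(data, callback) {
    zlib.gunzip(data, function (err, buff) {
        if (err) {
            // Assumption for this sketch: corrupt or non-gzip input counts as "no reading".
            callback(false);
            return;
        }
        callback(buff.toString().indexOf("position") !== -1);
    });
}
```

Note that this sketch expects `data` to be a Buffer. If fs.readFile fails, `data` is undefined, so checkForOdometer would need to check `err` before calling it.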

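The reason for the series control flow is that if I set up the obvious parallel search, I end up exhausting the process memory, probably because 60,000+ buffers' worth of data are held in memory at once: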

while (files.length > 0) {
    var currentPath = rootDir + files.shift();
    checkForOdometer(currentPath, function(callbackPath, hasReading) {
        //...
    });
}

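I could probably set it up to schedule batches of, say, 50 files in parallel, waiting for each batch to complete before scheduling the next 50. Setting up the series control flow seemed just as easy.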
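
The batching idea could be sketched roughly like this. runInBatches and its parameters are hypothetical names, and the worker is assumed to call its `done` callback exactly once per item. It starts one batch, waits for every item in that batch to finish, and only then starts the next one.

```javascript
// Runs worker(item, done) over items in strict batches of batchSize.
// onFinished(results) is called once, with results in the same order as items.
function runInBatches(items, batchSize, worker, onFinished) {
    var results = [];

    function runBatch(start) {
        if (start >= items.length) {
            onFinished(results);
            return;
        }
        var batch = items.slice(start, start + batchSize);
        var remaining = batch.length;

        batch.forEach(function (item, offset) {
            worker(item, function (result) {
                results[start + offset] = result;
                remaining--;
                if (remaining === 0) {
                    // Whole batch is done, so at most batchSize buffers were alive at once.
                    runBatch(start + batchSize);
                }
            });
        });
    }

    runBatch(0);
}
```

With the functions above, the worker could be a thin wrapper such as function (path, done) { checkForOdometer(path, function (currentPath, hasReading) { done(hasReading); }); }, called with a batch size of 50. A sliding window that starts a new file as soon as any one finishes would keep the pipeline fuller, but strict batches are simpler to reason about.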
