Question

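I need to read a log file containing thousands of lines and write each line to a Mongo database. I am reading the file with a Node stream and using the 'split' npm package to split it into 'lines'. Because of network latency, the MongoDB writes will take much longer than reading the log file.

My core code is as follows: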

var fs = require('fs');
var split = require('split');

var readableStream = fs.createReadStream(filename);

readableStream
    .pipe(split()) // This splits the data into 'lines'
    .on('data', function (chunk) {
        chunkCount++;
        slowAsyncFunctionToWriteLogEntryToDatabase(chunk); // This will take ages
    })
    .on('end', function () {
        // resolve the promise which bounds this process
        defer.resolve({v:3,chunkCount: chunkCount})
    });

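Do I need to worry that the MongoDB system will be hammered by the number of queued writes? Presumably the Node pipe back-pressure mechanism won't know that a large number of database writes are being queued? Is there any way to 'slow' the readable stream so that it waits for each MongoDB insert to finish before it reads the next line from the log file? Am I worrying unnecessarily?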

Answer 1

Since using pause() and resume() seems to have some problems, I will write up another option: using a Transform stream.

var Transform = require('stream').Transform;

var myTransform = new Transform({
    transform(chunk, encoding, cb) {
        chunkCount++;

        // Calling cb() only after the write completes is what holds back the next line
        syncFunctionToWriteLogEntryWithCallback(chunk, function () {
            cb();
        });
    },

    flush(cb) {
        // Runs once after the last chunk; there is no chunk left to write here
        cb();
    }
});

readableStream
    .pipe(split())
    .pipe(myTransform);

A Transform stream also gives you a hook, flush(), that runs once the stream has finished processing every chunk.
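To check that the stream really waits for each write, here is a minimal self-contained sketch of the same idea. It is not the code from the question, and it makes a few assumptions: it uses a Writable rather than a Transform because nothing is passed downstream, `slowWrite` is a hypothetical stand-in for the database call, and `Readable.from()` replaces the file and split() so the snippet runs on its own (Node 12.3 or later).

```javascript
const { Readable, Writable, pipeline } = require('stream');

// Hypothetical stand-in for the slow database write; it only simulates latency.
function slowWrite(line, callback) {
  setTimeout(callback, 5);
}

// Sends `lines` through a Writable that handles one entry at a time.
// Calls done(err, stats) once every write has finished.
function writeAllLines(lines, done) {
  let inFlight = 0;
  let maxInFlight = 0;
  let written = 0;

  const sink = new Writable({
    objectMode: true,
    write(line, encoding, callback) {
      inFlight++;
      maxInFlight = Math.max(maxInFlight, inFlight);
      slowWrite(line, function (err) {
        inFlight--;
        written++;
        callback(err); // only now does the stream hand over the next line
      });
    }
  });

  // pipeline() forwards errors and calls back once the sink has finished.
  pipeline(Readable.from(lines), sink, function (err) {
    done(err, { written, maxInFlight });
  });
}
```

Calling `writeAllLines(['a', 'b', 'c'], console.log)` should report three writes with never more than one in flight, because the next line is only delivered after `callback()` has been called.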

Answer 2

You can use the pause method on the readable stream to stop it while a chunk is being written to MongoDB.

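var lineStream = readableStream.pipe(split()); // This splits the data into 'lines'

lineStream
    .on('data', function (chunk) {
        // Pause the stream that emits the lines; pausing only the file stream would
        // not stop the remaining lines of the chunk that split() is already processing
        lineStream.pause();

        chunkCount++;

        syncFunctionToWriteLogEntryWithCallback(chunk, function () {
            lineStream.resume();
        });
    })
    .on('end', function () {
        // resolve the promise which bounds this process
        defer.resolve({v:3,chunkCount: chunkCount})
    });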

I don't think MongoDB will have serious problems in this case.

Node.js streams writing to MongoDB - concerned about performance
