Question

I have a file in a binary format, laid out as follows:

[4 - header bytes] [8 bytes - int64 - how many bytes to read following] [variable num of bytes (size of the int64) - read the actual information]

This pattern then repeats, so I have to read the first 12 bytes to find out how many more bytes I need to read.

I tried:

var readStream = fs.createReadStream('/path/to/file.bin');
readStream.on('data', function(chunk) {  ...  })

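The problem I'm having is that the chunks always arrive 65536 bytes at a time, whereas I need to be more specific about how many bytes I am reading.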

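I have also tried readStream.on('readable', function() { readStream.read(4) }), but it is not very flexible either: it seems to turn asynchronous code into synchronous code, because I have to put the read calls in a while loop.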

Or maybe readStream is not appropriate in this case, and I should use fs.read(fd, buffer, offset, length, position, callback) instead?

Answer 1

Here is what I'd recommend as an abstract readStream handler for processing abstract data like you describe:

var pending = Buffer.alloc(9999999); // new Buffer(size) is deprecated
var cursor = 0;
stream.on('data', function(d) {
  d.copy(pending, cursor);
  cursor += d.length;

  var test = attemptToParse(pending.slice(0, cursor));
  while (test !== false) {
    // test is a valid blob of data
    processTheThing(test);

    var rawSize = test.raw.length; // How many bytes of data did the blob actually take up?
    pending.copy(pending, 0, rawSize, cursor); // Copy the data after the valid blob to the beginning of the pending buffer
    cursor -= rawSize;
    test = attemptToParse(pending.slice(0, cursor)); // Is there more than one valid blob of data in this chunk? Keep processing if so
  }
});

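For your use case, make sure the pending buffer is initialized large enough to hold the largest possible valid blob of data you will be parsing (you mentioned an int64 length field, so size it for the largest length you actually expect), plus an extra 65536 bytes in case a blob boundary happens to fall right on the edge of a stream chunk.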

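My approach requires an attemptToParse() method that takes a buffer and tries to parse data out of it. If the buffer is too short (the data hasn't all come in yet), it should return false. If the buffer holds a valid object, it should return some parsed object that has a way of showing the raw bytes it took up (the .raw property in my example). You then do whatever processing you need on the blob (processTheThing()), trim off the valid blob's data, shift the pending Buffer down to just the remainder, and keep going. That way you don't end up with an ever-growing pending buffer or an array of "finished" blobs. Maybe the process on the receiving end of processTheThing() keeps an array of blobs in memory, maybe it writes them to a database, but in this example that is abstracted away, so this code deals only with how to handle the stream data.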
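For the format in the question, attemptToParse() might look roughly like the sketch below. This is only an illustration under some assumptions: the question does not say which byte order the int64 uses, so little-endian is assumed here (swap in readBigUInt64BE otherwise), the header, payload and PREFIX_SIZE names are made up for the example, and readBigUInt64LE needs a Node version with BigInt support. The returned buffers are copies, so they stay valid after the pending buffer is shifted.

```javascript
// Hypothetical attemptToParse() for [4 header bytes][int64 length][payload].
// Assumption: the length field is little-endian.
var HEADER_SIZE = 4;
var LENGTH_SIZE = 8;
var PREFIX_SIZE = HEADER_SIZE + LENGTH_SIZE; // the 12 bytes that must be read first

function attemptToParse(buf) {
  // Not even the 12-byte prefix has arrived yet.
  if (buf.length < PREFIX_SIZE) return false;

  var bigLength = buf.readBigUInt64LE(HEADER_SIZE);
  if (bigLength > BigInt(Number.MAX_SAFE_INTEGER)) {
    throw new RangeError('record length too large to buffer: ' + bigLength);
  }
  var total = PREFIX_SIZE + Number(bigLength);

  // The payload is still incomplete; wait for more 'data' events.
  if (buf.length < total) return false;

  // Copy the record out so later writes to the shared pending buffer
  // cannot change it.
  var raw = Buffer.from(buf.subarray(0, total));
  return {
    header: raw.subarray(0, HEADER_SIZE),
    payload: raw.subarray(PREFIX_SIZE),
    raw: raw // raw.length tells the caller how many bytes to drop
  };
}
```

Returning false for both "prefix incomplete" and "payload incomplete" keeps the calling loop simple: it only ever has to wait for the next chunk and try again.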

Answer 2

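Append the chunks to a Buffer, then parse the data from there. Take care not to read past the end of the buffer (if your data is large). I'm on my tablet right now, so I can't add any sample source code. Maybe someone else can?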

OK, here is a mini source, very skeletal.

var chunks = [];
var bytesRead = 0;

stream.on('data', function(chunk) {

   chunks.push(chunk);
   bytesRead += chunk.length;

   // look at bytesRead...
   var buffer = Buffer.concat(chunks);
   chunks = [buffer];  // trick for next event
      // --> or, if memory is an issue, remove completed data from the beginning of chunks
   // work with the buffer here...

});
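To make the "work with the buffer here" step a bit more concrete, here is one possible sketch of peeling complete records off the front of the accumulated buffer. It is not the answerer's code: createRecordSplitter and onRecord are hypothetical names, the length field is assumed to be little-endian, and lengths are assumed to be small enough to convert safely to a regular Number.

```javascript
// Sketch: accumulate chunks, then peel complete records off the front.
// Assumptions: little-endian int64 length; lengths fit in a safe integer.
var PREFIX = 12; // 4 header bytes + 8 length bytes

function createRecordSplitter(onRecord) {
  var buffered = Buffer.alloc(0);

  return function onData(chunk) {
    buffered = Buffer.concat([buffered, chunk]);

    var offset = 0;
    while (buffered.length - offset >= PREFIX) {
      var length = Number(buffered.readBigUInt64LE(offset + 4));
      var end = offset + PREFIX + length;
      if (end > buffered.length) break; // payload not complete yet

      onRecord(
        buffered.subarray(offset, offset + 4),     // header bytes
        buffered.subarray(offset + PREFIX, end)    // payload bytes
      );
      offset = end;
    }

    // Keep only the unparsed tail for the next 'data' event.
    buffered = buffered.subarray(offset);
  };
}
```

It would be attached with something like stream.on('data', createRecordSplitter(function (header, payload) { ... })). Because Buffer.concat allocates a fresh buffer on every chunk, the views handed to onRecord are not overwritten later, at the cost of some extra copying.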

NodeJS: How do I write a file parser using readStream?
