NodeJS - Seeing data events from a readable stream without drain events from the writable stream

Date: 2015-05-13 18:15:59

Tags: node.js amazon-s3 stream

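We are seeing extremely high memory usage from some of our streams in production. The files are stored in S3; we open a readable stream on the S3 object and pipe the data to a file on the local filesystem (on our EC2 instances). Some of our customers have very large files. In one instance a file was over 6GB, and the Node process handling it used so much memory that we nearly exhausted all swap space and the machine slowed to a crawl. Clearly there is a memory leak somewhere, and that is what I am trying to track down.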

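In the meantime, I augmented our code a bit to log when we see certain events from the streams. Below is the code and some sample log output from a run with a small test file. What puzzles me is that the readable stream receives a pause event and then continues to emit data and pause events WITHOUT the writable stream emitting a drain event. Am I completely missing something here? Once the readable stream is paused, how can it keep emitting data events before a drain is received? The writable stream has not yet indicated that it is ready, so the readable stream should not be sending anything... right?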

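Yet look at the output. The first 3 events make sense to me: data, pause, drain. The next 3 are fine too: data, data, pause. But then it emits another data and another pause event before finally getting a drain as the 9th event. I don't understand why events 7 and 8 occur, since the drain doesn't happen until the 9th event. Then after the 9th event there is again a bunch of data/pause pairs without any corresponding drain. Why? What I would expect is some number of data events, then a pause, and then NOTHING until a drain event occurs, at which point data events could happen again. It seems to me that once a pause occurs, no data events should occur until a drain event has happened. Maybe I still fundamentally misunderstand something about Node streams?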

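UPDATE: The docs do not mention anything about a pause event being emitted by readable streams, but they do mention that a pause function is available. Presumably this gets called when the writable stream returns false, and I would assume that the pause function emits the pause event. In any case, if pause() is called, the docs seem to agree with my view of the world. See https://nodejs.org/docs/v0.10.30/api/stream.html#stream_class_stream_readable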

  

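"This method will cause a stream in flowing-mode to stop emitting data events. Any data that becomes available will remain in the internal buffer."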
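The expectation described above (stop sending when write() returns false, continue only after drain) corresponds to a hand-written loop like the one below. This is only a minimal sketch using the stream API of current Node releases, not the pipe() implementation of v0.10; the helper names `writeWithBackpressure` and `runDemo`, the 4-byte `highWaterMark`, and the in-memory sink are all assumptions made for illustration.

```javascript
const { Writable } = require('stream');

// Hypothetical helper: write chunks one by one, stopping whenever write()
// reports a full buffer and continuing only after the sink emits 'drain'.
function writeWithBackpressure(chunks, dest, done) {
  let index = 0;
  let pauses = 0; // how many times we had to wait for 'drain'

  function writeMore() {
    while (index < chunks.length) {
      const ok = dest.write(chunks[index++]);
      if (!ok) {
        pauses++;
        dest.once('drain', writeMore);
        return; // nothing more is written until 'drain' fires
      }
    }
    dest.once('finish', () => done(pauses));
    dest.end();
  }

  writeMore();
}

// Demo with an in-memory, deliberately slow sink (a stand-in for a file stream).
function runDemo(done) {
  const received = [];
  const slowSink = new Writable({
    highWaterMark: 4, // tiny buffer so that backpressure shows up immediately
    write(chunk, encoding, callback) {
      received.push(chunk);
      setImmediate(callback); // simulate a slow asynchronous write
    }
  });
  const chunks = [Buffer.from('aaaa'), Buffer.from('bbbb'), Buffer.from('cccc')];
  writeWithBackpressure(chunks, slowSink, (pauses) => {
    done(pauses, Buffer.concat(received).toString());
  });
}

runDemo((pauses, text) => {
  console.log('waited for drain %d time(s), wrote %s', pauses, text);
});
```

In this loop no chunk is handed to the sink between a false return and the following drain, which is the ordering the question expects to see in the log.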

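This test was run on my development machine (Ubuntu 14.04 with Node v0.10.37). Our EC2 instances in production are nearly identical; I believe they are currently running v0.10.30.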

S3Service.prototype.getFile = function(bucket, key, fileName) {
  var deferred = Q.defer(),
    self = this,
    s3 = self.newS3(),
    fstream = fs.createWriteStream(fileName),
    shortname = _.last(fileName.split('/'));

  logger.debug('Get file from S3 at [%s] and write to [%s]', key, fileName);

  // create a readable stream that will retrieve the file from S3
  var request = s3.getObject({
    Bucket: bucket,
    Key: key
  }).createReadStream();

  // if network request errors out then we need to reject
  request.on('error', function(err) {
      logger.error(err, 'Error encountered on S3 network request');
      deferred.reject(err);
    })
    .on('data', function() {
      logger.info('data event from readable stream for [%s]', shortname);
    })
    .on('pause', function() {
      logger.info('pause event from readable stream for [%s]', shortname);
    });

  // resolve when our writable stream closes, or reject if we get some error
  fstream.on('close', function() {
      logger.info('close event from writable stream for [%s] -- done writing file', shortname);
      deferred.resolve();
    })
    .on('error', function(err) {
      logger.error(err, 'Error encountered writing stream to [%s]', fileName);
      deferred.reject(err);
    })
    .on('drain', function() {
      logger.info('drain event from writable stream for [%s]', shortname);
    });

  // pipe the S3 request stream into a writable file stream
  request.pipe(fstream);

  return deferred.promise;
};

[2015-05-13T17:21:00.427Z] INFO: worker/7525 on bdmlinux: data event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.427Z] INFO: worker/7525 on bdmlinux: pause event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.427Z] INFO: worker/7525 on bdmlinux: drain event from writable stream for [FeedItem.csv]
[2015-05-13T17:21:00.507Z] INFO: worker/7525 on bdmlinux: data event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.514Z] INFO: worker/7525 on bdmlinux: data event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.515Z] INFO: worker/7525 on bdmlinux: pause event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.515Z] INFO: worker/7525 on bdmlinux: data event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.515Z] INFO: worker/7525 on bdmlinux: pause event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.515Z] INFO: worker/7525 on bdmlinux: drain event from writable stream for [FeedItem.csv]
[2015-05-13T17:21:00.595Z] INFO: worker/7525 on bdmlinux: data event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.596Z] INFO: worker/7525 on bdmlinux: pause event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.596Z] INFO: worker/7525 on bdmlinux: data event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.596Z] INFO: worker/7525 on bdmlinux: pause event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.597Z] INFO: worker/7525 on bdmlinux: data event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.597Z] INFO: worker/7525 on bdmlinux: pause event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.597Z] INFO: worker/7525 on bdmlinux: data event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.597Z] INFO: worker/7525 on bdmlinux: pause event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.598Z] INFO: worker/7525 on bdmlinux: drain event from writable stream for [FeedItem.csv]
[2015-05-13T17:21:00.601Z] INFO: worker/7525 on bdmlinux: data event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.602Z] INFO: worker/7525 on bdmlinux: pause event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.602Z] INFO: worker/7525 on bdmlinux: data event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.602Z] INFO: worker/7525 on bdmlinux: pause event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.603Z] INFO: worker/7525 on bdmlinux: drain event from writable stream for [FeedItem.csv]
[2015-05-13T17:21:00.627Z] INFO: worker/7525 on bdmlinux: data event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.627Z] INFO: worker/7525 on bdmlinux: pause event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.627Z] INFO: worker/7525 on bdmlinux: data event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.628Z] INFO: worker/7525 on bdmlinux: pause event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.628Z] INFO: worker/7525 on bdmlinux: drain event from writable stream for [FeedItem.csv]
[2015-05-13T17:21:00.688Z] INFO: worker/7525 on bdmlinux: data event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.689Z] INFO: worker/7525 on bdmlinux: pause event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.689Z] INFO: worker/7525 on bdmlinux: data event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.689Z] INFO: worker/7525 on bdmlinux: pause event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.690Z] INFO: worker/7525 on bdmlinux: data event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.690Z] INFO: worker/7525 on bdmlinux: pause event from readable stream for [FeedItem.csv]
[2015-05-13T17:21:00.691Z] INFO: worker/7525 on bdmlinux: close event from writable stream for [FeedItem.csv] -- done writing file

1 Answer:

Answer 0 (score: 1)

You may have a quantum-like "observing the phenomenon changes the outcome" situation here. Node introduced a new way of streaming in v0.10. From the docs:

  

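"If you attach a data event listener, then it will switch the stream into flowing mode, and data will be passed to your handler as soon as it is available."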
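The switch that the quoted passage describes can be observed directly in current Node releases. The sketch below is an illustration under assumptions: it relies on the `readableFlowing` property, which does not exist in v0.10, and the function name `flowingStateAroundDataListener` is made up for this example.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: report the flowing state of a fresh readable stream
// before and after a 'data' listener is attached to it.
function flowingStateAroundDataListener() {
  const source = new Readable({ read() {} }); // produces no data by itself

  const before = source.readableFlowing; // no consumer attached yet
  source.on('data', () => {});           // merely observing the stream...
  const after = source.readableFlowing;  // ...changes the mode it runs in

  source.destroy();
  return { before, after };
}

console.log(flowingStateAroundDataListener());
```

The listener does nothing with the data, yet attaching it is enough to change how the stream delivers chunks, so a logging-only handler is not a neutral observer.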

That is, attaching a data listener reverts the stream to the classic stream mode. This may be why the behavior you see is inconsistent with what you read in the rest of the docs. To observe things undisturbed, you could try removing the on('data') listener and inserting your own stream in between, using through.
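A minimal sketch of that idea follows. It is not the answer's original code: instead of the third-party `through` module it uses the core `stream.Transform` class as a stand-in, written against the stream API of current Node releases rather than v0.10. The helper names `createTap` and `demo`, and the in-memory source and sink that replace the S3 request and the file stream, are assumptions for illustration.

```javascript
const { Readable, Writable, Transform } = require('stream');

// Hypothetical helper: a pass-through stage that logs every chunk it forwards,
// so no 'data' listener has to be attached to the source stream itself.
function createTap(label, log) {
  return new Transform({
    transform(chunk, encoding, callback) {
      log(label + ': ' + chunk.length + ' bytes');
      callback(null, chunk); // forward the chunk unchanged
    }
  });
}

// Demo with in-memory stand-ins for the S3 read stream and the file write stream.
function demo(done) {
  const lines = [];
  const received = [];
  const source = Readable.from([Buffer.from('abc'), Buffer.from('defgh')]);
  const sink = new Writable({
    write(chunk, encoding, callback) {
      received.push(chunk);
      callback();
    }
  });

  sink.on('finish', () => done(lines, Buffer.concat(received).toString()));
  source.pipe(createTap('tap', (message) => lines.push(message))).pipe(sink);
}

demo((lines, text) => console.log(lines, text));
```

In the code from the question, the equivalent change would be to drop the on('data') handler and write something like request.pipe(createTap(shortname, logger.info.bind(logger))).pipe(fstream), so that the logging stage takes part in backpressure like any other stream in the chain.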