Question

I'm building a file upload component that lets you pause and resume file uploads.

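The standard way to do this seems to be to split the file into chunks on the client machine and send the chunks, along with bookkeeping information, to the server. The server stores the chunks in a staging directory and merges them once it has received all of them. So that's what I'm doing.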
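As a rough illustration of the client-side bookkeeping described above, here is a minimal sketch. The helper name `chunkRanges` and the field names are assumptions made for this example (they are not from the original code); chunk numbers are 1-based to match the server code further down.

```javascript
// Hypothetical helper: compute the byte range and bookkeeping data for each chunk.
function chunkRanges(fileSize, chunkSize) {
  var total = Math.ceil(fileSize / chunkSize);
  var ranges = [];
  for (var n = 1; n <= total; n++) {
    var start = (n - 1) * chunkSize;
    ranges.push({
      chunk_number: n,                            // 1-based chunk index
      total_chunks: total,                        // lets the server know when it has everything
      start: start,                               // inclusive byte offset
      end: Math.min(start + chunkSize, fileSize)  // exclusive, as expected by Blob.slice
    });
  }
  return ranges;
}
```

In a browser, each entry could then be turned into an upload body with `file.slice(range.start, range.end)`, and a paused upload would resume from the first chunk the server has not yet acknowledged.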

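I'm using node / express and I'm able to receive the files just fine, but I'm running into a problem because my merge_chunks function is being called multiple times.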

Here's my call stack:

router.post('/api/videos', 
    upload.single('file'), 
    validate_params, 
    rename_uploaded_chunk,
    check_completion_status,
    merge_chunks,
    record_upload_date,
    videos.update,
    send_completion_notice
);

The check_completion_status function is implemented as follows:

/* Recursively check to see if we have every chunk of a file */
var check_completion_status = function (req, res, next) {
  var current_chunk = 1;
  var see_if_chunks_exist = function () {
    fs.exists(get_chunk_file_name(current_chunk, req.file_id), function (exists) {
      if (current_chunk > req.total_chunks) { 
        next(); 
      } else if (exists) {
        current_chunk ++;
        see_if_chunks_exist();
      } else { 
        res.sendStatus(202);
      } 
    });
  };
  see_if_chunks_exist();
};

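The file names in the staging directory have the chunk number embedded in them, so the idea is to see whether there is a file for every chunk number. The function should call next() only once for a given (complete) file.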

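However, my merge_chunks function is being called multiple times (usually between 1 and 4). Logging does show that it is only being called after I've received all of the chunks.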

With this in mind, my assumption is that the asynchronous nature of the fs.exists function is what's causing the problem.

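Even though the nth invocation of check_completion_status may start before I have all the chunks, by the time it gets to its nth call of fs.exists(), x more chunks may have arrived and been processed concurrently, so the function can keep going and in some cases reach next(). However, the chunks that arrived in the meantime also correspond to their own invocations of check_completion_status, and those invocations are going to call next() as well, because at that point we obviously have all the files.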
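The interleaving described above can be reproduced without files or HTTP. The sketch below is a simplified, hypothetical model: `simulate`, the in-memory `stored` set, and the manual callback `queue` are stand-ins invented for illustration, and it assumes the worst case in which every chunk lands before any existence callback fires.

```javascript
// Hypothetical model of the race: no real files, requests, or Express involved.
function simulate(totalChunks) {
  var stored = new Set();  // chunk numbers present in the "staging directory"
  var queue = [];          // pending async callbacks, run later in FIFO order
  var nextCalls = 0;       // how many requests reached the next() branch

  // Stand-in for fs.exists: the result is delivered later, from the queue.
  function exists(chunk, cb) {
    queue.push(function () { cb(stored.has(chunk)); });
  }

  // Same control flow as check_completion_status in the question.
  function checkCompletion() {
    var current = 1;
    function step() {
      exists(current, function (found) {
        if (current > totalChunks) {
          nextCalls += 1;      // corresponds to next()
        } else if (found) {
          current += 1;
          step();
        }
        // otherwise: corresponds to res.sendStatus(202)
      });
    }
    step();
  }

  // Assumed worst case: all chunks are stored before any callback runs.
  for (var n = 1; n <= totalChunks; n++) {
    stored.add(n);
    checkCompletion();
  }
  while (queue.length > 0) { queue.shift()(); }
  return nextCalls;
}
```

Under these assumptions `simulate(3)` returns 3: every request walks the full chunk list and reaches the next() branch, which is consistent with merge_chunks being invoked more than once.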

This is causing problems because I didn't account for this when I wrote merge_chunks.

For completeness, here is the merge_chunks function:

var merge_chunks = (function () {

  var pipe_chunks = function (args) {
    args.chunk_number = args.chunk_number || 1;
    if (args.chunk_number > args.total_chunks) { 
      args.write_stream.end();
      args.next(); 
    } else {
      var file_name = get_chunk_file_name(args.chunk_number, args.file_id);
      var read_stream = fs.createReadStream(file_name);
      read_stream.pipe(args.write_stream, {end: false});
      read_stream.on('end', function () {
        // Once we're done with the chunk we can delete it and move on to the next one.
        // Recent Node versions require a callback for fs.unlink.
        fs.unlink(file_name, function (err) {
          if (err) { console.error(err); }
        });
        args.chunk_number += 1;
        pipe_chunks(args);
      }); 
    }  
  };

  return function (req, res, next) {
    var out = path.resolve('videos', req.video_id);
    var write_stream = fs.createWriteStream(out);
    pipe_chunks({
      write_stream: write_stream,
      file_id: req.file_id,
      total_chunks: req.total_chunks,
      next: next
    });
  };

}());

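Currently, I'm getting errors because the second invocation of the function tries to read chunks that the first invocation has already deleted.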

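What is the typical pattern for handling this kind of situation? I'd like to avoid a stateful architecture if possible. Is it possible to cancel pending handlers right before calling next() in check_completion_status?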

Answer 1

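If you just want to get it working as quickly as possible, I would use a lock (much like a database lock) on the resource so that only one request processes the chunks. Simply create a unique ID on the client and send it along with the chunks. Then store that unique ID in some sort of data structure and look it up before processing. The example below is far from optimal (in fact, this map will keep growing, which is bad), but it should demonstrate the concept: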

// Create a map (an array would work too) and keep track of the video ids that were processed. This map will persist through each request.
var processedVideos = {};

var check_completion_status = function (req, res, next) {
  var current_chunk = 1;
  var see_if_chunks_exist = function () {
    fs.exists(get_chunk_file_name(current_chunk, req.file_id), function (exists) {
      if (processedVideos[req.query.uniqueVideoId]){
        res.sendStatus(202);
      } else if (current_chunk > req.total_chunks) { 
        processedVideos[req.query.uniqueVideoId] = true;
        next(); 
      } else if (exists) {
        current_chunk ++;
        see_if_chunks_exist();
      } else { 
        res.sendStatus(202);
      } 
    });
  };
  see_if_chunks_exist();
};
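One possible way to address the ever-growing map mentioned above is to record when each ID was claimed and periodically drop old entries. This is only a sketch under stated assumptions: the names `claim_merge`, `purge_old_claims`, and `MAX_AGE_MS`, as well as the one-hour retention period, are hypothetical and not part of the original answer.

```javascript
// Hypothetical replacement for the plain object: unique id -> time the merge was claimed.
var processedVideos = new Map();
var MAX_AGE_MS = 60 * 60 * 1000; // assumed retention period: one hour

// Returns true for the first caller only. The lookup and the write happen
// synchronously, so no other request can run between them.
function claim_merge(id, now) {
  if (processedVideos.has(id)) { return false; }
  processedVideos.set(id, now);
  return true;
}

// Drop claims older than MAX_AGE_MS so the map does not grow without bound.
function purge_old_claims(now) {
  processedVideos.forEach(function (claimed_at, id) {
    if (now - claimed_at > MAX_AGE_MS) { processedVideos.delete(id); }
  });
}
```

In check_completion_status, the branch for `current_chunk > req.total_chunks` could call `claim_merge(req.query.uniqueVideoId, Date.now())` and respond with 202 when it returns false, while something like `setInterval(function () { purge_old_claims(Date.now()); }, MAX_AGE_MS)` would keep the map small. The retention period should be comfortably longer than any request for the same upload can stay in flight; otherwise a late request could claim the merge again.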

Stop a function from being called multiple times
