Question

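I'm working in Node on Windows. My code receives 30 Buffer objects per second (each about 500-900 KB), and I need to save that data to the file system as quickly as possible, without doing any work that blocks receipt of the following Buffer (i.e. the goal is to save the data from every buffer, for about 30-45 minutes). For what it's worth, the data is sequential depth frames from a Kinect sensor.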

My question is: what is the most performant way to write files in Node?

Here is the pseudocode:


const fs = require('fs')

let num = 0

async function writeFile(filename, data) {
  fs.writeFileSync(filename, data)
}

// This fires 30 times/sec and runs for 30-45 min
dataSender.on('gotData', function(data){
  let filename = 'file-' + num++
  // Do anything with data here to optimize write?
  writeFile(filename, data)
})

fs.writeFileSync seems to be much faster than the asynchronous fs.writeFile, which is why I'm using it above. But is there any other way to manipulate the data or write the file that could speed up each save?

Answer 1

First off, you never want to use fs.writeFileSync() while handling real-time requests, because it blocks the entire node.js event loop until the file write is done.
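A minimal sketch of the non-blocking alternative, assuming Node 10 or later (for fs.promises). The helper name saveFrame, the temporary directory, and the 600 KB stand-in frame are illustrative assumptions, not part of the question's code.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: save one frame without blocking the event loop.
// fs.promises.writeFile performs the write on libuv's thread pool, so the
// main thread stays free to handle the next 'gotData' event meanwhile.
function saveFrame(filename, data) {
  return fs.promises.writeFile(filename, data);
}

// Usage sketch: write a fake ~600 KB frame into a temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'frames-'));
const frame = Buffer.alloc(600 * 1024, 7);

saveFrame(path.join(dir, 'file-0'), frame)
  .then(() => console.log('frame saved'))
  .catch(err => console.error('write failed:', err));
```

In the question's handler this would replace the writeFileSync call; errors then surface through the returned promise instead of as a synchronous exception.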

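OK, since each block of data is written to a different file, you want to allow multiple disk writes to be in progress at once, but not an unlimited number of them. So a queue is still appropriate, but this time the queue doesn't have just one write in progress at a time; it keeps several writes in progress at the same time: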

const fs = require('fs');
const EventEmitter = require('events');

class Queue extends EventEmitter {
    constructor(basePath, baseIndex, concurrent = 5) {
        super();                      // required before using `this` in a subclass
        this.q = [];
        this.paused = false;
        this.inFlightCntr = 0;
        this.fileCntr = baseIndex;
        this.maxConcurrent = concurrent;
        this.basePath = basePath;
    }

    // add item to the queue and start writing (if below the concurrency limit)
    add(data) {
        this.q.push(data);
        this.write();
    }

    // start writes from the queue until maxConcurrent writes are in flight
    write() {
        while (!this.paused && this.q.length && this.inFlightCntr < this.maxConcurrent) {
            this.inFlightCntr++;
            let buf = this.q.shift();
            try {
                fs.writeFile(this.basePath + this.fileCntr++, buf, err => {
                    this.inFlightCntr--;
                    if (err) {
                        this.err(err);
                    } else {
                        // a slot is free again, so write more data
                        this.write();
                    }
                });
            } catch(e) {
                this.inFlightCntr--;      // this write never started
                this.err(e);
            }
        }
    }

    err(e) {
        this.pause();
        this.emit('error', e);
    }

    pause() {
        this.paused = true;
    }

    resume() {
        this.paused = false;
        this.write();
    }
}

let q = new Queue("file-", 0, 5);

// This fires 30 times/sec and runs for 30-45 min
dataSender.on('gotData', function(data){
    q.add(data);
});

q.on('error', function(e) {
    // got some sort of write error here
    console.log(e);
});

Things to consider:

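Experiment with the concurrent value you pass to the Queue constructor. Start with a value of 5, then see whether raising it gives you better or worse performance. The node.js file I/O subsystem uses a thread pool to implement asynchronous disk writes, so there is a maximum number of writes that can actually run concurrently; cranking the concurrent value up very high probably won't make things faster.
You can experiment with increasing the size of the disk I/O thread pool by setting the UV_THREADPOOL_SIZE environment variable before you start your node.js app.
Your biggest friend here is disk write speed. So make sure you have a fast disk with a good disk controller. A fast SSD on a fast bus would be best.
If you can spread the writes across multiple actual physical disks, that will likely also increase write throughput (more disk heads at work).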
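To try different concurrent values as suggested above, a rough timing harness like the following could help. It is a sketch under assumptions: it uses fs.promises with a simple promise-based concurrency limit rather than the Queue class, fs.rmSync needs Node 14.14 or later, and the frame count and size are made up to resemble about one second of 600 KB frames. Results depend heavily on the disk and the OS cache, so it should be run on the actual target machine.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write frames[i] to dir/file-i, keeping at most `concurrency`
// fs.promises.writeFile calls in flight at any moment.
async function writeAll(dir, frames, concurrency) {
  let next = 0;
  async function worker() {
    while (next < frames.length) {
      const i = next++; // no race: this runs synchronously between awaits
      await fs.promises.writeFile(path.join(dir, 'file-' + i), frames[i]);
    }
  }
  const workers = [];
  for (let w = 0; w < Math.min(concurrency, frames.length); w++) {
    workers.push(worker());
  }
  await Promise.all(workers);
}

// Time writeAll() once per concurrency value, in a fresh temp directory each time.
async function benchmark(concurrencies, frameCount = 30, frameSize = 600 * 1024) {
  const frames = Array.from({ length: frameCount }, (_, i) => Buffer.alloc(frameSize, i));
  const results = {};
  for (const c of concurrencies) {
    const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'bench-'));
    const start = process.hrtime.bigint();
    await writeAll(dir, frames, c);
    results[c] = Number(process.hrtime.bigint() - start) / 1e6; // milliseconds
    fs.rmSync(dir, { recursive: true, force: true });
  }
  return results;
}

benchmark([1, 5, 10, 20])
  .then(ms => console.log('milliseconds per run, by concurrency:', ms))
  .catch(err => console.error('benchmark failed:', err));
```

Combining this with different UV_THREADPOOL_SIZE settings shows whether the thread pool or the disk is the limiting factor.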

This is a prior answer based on my initial interpretation of the question (before an edit changed it).

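Because it appears that you need to do your disk writes in order (all to the same file), I'd suggest that you either use a write stream and let the stream object serialize and buffer the data for you, or create a queue yourself like this: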

const fs = require('fs');
const EventEmitter = require('events');

class Queue extends EventEmitter {
    // takes an already opened file descriptor (e.g. from fs.open() or fs.openSync())
    constructor(fileHandle) {
        super();                      // required before using `this` in a subclass
        this.f = fileHandle;
        this.q = [];
        this.nowWriting = false;
        this.paused = false;
    }

    // add item to the queue and write (if not already writing)
    add(data) {
        this.q.push(data);
        this.write();
    }

    // write next block from the queue (if not already writing)
    write() {
        if (!this.nowWriting && !this.paused && this.q.length) {
            this.nowWriting = true;
            let buf = this.q.shift();
            fs.write(this.f, buf, (err, bytesWritten) => {
                this.nowWriting = false;
                if (err) {
                    this.pause();
                    this.emit('error', err);
                } else {
                    // fs.write() may write only part of the buffer; requeue the rest first
                    if (bytesWritten < buf.length) {
                        this.q.unshift(buf.subarray(bytesWritten));
                    }
                    // write next block
                    this.write();
                }
            });
        }
    }

    pause() {
        this.paused = true;
    }

    resume() {
        this.paused = false;
        this.write();
    }
}

// pass an already opened file descriptor
let q = new Queue(fileHandle);

// This fires 30 times/sec and runs for 30-45 min
dataSender.on('gotData', function(data){
    q.add(data);
});

q.on('error', function(err) {
    // got disk write error here
});

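You could use a writeStream instead of this custom Queue class, but the problem is that the writeStream may fill up (its write() method starts returning false), and then you'd have to keep a separate buffer for the incoming data anyway. Using your own custom queue takes care of both issues at once.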
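For comparison, the write-stream route could be sketched as follows, assuming all frames go to one file. createRecorder and its methods are hypothetical names. The sketch shows what "filling up" means in practice: write() returns false once the stream's internal buffer passes its highWaterMark, but the data is still accepted and held in memory, so the caller only gets a signal that the disk is falling behind.

```javascript
const fs = require('fs');

// Hypothetical recorder that appends every frame, in order, to one file
// through an fs.WriteStream. The stream queues pending data in memory itself.
function createRecorder(filename) {
  const out = fs.createWriteStream(filename);
  let backedUp = false;
  out.on('drain', () => { backedUp = false; });
  out.on('error', err => console.error('write failed:', err));
  return {
    add(frame) {
      // write() returns false once more than highWaterMark bytes are waiting.
      // The frame is still accepted and queued, but memory use keeps growing
      // if frames arrive faster than the disk can absorb them.
      if (!out.write(frame)) backedUp = true;
    },
    isBackedUp: () => backedUp,
    bufferedBytes: () => out.writableLength,
    // Flush the remaining data and close the file.
    close() {
      return new Promise((resolve, reject) => {
        out.once('close', resolve);
        out.once('error', reject);
        out.end();
      });
    },
  };
}
```

A caller could poll bufferedBytes() to decide when to warn or drop frames; the custom Queue class makes the same decision explicit with its own array.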

Other Scalability/Performance Comments

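Because you appear to be writing the data serially to the same file, your disk writes won't benefit from clustering or from running multiple operations in parallel, since they basically have to be serialized.
If your node.js server has other things to do besides these writes, there might be a slight advantage (it would have to be verified with testing) to creating a second node.js process and doing all the disk writing in that other process. Your main node.js process would receive the data and then pass it to the child process, which would maintain the queue and do the writing.
Another thing you could experiment with is coalescing writes. When you have more than one item in the queue, you can combine them into a single write. If the writes are already sizable, this probably won't make much difference, but if the writes are small it could make a big difference (combining lots of small disk writes into one larger write is usually more efficient).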
Your biggest friend here is disk write speed. So make sure you have a fast disk with a good disk controller. A fast SSD would be best.
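The write-coalescing idea could look like this for the single-file Queue: instead of this.q.shift(), write() would take a merged buffer from a helper like the hypothetical coalesce() below. The size cap is an arbitrary assumption to tune. Node 12.9 and later also provide fs.writev(), which writes an array of buffers in one call without first copying them into a new Buffer.

```javascript
// Hypothetical helper: remove buffers from the front of `queue` and merge
// them into one Buffer of at most `maxBytes`. A single item larger than
// maxBytes is still taken on its own. Returns null if the queue is empty.
function coalesce(queue, maxBytes) {
  const parts = [];
  let total = 0;
  while (queue.length && (parts.length === 0 || total + queue[0].length <= maxBytes)) {
    const buf = queue.shift();
    parts.push(buf);
    total += buf.length;
  }
  if (parts.length === 0) return null;
  // Avoid an extra copy when nothing was merged.
  return parts.length === 1 ? parts[0] : Buffer.concat(parts, total);
}

// Inside Queue.write() one might then use, for example:
//   let buf = coalesce(this.q, 4 * 1024 * 1024);  // 4 MB cap is an assumption
```

With 500-900 KB frames, each coalesced write would hold only a handful of frames, which is why the answer expects little gain for writes this large.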

Answer 2

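I wrote a service that does this extensively, and the best thing you can do is pipe the input data directly to the file (if you have an input stream as well). Here is a simple example that downloads a file this way: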

const fs = require('fs')
const http = require('http')

const ostream = fs.createWriteStream('./output')
http.get('http://nodejs.org/dist/index.json', (res) => {
    res.pipe(ostream)
})
.on('error', (e) => {
    console.error(`Got error: ${e.message}`);
})

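So in this example there is no intermediate copy of the whole file. As the file is read in chunks from the remote http server, each chunk is written to the file on disk. This is much more efficient than downloading the whole file from the server, keeping it in memory, and then writing it to a file on disk.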
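A variant of the same piping idea with error handling on both ends, assuming Node 12.3 or later (for Readable.from; stream.pipeline itself dates from Node 10). saveStream is a hypothetical name, and the in-memory source stands in for the HTTP response res from the example above.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { pipeline, Readable } = require('stream');

// Pipe any readable source straight into a file. Unlike a bare .pipe(),
// stream.pipeline() destroys both streams and reports the error if either
// side fails, so a broken download does not leave the file stream open.
function saveStream(source, filename, callback) {
  pipeline(source, fs.createWriteStream(filename), callback);
}

// Usage sketch with an in-memory source; with http.get you would pass `res`.
const target = path.join(fs.mkdtempSync(path.join(os.tmpdir(), 'pipe-')), 'output');
saveStream(Readable.from([Buffer.from('chunk-1 '), Buffer.from('chunk-2')]), target, err => {
  if (err) console.error('save failed:', err);
  else console.log('saved:', fs.readFileSync(target, 'utf8'));
});
```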

Streams are the basis of many operations in Node.js, so you should study them as well.

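Another thing you should investigate, depending on your scenario, is UV_THREADPOOL_SIZE: I/O operations use the libuv thread pool, which has 4 threads by default, and you may saturate it if you do a lot of writing.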
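libuv reads UV_THREADPOOL_SIZE once, when the thread pool is first used, so a dependable way to change it is to set it in the environment before the Node process starts. A minimal launcher sketch, where launchWithThreadPool and the script name recorder.js are hypothetical placeholders and 16 is an arbitrary size to experiment with:

```javascript
const { spawn } = require('child_process');

// Hypothetical launcher: start a Node script with a larger libuv thread pool
// by placing UV_THREADPOOL_SIZE in the child's environment before it starts.
function launchWithThreadPool(scriptArgs, poolSize) {
  return spawn(process.execPath, scriptArgs, {
    env: { ...process.env, UV_THREADPOOL_SIZE: String(poolSize) },
    stdio: 'inherit',
  });
}

// Usage: 'recorder.js' stands for the app that receives the Kinect frames.
// launchWithThreadPool(['recorder.js'], 16);
```

Setting the variable in the shell before running node achieves the same thing without a launcher.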

Nodejs: How to optimize writing many files?
