Question

I have a large JSON file that looks like this:

[
 {"name": "item1"},
 {"name": "item2"},
 {"name": "item3"}
]

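I want to stream this file (pretty easy so far), run an asynchronous function (one that returns a promise) for each line, and edit the original file when that promise resolves or rejects.

The resulting input file could look like this: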

[
 {"name": "item1", "response": 200},
 {"name": "item2", "response": 404},
 {"name": "item3"} // not processed yet
]

I do not want to create another file; I want to edit the same file on the fly (if possible!).

Thanks :)

Answer 1

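According to this answer, writing to the same file while reading from it is not reliable. As a commenter there says, it is better to write to a temporary file, then delete the original and rename the temporary file over it.

To create a stream of lines you can use byline. Then, for each line, apply some operation and pipe the result to the output file.

Something like this: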

var fs = require('fs');
var stream = require('stream');
var util = require('util');
var LineStream = require('byline').LineStream;

function Modify(options) {
    stream.Transform.call(this, options);
}
util.inherits(Modify, stream.Transform);

Modify.prototype._transform = function(chunk, encoding, done) {
    var self = this;
    setTimeout(function() {
        // your modifications here, note that the exact regex depends on
        // your json format and is probably the most brittle part of this
        var modifiedChunk = chunk.toString();
        // only add a response to lines that do not have one yet
        if (!/"response"\s*:/.test(modifiedChunk)) {
            modifiedChunk = modifiedChunk
                .replace('}', ', "response": ' + new Date().getTime() + '}');
        }
        // LineStream strips the line break, so add it back for every line
        self.push(modifiedChunk + '\n');
        done();
    }, Math.random() * 2000 + 1000); // to simulate an async modification
};

var inPath = './data.json';
var outPath = './out.txt';
fs.createReadStream(inPath)
    .pipe(new LineStream())
    .pipe(new Modify())
    .pipe(fs.createWriteStream(outPath))
    .on('close', function() {
        // replace input with output
        fs.unlink(inPath, function(err) {
            if (err) throw err;
            // fs.rename requires a callback
            fs.rename(outPath, inPath, function(err) {
                if (err) throw err;
            });
        });
    });

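Note that the above results in only one async operation being active at a time. You could also save the modifications to an array and, once all of them are done, write the lines from the array to the file, like this: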

var fs = require('fs');
var stream = require('stream');
var LineStream = require('byline').LineStream;

var modifiedLines = [];
var modifiedCount = 0;
var inPath = './data.json';
var allModified = new Promise(function(resolve, reject) {
    var streamEnded = false;

    // resolve only once the whole file has been read and every line is done
    function resolveIfDone() {
        if (streamEnded && modifiedCount === modifiedLines.length) {
            resolve();
        }
    }

    fs.createReadStream(inPath).pipe(new LineStream()).on('data', function(chunk) {
       modifiedLines.length++;
       var index = modifiedLines.length - 1;
       setTimeout(function() {
           // your modifications here
           var modifiedChunk = chunk.toString();
           if (!/"response"\s*:/.test(modifiedChunk)) {
               modifiedChunk = modifiedChunk
                   .replace('}', ', "response": ' + new Date().getTime() + '}');
           }
           modifiedLines[index] = modifiedChunk;
           modifiedCount++;
           resolveIfDone();
       }, Math.random() * 2000 + 1000);
    }).on('end', function() {
        streamEnded = true;
        resolveIfDone();
    }).on('error', reject);

}).then(function() {
    // return the promise so that a failed write reaches the catch below
    return fs.promises.writeFile(inPath, modifiedLines.join('\n') + '\n');
}).catch(function(reason) {
    console.error(reason);
});
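
The first snippet handles one line at a time, while the second starts the work for every line at once. A possible middle ground is to cap how many promises are pending at the same time. The following is only a sketch of that idea: `mapWithLimit` is a hypothetical helper (not part of byline or Node), `worker` stands for whatever promise-returning function you run per line, and `limit` is assumed to be at least 1.

```javascript
// Run `worker` on every item with at most `limit` promises pending at once.
// Results keep the order of `items`; a rejection is recorded instead of thrown,
// so one failed line does not stop the others.
function mapWithLimit(items, limit, worker) {
    return new Promise(function (resolve) {
        const results = new Array(items.length);
        let next = 0;   // index of the next item to start
        let active = 0; // promises currently pending

        function launch() {
            if (next >= items.length && active === 0) {
                resolve(results);
                return;
            }
            while (active < limit && next < items.length) {
                const index = next++;
                active++;
                Promise.resolve()
                    .then(function () { return worker(items[index], index); })
                    .then(
                        function (value) { results[index] = { ok: true, value: value }; },
                        function (error) { results[index] = { ok: false, error: error }; }
                    )
                    .then(function () {
                        active--;
                        launch(); // start the next item, or resolve if all are done
                    });
            }
        }

        launch();
    });
}
```

With the lines collected into an array, `mapWithLimit(lines, 5, processLine)` would resolve to one `{ ok, value }` or `{ ok, error }` entry per line, in file order, which can then be written out as in the snippets above.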

If you wish to stream chunks of valid JSON instead of lines, which would be a more robust approach, take a look at JSONStream.

Answer 2

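I am not really answering the question, but I do not think it can be answered in a satisfactory way anyway, so here are my 2 cents.

I assume that you know how to stream line by line and run the function, and that the only problem you have is editing the file that you are reading from.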

Consequences of inserting

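It is not possible to natively insert data into any file (which is what you want to do by changing the JSON live). A file can only grow at its end.

So inserting 10 bytes of data at the beginning of a 1GB file means that you need to write 1GB to disk (to move all the data 10 bytes further).

Your filesystem does not understand JSON; it just sees that you are inserting bytes in the middle of a big file, so this is going to be very slow.

So, yes, it is possible to do. Write a wrapper over the file API in NodeJS with an insert() method.

Then write some more code to work out where to insert bytes into the JSON file without loading the whole file and without producing invalid JSON at the end.

Now, I would not recommend it :)

=> Read this question: Is it possible to prepend data to a file without rewriting?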
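
To make the cost concrete, here is a minimal sketch of what such an insert() wrapper might look like. `insertIntoFile` is a hypothetical name, not an existing Node API; it uses only the synchronous `fs` calls and, for simplicity, holds the whole tail of the file in memory, which a real implementation for a 1GB file would have to avoid by copying in chunks from the end.

```javascript
const fs = require('fs');

// Hypothetical helper: insert `text` at byte offset `position` of `filePath`.
// Every byte after `position` has to be read and written again further along,
// which is why inserting near the start of a large file is slow.
function insertIntoFile(filePath, position, text) {
    const inserted = Buffer.from(text, 'utf8');
    const fd = fs.openSync(filePath, 'r+');
    try {
        const size = fs.fstatSync(fd).size;
        const tail = Buffer.alloc(size - position);
        if (tail.length > 0) {
            // simplification: assumes the tail is read in a single call
            fs.readSync(fd, tail, 0, tail.length, position);
        }
        fs.writeSync(fd, inserted, 0, inserted.length, position);
        if (tail.length > 0) {
            fs.writeSync(fd, tail, 0, tail.length, position + inserted.length);
        }
        return tail.length; // number of existing bytes that had to be rewritten
    } finally {
        fs.closeSync(fd);
    }
}
```

The return value is the number of existing bytes that were moved. It grows with the distance from the insertion point to the end of the file, not with the size of the inserted text.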

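Why then?

I assume that you want to either:

Be able to kill your process at any time and easily resume work by reading the file again.
Retry partially processed files to fill in only the missing bits.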

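First solution: use a database

Abstracting the work needed to live-edit files at random places is the sole purpose of databases.

They all exist only to abstract the magic behind UPDATE mytable SET name = 'a_longer_name_that_the_name_that_was_there_before' where name = 'short_name'.

Have a look at LevelUP/Down, sqlite, etc.

They will abstract all the magic that would otherwise need to be done in your JSON file!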

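Second solution: use multiple files

When you stream your file, write two new files!

One that contains the current position in the input file and the lines that need to be retried.
The other one with the expected result.

You will also be able to kill your process at any time and restart.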
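
The two-file idea could be sketched roughly as follows. The file layout is an assumption made for illustration only: a progress file holding JSON such as {"nextLine": 2, "retry": [1]} and a result file with one finished item per line. `loadProgress` and `recordResult` are hypothetical helper names.

```javascript
const fs = require('fs');

// Read the saved position, or start from the top when no progress file exists.
function loadProgress(progressPath) {
    try {
        return JSON.parse(fs.readFileSync(progressPath, 'utf8'));
    } catch (err) {
        if (err.code === 'ENOENT') {
            return { nextLine: 0, retry: [] };
        }
        throw err; // unreadable or corrupt progress file
    }
}

// Record the outcome of one input line. Finished items are appended to the
// result file; failed lines are only remembered so they can be retried later.
function recordResult(resultPath, progressPath, progress, lineNumber, item, failed) {
    if (failed) {
        progress.retry.push(lineNumber);
    } else {
        fs.appendFileSync(resultPath, JSON.stringify(item) + '\n');
    }
    progress.nextLine = Math.max(progress.nextLine, lineNumber + 1);
    fs.writeFileSync(progressPath, JSON.stringify(progress));
}
```

On restart, the process would call `loadProgress`, skip input lines below `nextLine`, and revisit the line numbers listed in `retry`. Note that a crash between the append and the progress write could leave one result recorded twice, so a real version would need to tolerate or deduplicate that.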

Answer 3

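As mentioned in the comments, the file you have is not proper JSON, although it is valid Javascript. To generate proper JSON, JSON.stringify() can be used. I think nonstandard JSON would also make life difficult for anyone else who has to parse it, so I would recommend producing a new output file instead of keeping the original one.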

However, it is still possible to parse the original file as JSON. This can be done with eval('(' + procline + ')'), although it is not secure to bring external data into node.js this way:

const fs = require('fs');
const readline = require('readline');
const fr = fs.createReadStream('file1');
const rl = readline.createInterface({
    input: fr
});


rl.on('line', function (line) {
    if (/\{\s*"?name/.test(line)) { // matches both {name and {"name
        var procline = "";
        if (line.trim().split('').pop() === ','){
            procline = line.trim().substring(0,line.trim().length-1);
        }
        else{
            procline = line.trim();
        }
        var lineObj = eval('(' + procline + ')');
        lineObj.response = 200;
        console.log(JSON.stringify(lineObj));
    }
});

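The output is as follows:

{"name":"item1","response":200}
{"name":"item2","response":200}
{"name":"item3","response":200}

This is line-delimited JSON (LDJSON), which can be useful for streaming because it needs no leading or trailing [, ], or ,. There is also an ldjson-stream package for it.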
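
If each line of the input already holds strictly valid JSON (quoted keys, as in the sample file in the question), eval can be avoided altogether. The following sketch shows one way to do that with JSON.parse and to read and write line-delimited JSON. The helper names are hypothetical, and it assumes one object per line.

```javascript
// Turn one line of the "[ {...}, {...} ]" layout into an object.
// Returns null for blank lines and for the "[" and "]" lines.
function parseArrayLine(line) {
    const trimmed = line.trim().replace(/,$/, ''); // drop the trailing comma
    if (trimmed === '' || trimmed === '[' || trimmed === ']') {
        return null;
    }
    return JSON.parse(trimmed); // throws on anything that is not valid JSON
}

// Serialize items as line-delimited JSON: one object per line.
function toLdjson(items) {
    return items.map(function (item) { return JSON.stringify(item); }).join('\n') + '\n';
}

// Parse line-delimited JSON back into an array, skipping blank lines.
function parseLdjson(text) {
    return text
        .split('\n')
        .filter(function (line) { return line.trim() !== ''; })
        .map(function (line) { return JSON.parse(line); });
}
```

Unlike eval, JSON.parse never executes the input, so a malformed or hostile line results in a thrown SyntaxError rather than running code.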

NodeJS stream parse and write json line by line upon Promise result
