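I have a large file (~8 GB) containing a JSON array, and I need to split it into a set of smaller files, each holding part of the array.
The array contains only objects.
I settled on an algorithm and tried to implement it myself, but ended up with something like this: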
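var fs = require('fs');
var readable = fs.createReadStream('walmart.dump', { encoding: 'utf8' });
var chunk, buffer = '', counter = 0;
readable.on('readable', function () {
  // Read one character at a time (slow, but keeps the logic simple).
  while (null !== (chunk = readable.read(1))) {
    // Between objects, skip '[', ',', ']' and whitespace until the next '{'.
    if (buffer === '' && chunk !== '{') continue;
    buffer += chunk; // chunk is one symbol
    // An object can only end on '}', so only try to parse at that point.
    if (chunk !== '}') continue;
    try {
      var res = JSON.parse(buffer);
      counter++;
      console.log(res);
      buffer = ''; // start collecting the next object
    } catch (e) {
      // Not a complete object yet (a nested '}' or a '}' inside a string); keep reading.
    }
  }
});
readable.on('end', function () {
  console.log(counter + ' objects parsed');
});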
Has anyone solved a similar problem?
Answer 0 (score: 1)
The clarinet module (https://github.com/dscape/clarinet) looks quite promising to me. It's based on sax-js, so it should be quite robust and well tested.
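If adding a dependency is not an option, the splitting can also be sketched by hand. The code below is not clarinet and not a full JSON parser: it only tracks nesting depth and string state to find where each top-level object ends, and it assumes the input is a valid JSON array of objects, as described in the question. The names createObjectScanner and splitJsonArray, the output file prefix, and the objects-per-file parameter are made up for illustration.

```javascript
var fs = require('fs');

// Returns a function that accepts text chunks and calls onObject(text)
// once for every complete top-level object found in the stream.
function createObjectScanner(onObject) {
  var buffer = '';      // text of the object currently being collected
  var depth = 0;        // nesting depth of {} and [] inside that object
  var inString = false; // true while inside a JSON string literal
  var escaped = false;  // true right after a backslash inside a string

  return function feed(text) {
    for (var i = 0; i < text.length; i++) {
      var ch = text[i];
      if (depth === 0) {
        // Between objects: skip '[', ',', ']' and whitespace.
        if (ch !== '{') continue;
        buffer = '{';
        depth = 1;
        continue;
      }
      buffer += ch;
      if (inString) {
        if (escaped) escaped = false;
        else if (ch === '\\') escaped = true;
        else if (ch === '"') inString = false;
      } else if (ch === '"') {
        inString = true;
      } else if (ch === '{' || ch === '[') {
        depth++;
      } else if (ch === '}' || ch === ']') {
        depth--;
        if (depth === 0) {
          // The object that started at depth 1 is now closed.
          onObject(buffer);
          buffer = '';
        }
      }
    }
  };
}

// Splits the array in inputPath into files holding at most perFile objects each.
// Calls done(err) on a read error, or done(null, numberOfFilesWritten) at the end.
function splitJsonArray(inputPath, outputPrefix, perFile, done) {
  var batch = [];
  var fileIndex = 0;

  function flush() {
    if (batch.length === 0) return;
    // Synchronous write keeps the sketch short; each output file is a valid JSON array.
    fs.writeFileSync(outputPrefix + fileIndex + '.json', '[' + batch.join(',\n') + ']');
    fileIndex++;
    batch = [];
  }

  var feed = createObjectScanner(function (objectText) {
    batch.push(objectText);
    if (batch.length >= perFile) flush();
  });

  fs.createReadStream(inputPath, { encoding: 'utf8' })
    .on('data', feed)
    .on('error', done)
    .on('end', function () {
      flush();
      done(null, fileIndex);
    });
}
```

A call such as splitJsonArray('walmart.dump', 'part-', 10000, function (err, n) { ... }) would then write part-0.json, part-1.json and so on. Because the scanner never calls JSON.parse, a '}' inside a string or a nested object does not trigger repeated failed parses of a growing buffer, which is the main cost of the parse-and-retry approach in the question.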