So I have an ISO-8859-1 encoded text file stored in an Azure blob, and I need to process it line by line with a stream because the file can be large.

I made sure the blob was uploaded with the correct contentEncoding and have tried various settings such as ISO-8859-1, latin1, binary, ...

I can download the file to disk like this, and locally the encoding of the downloaded file comes out right:
const stream = require('stream');
const fs = require('fs');
const storage = require('azure-storage');
const blobService = storage.createBlobService();
const containerName = 'myContainer';
const file = 'in.txt';

// Stream the blob straight into a local file.
let readable = blobService.createReadStream(containerName, file, {encoding: 'latin1'});
let outstream = fs.createWriteStream('./out.txt');
readable.pipe(outstream);
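For reference, reading the downloaded copy back from disk with the right encoding works as expected, roughly like this (just a sketch of what I do locally):

const fs = require('fs');
const readline = require('readline');

// Reading the local copy as latin1 decodes the special characters correctly.
let rlLocal = readline.createInterface({
  input: fs.createReadStream('./out.txt', {encoding: 'latin1'})
});
rlLocal.on('line', function(line) {
  console.log('line: ' + line);
});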
What I would really like to do, instead of downloading the file to disk, is to work on the stream with readline and parse the file as the data comes in:
const stream = require('stream');
const readline = require('readline');
const storage = require('azure-storage');
const blobService = storage.createBlobService();
const containerName = 'myContainer';
const file = 'in.txt';

let readable = blobService.createReadStream(containerName, file, {encoding: 'latin1'});
readable.setEncoding('latin1');
let rl = readline.createInterface(readable);

rl.on('line', function(line) {
  console.log('line: ' + line);
});
rl.on('close', function() {
  console.log('Stream closed.');
});
The above code does not work; it throws an error instead:

TypeError: readable.setEncoding is not a function

When I leave out setEncoding, the stream is processed by readline, but the encoding is incorrect.
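One workaround I have been considering (just a sketch, not yet verified against the blob stream) is to pipe the blob stream through a stream.PassThrough, which is an ordinary Readable and does expose setEncoding, and then hand that to readline:

const stream = require('stream');
const readline = require('readline');

// Pipe the raw blob bytes into a PassThrough and decode them there as latin1,
// since PassThrough does have setEncoding(); untested sketch.
let raw = blobService.createReadStream(containerName, file);
let decoded = new stream.PassThrough();
decoded.setEncoding('latin1');
raw.pipe(decoded);

let rl2 = readline.createInterface({input: decoded});
rl2.on('line', function(line) {
  console.log('line: ' + line);
});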
I have also tried azure's getBlobToText (even though that is not what I want); it fails to validate the contentMD5 and returns an error for files containing Latin characters:
Error: Hash mismatch (integrity check failed), Expected value is jd2+sYDibe5GCw5JwpMhpg==, retrieved q/IlYBBNY6XluFzKKPq7hw==.
at BlobService._validateLengthAndMD5
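I assume the MD5 check itself could be skipped with the disableContentMD5Validation request option (sketch below, not verified), but getBlobToText buffers the whole blob in memory anyway, which is exactly what I am trying to avoid:

// disableContentMD5Validation is my assumption from the azure-storage options;
// this still loads the entire blob into memory, so it does not help for large files.
blobService.getBlobToText(containerName, file, {disableContentMD5Validation: true},
  function(err, text) {
    if (err) throw err;
    console.log(text.length);
  });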
How can I get out of this mess without downloading the file to disk?