Question

I have (very large) comma-delimited files compressed in bz2 format. If I decompress them and read them with

    fileID = fopen('file.dat');
    X = textscan(fileID,'%d %d64 %s %f %d %f %f %d', 'delimiter', ',');
    fclose(fileID);

everything works fine. But I would like to read them without decompressing them first, something like

    fileID = fopen('file.bz2');
    X = textscan(fileID,'%d %d64 %s %f %d %f %f %d', 'delimiter', ',');
    fclose(fileID);

which, unfortunately, returns an empty X. Any suggestions? Do I inevitably have to decompress them first through a system('...') command?

Answer 1

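You can try the form of textscan that takes a string instead of a stream. Using Matlab's Java integration, you can chain Java streams to decompress the file on the fly and read it one line at a time, and each line can then be parsed: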

    % Build a stream chain that reads, decompresses and decodes the file into lines
    fileStr = javaObject('java.io.FileInputStream', 'file.dat.gz');
    inflatedStr = javaObject('java.util.zip.GZIPInputStream', fileStr);
    charStr = javaObject('java.io.InputStreamReader', inflatedStr);
    lines = javaObject('java.io.BufferedReader', charStr);

    % If you know the size in advance you can preallocate the arrays instead
    % of just stating the types to allow vcat to succeed
    X = { int32([]), int64([]), {}, [], int32([]), [], [], int32([]) };
    curL = lines.readLine();
    while ischar(curL) % on EOF, readLine returns null, which becomes [] (type double)
        % Parse a single line from the file
        curX = textscan(curL,'%d %d64 %s %f %d %f %f %d', 'delimiter', ',');
        % Append new line results
        for iCol=1:length(X)
            X{iCol}(end+1) = curX{iCol};
        end
        curL = lines.readLine();
    end
    lines.close(); % Don't forget this or the file will remain open!
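
If growing every column by one element per line turns out to be slow, one possible variation is to gather a batch of lines and hand them to textscan in a single call. The following is only a sketch: `parseLineBatch` is a hypothetical helper name (not a built-in), and it assumes `strjoin` is available in your Matlab / Octave version.

```matlab
function X = parseLineBatch(lineCell, fmt)
    % lineCell: cell array of char vectors, one per line read from the stream
    % fmt:      textscan format string, e.g. '%d %f %s'
    % Join the lines with newline characters so textscan sees them as rows
    txt = strjoin(lineCell(:)', char(10));
    % One textscan call parses the whole batch; each output is a column
    X = textscan(txt, fmt, 'Delimiter', ',');
end
```

You would call it on, say, every few thousand lines collected from `lines.readLine()` and concatenate each returned column onto the accumulated result with `vertcat`.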

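I can't fully vouch for the performance of this approach, since every array is grown by appending, but at least it lets you read GZ files on the fly in Matlab / Octave. Also: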

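If you have a Java stream class that decompresses another format (for example from Apache Commons Compress), you can read that format in the same way. This lets you read bzip2 or xz files.
There are also classes for accessing archives, such as zip files in the base Java distribution, or tar / RAR / 7z and others in Apache Commons Compress. These classes usually have methods for locating a file stored inside the archive, so you can open an input stream on it and read it the same way as above.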
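
For the bz2 files in the question, the chain might look roughly like this. It is a sketch under assumptions: the Apache Commons Compress jar has been downloaded, the jar path below is a placeholder, and the decompressing stream is that library's `BZip2CompressorInputStream`. Depending on the library version, additional jars may need to be on the Java path as well.

```matlab
% Placeholder path (an assumption): point this at your Commons Compress jar
javaaddpath('/path/to/commons-compress.jar');

% Same chain as for GZ, with the bzip2 stream swapped in
fileStr = javaObject('java.io.FileInputStream', 'file.bz2');
bufStr  = javaObject('java.io.BufferedInputStream', fileStr);
bzStr   = javaObject('org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream', bufStr);
charStr = javaObject('java.io.InputStreamReader', bzStr);
lines   = javaObject('java.io.BufferedReader', charStr);

nLines = 0;
curL = lines.readLine();
while ischar(curL)
    nLines = nLines + 1; % stand-in for the per-line textscan parsing above
    curL = lines.readLine();
end
lines.close();
```

The loop body only counts lines here; the per-line textscan call and the column appends from the GZ example would go in its place.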

Answer 2

On a unix system, I would use a named pipe and do something like this:

    for m = 1:size(phi,1) - (constant)/2
      phi(m) = phi(m).*(mean(conj(phi(1+m:(constant)/2+m))));
    end
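
As a hedged illustration of the named-pipe idea itself: the sketch below assumes a unix shell with `mkfifo` and `bzcat` on the PATH, and `readBz2ViaPipe` is a hypothetical helper name of my own, not something from the answer. I have not checked that textscan copes with a non-seekable pipe in every Matlab / Octave version; if it does not, reading with `fgetl` and parsing each line as in Answer 1 would be the fallback.

```matlab
function X = readBz2ViaPipe(bz2File, fmt)
    % Unix only: relies on the external tools mkfifo and bzcat
    pipeName = [tempname() '.fifo'];
    status = system(['mkfifo "' pipeName '"']);
    if status ~= 0
        error('readBz2ViaPipe:mkfifo', 'Could not create named pipe');
    end
    % Start the decompressor in the background; it waits until a reader
    % opens the pipe, then streams the decompressed text into it
    system(['bzcat "' bz2File '" > "' pipeName '" &']);
    fileID = fopen(pipeName, 'r');
    X = textscan(fileID, fmt, 'Delimiter', ',');
    fclose(fileID);
    delete(pipeName); % remove the pipe from the file system
end
```

With the file from the question, the call would be `X = readBz2ViaPipe('file.bz2', '%d %d64 %s %f %d %f %f %d');`. The decompressed data only passes through the pipe, so no decompressed copy is written to disk.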

Reading compressed files in Matlab without decompressing them
