Speeding up the processing of large binary files

Date: 2016-11-29 10:46:52

Tags: matlab performance file-io bit-manipulation binaryfiles

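I have to process thousands of binary files (16MB each) at the bit level. To do that, I read each file and build a data structure that stores one bit per element (typically a 1x134217728 array).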

This is how I currently do it:

conv = @(c) uint8(bitget(c,1:32));
measurement = NaN(1,(sizeOfMeasurements*8));   %(1,134217728)
fid = fopen(fileName, 'rb');
byteContent = fread(fid,'uint32');
fclose(fid);
bitRepresentation1 = arrayfun(conv, byteContent, 'UniformOutput', false);
measurement=[bitRepresentation1{:}];

So I replaced fopen with memmapfile, as follows:

m = memmapfile(fileName,'Format',{'uint32', [4194304 1], 'byteContent'});
byteContent = m.Data.byteContent;
byteContent = double(byteContent);
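
The hard-coded size [4194304 1] only fits a file of exactly 16 MiB (4194304 elements of 4 bytes each). A minimal sketch, assuming the file length is a multiple of 4 bytes, that derives the element count from the file size instead. The temporary file and the names info and numWords are illustrative and not part of the original post:

```matlab
% Create a small stand-in file (hypothetical; use a real measurement file instead).
fileName = [tempname '.bin'];
fid = fopen(fileName, 'w');
fwrite(fid, uint32([1 2 3 4]), 'uint32');
fclose(fid);

% Derive the number of uint32 elements from the file size in bytes.
info = dir(fileName);
numWords = info.bytes / 4;          % assumes the size is a multiple of 4

m = memmapfile(fileName, 'Format', {'uint32', [numWords 1], 'byteContent'});
byteContent = m.Data.byteContent;   % numWords-by-1 uint32 column (copied out of the map)

clear m                             % release the mapping before deleting the file
delete(fileName);
```

With this, the same mapping code should work for files of other sizes, such as the 14MB file used in the benchmark further down.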

I timed the individual statements (using tic / toc), and the bottleneck turned out to be:

bitRepresentation1 = arrayfun(conv, byteContent, 'UniformOutput', false);  % see first line of code for conv

Is there a more efficient way to convert byteContent into an array that stores one bit per index (i.e. the bit representation of byteContent)?

2 answers:

Answer 0 (score: 5)

bitget already handles the loop over all the numbers, so loop over the bits instead:

fid = fopen(fileName, 'rb');
bitContent = fread(fid,'*ubit64');
fclose(fid);

conv = @(ii) uint8(bitget(bitContent, ii));
bitRepresentation = arrayfun(conv, 1:64, 'UniformOutput', false);

measurement = [bitRepresentation{:}]';
measurement = measurement(:).';
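
To see why the two reshaping lines keep the bits of each word together, here is a minimal sketch of the same steps on two hand-picked sample words instead of a file. The sample values are illustrative, not from the original post:

```matlab
% Two sample 64-bit words: 5 = binary 101, 2 = binary 010 (bit 1 is the LSB).
bitContent = uint64([5; 2]);

% One bitget call per bit position; each call processes every word at once.
conv = @(ii) uint8(bitget(bitContent, ii));
bitRepresentation = arrayfun(conv, 1:64, 'UniformOutput', false);

% Concatenate to numWords-by-64, transpose to 64-by-numWords, then flatten
% column by column so that the 64 bits of each word stay contiguous.
measurement = [bitRepresentation{:}]';
measurement = measurement(:).';

disp(measurement(1:3));     % bits 1..3 of the first word:  1 0 1
disp(measurement(65:67));   % bits 1..3 of the second word: 0 1 0
```

arrayfun now runs 64 times in total rather than once per number, which is where the speedup over the code in the question comes from.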

Edit: you could also try a plain loop:

fid = fopen(fileName, 'rb');
bitContent = fread(fid,'*ubit64');
fclose(fid);

sz = 64 * size(bitContent,1);    
measurement3 = zeros(1, sz, 'uint8');
weave = 1:64:sz;
for ii = 1:64
    measurement3(weave + ii - 1) = uint8(bitget(bitContent, ii));
end

On my system, though, this is (surprisingly) slower than arrayfun. Then again, my MATLAB version is from the stone age, so your mileage may vary. Give it a try.

Answer 1 (score: 2)

It seems a few things can improve on Rody's suggestion further:

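  1. (minor:) Use a local function instead of a function handle for conv.
  2. (major:) Convert the result of conv to logical (via ~~) instead of uint8.
  3. (major:) Use cell2mat instead of [bitRepresentation{:}]'.
  4. Result: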

    function q40863898(filename)
    
      fid = fopen(filename, 'rb');
      bitContent = fread(fid,'*ubit64');
      fclose(fid);
    
      bitRepresentation = arrayfun(@convert, 1:64, 'UniformOutput', false);    
      measurement = reshape(cell2mat(bitRepresentation).',[],1).';
    
      function out = convert(ii)
        out = ~~(bitget(bitContent, ii, 'uint64'));
      end
    
    end
    

    Benchmark results (MATLAB R2016b, Win10 x64, 14MB file):

    Rody's vectorized method: 0.87783
    Rody's loop method: 2.37
    Dev-iL's method: 0.68387
    

    Benchmark code:

    function q40863898(filename)
      %% Common code:
      fid = fopen(filename, 'rb');
      bitContent = fread(fid,'*ubit64');
      fclose(fid);
      %% Verification:
      ref = Rody1();
      res = {Rody2(), uint8(Devil1())};  
      assert(isequal(ref,res{1}));
      assert(isequal(ref,res{2}));
      %% Benchmark: 
      disp(['Rody''s vectorized method: ' num2str(timeit(@Rody1))]);
      disp(['Rody''s loop method: ' num2str(timeit(@Rody2))]);
      disp(['Dev-iL''s method: ' num2str(timeit(@Devil1))]);
      %% Functions:
      function measurement = Rody1()
        conv = @(ii) uint8(bitget(bitContent, ii));
        bitRepresentation = arrayfun(conv, 1:64, 'UniformOutput', false);
        measurement = [bitRepresentation{:}]';
        measurement = measurement(:).';    
      end
    
      function measurement = Rody2()
        sz = 64 * size(bitContent,1);    
        measurement = zeros(1, sz, 'uint8');
        weave = 1:64:sz;
        for ii = 1:64
            measurement(weave + ii - 1) = uint8(bitget(bitContent, ii));
        end    
      end
    
      function measurement = Devil1()
        bitRepresentation = arrayfun(@convert, 1:64, 'UniformOutput', false);
        measurement = reshape(cell2mat(bitRepresentation).',[],1).';
    
        function out = convert(ii)
          out = ~~(bitget(bitContent, ii, 'uint64'));
        end
      end
    
    end
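
The verification step of the benchmark can also be reproduced without a file. The following is a minimal sketch, assuming a few hand-picked sample words. It uses anonymous functions rather than the nested functions above so that it runs as a plain script, and the names c1 and c2 are illustrative:

```matlab
bitContent = uint64([5; 2; intmax('uint64')]);   % illustrative sample words

% Rody's version: uint8 conversion and [c{:}]' concatenation.
c1 = arrayfun(@(ii) uint8(bitget(bitContent, ii)), 1:64, 'UniformOutput', false);
ref = [c1{:}]';
ref = ref(:).';

% Dev-iL's version: logical conversion via ~~ and cell2mat.
c2 = arrayfun(@(ii) ~~bitget(bitContent, ii), 1:64, 'UniformOutput', false);
res = reshape(cell2mat(c2).', [], 1).';

disp(class(res));                  % logical
disp(isequal(ref, uint8(res)));    % same bit values in the same order
```

Note that the second version returns a logical array, so code that needs uint8 downstream still has to convert it, as the benchmark does with uint8(Devil1()).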