Large data array manipulation in MATLAB

Date: 2013-03-16 00:24:49

Tags: arrays matlab large-data

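I have a large data set in a <1x43 cell> array. The data is very large; to give an idea of the cell sizes, 5 of the cells are <1x327680 double> and 11 are <1x1376256 double>.

I am trying to perform a resample operation, for which I have a function (code shown below). I want to take an entire cell from the array, run the Resample operation on it, and store the result back in the same array location or in a different one.

However, I get the following error at line 19 of the Resample function:

    Error using zeros
    Maximum variable size allowed by the program is exceeded.
    Error in Resample (line 19)
        obj = zeros(t,1);

When I comment out line 19, I get an Out of Memory error instead.

Is there a more efficient way to manipulate such a large data set?

Thank you.

Actual code: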

%% To load each ".dat" file for the 51 attributes to an array.

a = dir('*.dat');

for i = 1:length(a)
eval(['load ' a(i).name ' -ascii']);
end

attributes = length(a);

% Scan folder for number of ".dat" files
datfiles = dir('*.dat'); 

% Count Number of ".dat" files
numfiles = length(datfiles); 

% Read files in to MATLAB
for i = 1:1:numfiles
    A{i} = csvread(datfiles(i).name);
end

% Remove discarded variables
ind = [1 22 23 24 25 26 27 32]; % Variables to be removed.
A(ind) = [];

% Reshape all the data into columns - (n x 1) 
for i = 1:1:length(A)
    temp = A{1,i};
    [x,y] = size(temp);
    if x == 1 && y ~= 1
        temp = temp';
        A{1,i} = temp;
    end
end

% Retrieves the frequency data for the attributes from Excel spreadsheet
frequency = xlsread('C:\Users\aajwgc\Documents\MATLAB\Research Work\Data\testBig\frequency');

% Removing recorded frequency for discarded variables
frequency(ind) = [];

% Upsampling all the attributes to desired frequency
prompt = {'Frequency (Hz):'};
dlg_title = 'Enter desired output frequency for all attributes';
num_lines = 1;
def = {'50'};
answer= inputdlg(prompt,dlg_title,num_lines,def);
OutFreq = str2num(answer{1});

m = 1; 
n = length(frequency);
A_resampled = cell(m,n);
A_resampled(:) = {''};

for i = length(frequency);
    raw = cell2mat(A(1,i));
    temp= Resample(raw, frequency(i,:), OutFreq);
     A_resampled{i} = temp(i);
end

Resample function:

function obj = Resample(InputData, InFreq, OutFreq, varargin)
%% Preliminary setup
% Allow for selective down-sizing by specifying type
type = 'mean'; %default to the mean/average

if size(varargin,2) > 0
    type = varargin{1};
end

% Determine the necessary resampling factor
factor = OutFreq / InFreq;

%% No refactoring required
if (factor == 1)
    obj = InputData;
%% Up-Sampling required
elseif (factor > 1)
    t = factor * numel(InputData(1:end));
    obj = zeros(t,1); % <-- Line 19, where I get the error message.

    for i = 1:factor:t
        y = ((i-1) / factor) + 1;
        z = InputData(y);
        obj(i:i+factor) = z;
    end
%% Down-Sampling required
elseif (factor < 1)    
    t = numel(InputData(1:end));
    t = floor(t * factor);
    obj = zeros(t,1);
    factor = int32(1/factor);

    if  strcmp(type,'mean') %default is mean (process first)
        for i = 1:t
            y = (factor * (i-1)) + 1;
            obj(i) = mean(InputData(y:y+factor-1));
        end    
    elseif strcmp(type,'min')
        for i = 1:t
            y = (factor * (i-1)) + 1;
            obj(i) = min(InputData(y:y+factor-1));
        end 
    elseif strcmp(type,'max')
        for i = 1:t
            y = (factor * (i-1)) + 1;
            obj(i) = max(InputData(y:y+factor-1));
        end 
    elseif strcmp(type,'mode')
        for i = 1:t
            y = (factor * (i-1)) + 1;
            obj(i) = mode(InputData(y:y+factor-1));
        end 
    elseif strcmp(type,'sum')
        for i = 1:t
            y = (factor * (i-1)) + 1;
            obj(i) = sum(InputData(y:y+factor-1));
        end   
    elseif strcmp(type,'single')
        for i = 1:t
            y = (factor * (i-1)) + 1;
            obj(i) = InputData(y);
        end
    else
        obj = NaN;
    end
else
    obj = NaN;
end

1 answer:

Answer 0 (score: 0)

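If you have the DSP System Toolbox, you could use, for example, the dsp.FIRInterpolator System object (http://www.mathworks.co.uk/help/dsp/ref/dsp.firinterpolatorclass.html) and call its step() function repeatedly, so that you avoid processing all the data in one go.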
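
As a rough sketch of that idea (it needs the DSP System Toolbox, and the factor, frame length and test signal below are made-up values, not taken from the question), frame-by-frame processing could look something like this. The object's default filter coefficients are used only to keep the example short; in practice you would supply coefficients designed for your own factor.

```matlab
% Hypothetical example values (assumptions, not from the question).
L        = 5;                       % integer upsampling factor, e.g. 10 Hz -> 50 Hz
frameLen = 1024;                    % samples handled per step() call
x        = randn(8 * frameLen, 1);  % stand-in for one cell of A (column vector)

% Requires the DSP System Toolbox. Default Numerator is used for brevity only.
interp = dsp.FIRInterpolator('InterpolationFactor', L);

nFrames = numel(x) / frameLen;      % assumes the length is a whole number of frames
y = zeros(L * numel(x), 1);         % collected here only for demonstration

for k = 1:nFrames
    in  = x((k-1)*frameLen + (1:frameLen));       % next input frame
    out = step(interp, in);                       % filter state carries over between calls
    y((k-1)*L*frameLen + (1:L*frameLen)) = out;   % L output samples per input sample
end
```

Every frame has the same length here, so the object never sees a change in input size. If memory is the real constraint, each `out` could be written to disk inside the loop instead of being collected in `y`.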

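By the way, up- and downsampling (interpolation and decimation) are more complicated concepts than you might think; in the most general sense, both require some form of filtering to remove the artifacts that these processes create.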
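
To make that concrete for the downsampling case: averaging each block of M samples, as the 'mean' branch of your Resample function does, is itself a crude low-pass (moving-average) filter followed by keeping one value per block. A minimal vectorised sketch, assuming an integer factor M and a made-up test signal, avoids the loop:

```matlab
% Hypothetical example values (assumptions, not from the question).
M = 4;                              % integer decimation factor
x = (1:10)';                        % stand-in signal (column vector)

n      = floor(numel(x) / M);       % number of complete blocks; leftover samples are dropped
blocks = reshape(x(1:n*M), M, n);   % each column holds one block of M consecutive samples
y      = mean(blocks, 1)';          % one averaged value per block, returned as a column
```

A moving average is a poor anti-aliasing filter compared with a properly designed one, so treat this as an illustration of the principle rather than a recommendation.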

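You could design these filters yourself and convolve your signal with them, but this kind of filter design requires a solid grounding in signal processing. If you want to go down that route, I'd suggest picking up a textbook somewhere, as it is easy to get wrong without a reference text.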
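
For illustration only, here is about the simplest possible version of "design a filter and convolve": insert zeros between the samples, then convolve with a triangular kernel, which amounts to linear interpolation. The factor L and the test signal are assumptions made for the example; in real use a properly designed low-pass filter would take the place of h.

```matlab
% Hypothetical example values (assumptions, not from the question).
L = 4;                              % integer upsampling factor
x = [0; 1; 2; 3];                   % stand-in signal (column vector)

u = zeros(L * numel(x), 1);         % zero-stuffed signal
u(1:L:end) = x;                     % original samples land on every L-th position

h = [1:L, L-1:-1:1]' / L;           % triangular kernel, length 2L-1, peak value 1

yFull = conv(u, h);                 % full convolution, length numel(u) + 2L - 2
y = yFull(L : L + numel(u) - 1);    % drop the kernel delay of L-1 samples
```

Between the original samples the result follows a straight line; after the last sample it ramps towards zero, because the signal is treated as zero beyond its end.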