Question

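I have a large N-by-1 cell array. Each row is either a string or a double. A string is a variable name, and the doubles that follow it are its values, up to the next string (the next variable name). For example: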

data = {
var_name1;
val1;
val2;
val3;
val4;
val5;
var_name2;
val1;
val2;
var_name3;
val1;
val2;
val3;
val4;
val5;
val6;
val7}

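and so on. I would like to split the data cell into three cells: {var_name and its 5 values}, {var_name and its 2 values}, {var_name and its 7 values}. I try to avoid loops as much as possible, and I have found that vectorization works very well together with cellfun. Is this possible? The data cell has almost a million rows.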

Answer 1

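cellfun is for applying a function to each element of a cell array.

When you pass multiple arguments to cellfun, it takes the i-th element of each of data, indx_first, and indx_last, and uses each of them in the anonymous function. Substituting those variables in, the function evaluates x(y : z) for each element x of data. In other words, you are doing data{i}(y : z), i.e. indexing into the actual elements of the cell array rather than indexing the cell array itself. I don't think that is what you want. What you really want is data{y : z} for each (y, z) pair given by the corresponding elements of indx_first and indx_last, isn't it?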
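To make the element-wise pairing concrete, here is a minimal sketch. The variables `words`, `first`, and `last` are hypothetical toy inputs, not the asker's data; they only illustrate that cellfun hands the i-th element of every input to the function, so the range indexes into each element rather than slicing the cell array.

```matlab
% Hypothetical toy inputs (all cell arrays of the same size, as cellfun requires)
words = {'abcdef'; 'ghijkl'; 'mnopqr'};
first = {2; 1; 4};
last  = {3; 2; 6};

% On iteration i: x = words{i}, y = first{i}, z = last{i}
pieces = cellfun(@(x, y, z) x(y:z), words, first, last, ...
    'UniformOutput', false);

% Slicing the cell array itself is a different operation:
subCell = words(2:3);   % a 2-by-1 cell holding 'ghijkl' and 'mnopqr'
```

Here `pieces` holds a substring of each word, while `subCell` holds whole elements, which is the distinction the answer is drawing.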

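If that is indeed the case, I don't see a vectorized way to solve the problem, because each "variable" has a different size. But you do know how many variables you have: it is the length of indx_first. So I would preallocate and then loop, like this:

vars = cell(numel(indx_first), 2);
for i = 1:numel(indx_first)
vars{i, 1} = data{indx_first(i) - 1}; % store variable name in first column
vars{i, 2} = [data{indx_first(i) : indx_last(i)}]; % store values in second column
end

At the end, you will have a cell array with 2 columns. The first column in each row is the name of the variable. The second is the actual data.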
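For reference, a self-contained sketch of this loop on toy data. The thread does not show how indx_first and indx_last were built, so the definitions below (and the helper name `name_pos`) are assumptions: indx_first is taken to be the row of the first value after each name, and indx_last the row of the last value before the next name.

```matlab
% Toy data in the layout described in the question
data = {'var_name1'; 1; 2; 3; 4; 5; 'var_name2'; 6; 7; 'var_name3'; 8; 9; 10};

% Assumed index construction (not from the original thread)
name_pos   = find(cellfun(@ischar, data));         % rows holding names
indx_first = name_pos + 1;                         % first value of each variable
indx_last  = [name_pos(2:end) - 1; numel(data)];   % last value of each variable

% Preallocate, then fill one row per variable
vars = cell(numel(indx_first), 2);
for i = 1:numel(indx_first)
    vars{i, 1} = data{indx_first(i) - 1};               % variable name
    vars{i, 2} = [data{indx_first(i) : indx_last(i)}];  % values as a row vector
end
```

Looping over `numel(indx_first)` rather than `length(vars)` matters when there is only one variable: `vars` is then 1-by-2, and `length` would return 2.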

Answer 2

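I believe the following should be what you're after. The main pieces are using cumsum to work out which name each row corresponds to, and then accumarray to build up the list for each name.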

% Make some data
data = {'a'; 1; 2; 3;
    'b'; 4; 5;
    'c'; 6; 7; 8; 9;
    'd';
    'e'; 10; 11; 12};

% Which elements are the names?
isName = cellfun(@ischar, data);

% Use CUMSUM to work out for each row, which name it corresponds to
whichName = cumsum(isName);

% Pick out only the values from 'data', and filter 'whichName'
% for just the values
justVals = data(~isName);
whichName = whichName(~isName);

% Use ACCUMARRAY to build up lists per name. Note that the function
% used by ACCUMARRAY must return something scalar from a column of
% values, so we return a scalar cell containing a row-vector
% of those values
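% of those values. The output size is given explicitly so that a
% trailing name with no values still gets an (empty) entry.
listPerName = accumarray(whichName, cell2mat(justVals), [nnz(isName), 1], @(x) {x.'});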

% All that remains is to prepend the name to each cell. This ends
% up with each row of output being a cell like {'a', [1 2 3]}.
% It's simple to make the output be {'a', 1, 2, 3} by adding
% a call to NUM2CELL on 'v' in the anonymous function.
nameAndVals = cellfun(@(n, v) [{n}, v], data(isName), listPerName, ...
    'UniformOutput', false);
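The cumsum step is the part that tends to look like magic, so here is a small sketch of it in isolation. The mask below is a hypothetical example, and `groupId` and `nVals` are illustrative names that do not appear in the answer.

```matlab
% Hypothetical mask: true where a row holds a name, false where it holds a value
isName = logical([1; 0; 0; 0; 1; 0; 0; 1; 0]);

% A running count of the names seen so far labels every row with its group
groupId = cumsum(isName);

% Counting the value rows per label gives the number of values under each name
nVals = accumarray(groupId(~isName), 1, [nnz(isName), 1]);
```

With this mask, `groupId` comes out as [1;1;1;1;2;2;2;3;3] and `nVals` as [3;2;1]. Because the labels produced by cumsum are non-decreasing, the values reach accumarray already grouped in their original order.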

cellfun with two arrays of indices
