I have a struct array with three fields: an array, the length of that array, and a number.
N = 5;
data = struct;
for i=1:N
n = ceil(rand * 3);
data(i).len = n;
data(i).array = rand(1,n);
data(i).number = i;
end
The data looks like this:
data =
1x5 struct array with fields:
len = [ 1 3 3 1 1 ]
array = [[0.8]; [0.7 0.9 0.4]; [0.7 0 0.3]; [0.1]; [0.3]]
number = [ 1 2 3 4 5 ]
I can return the arrays as a single 1x9 array in several ways:
>> [data.array]
>> cat(2,data.array)
[0.8 | 0.7 0.9 0.4 | 0.7 0 0.3 | 0.1 | 0.3] % | shows array separation
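I want to repeat each number (data.number) len times, to produce an array of the same length as the concatenated array.
I am currently using arrayfun followed by cell2mat: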
>> x = arrayfun(@(x) repmat(x.number, 1, x.len), data, 'UniformOutput', false)
x =
[1] [1x3 double] [1x3 double] [4] [5]
>> cell2mat(x)
[ 1 2 2 2 3 3 3 4 5]
This lines the numbers up with the arrays:
arrays = [ 0.8 | 0.7 0.9 0.4 | 0.7 0 0.3 | 0.1 | 0.3 ]
numbers = [ 1 | 2 2 2 | 3 3 3 | 4 | 5 ]
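The idea behind this is to feed the data to a GPU for processing, but rearranging the data takes orders of magnitude longer than the actual processing.
With N = 100,000, arrayfun takes about 5 seconds, while a for loop calling repmat takes about 4 seconds.
Is there a faster way to rearrange the data from the uneven arrays in the struct into 1-D arrays of matching length? I am open to using a different data structure.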
Testing the vectorised approach:
data = struct;
data(1).len = 1;
data(1).array = [1 2 3];
data(1).number = 11;
data(2).len = 0;
data(2).array = [];
data(2).number = 12;
data(3).len = 2;
data(3).array = [4 5 6; 7 8 9];
data(3).number = 13;
list_of_array = cat(1,data.array)
idx = zeros(1,size(list_of_array,1));
% Mark the end of each array with a 1
len = cumsum([data.len])
idx(len) = 1
% Flat indices
idx = cumsum([1 idx(1:end-1)])
nf = [data.number]
repeated_num_faces = nf(idx)
This gives the output:
list_of_array =
1 2 3
4 5 6
7 8 9
len =
1 1 3 % Cumulative lengths
idx =
1 0 1 % Ones at start
idx =
1 2 2 % Flat indexes - should be [1 3 3]
nf =
11 12 13 % Numbers expanded
repeated_num_faces =
11 12 12 % Wrong .numbers - should be [11 13 13]
Answer 0 (score: 2)
Well, a struct is not the easiest thing to work with. You certainly should not use repmat. Instead, preallocate the data_number array and fill it in a for loop:
tic;
data_array = [data(:).array];
data_number = zeros(size(data_array));
start = 1;
for i=1:N
nel = data(i).len;
data_number(start:start+nel-1) = data(i).number;
start = start+nel;
end
toc;
Here is another, 'vectorised' solution that uses cumsum to mark the indices in the 'flat' vector:
tic;
data_array = [data.array];
data_number = zeros(size(data_array));
% cumulative sum of number of elements in every array
len = cumsum([data.len]);
% mark the end of every array in a 'flat' vector
data_number(len) = 1;
% compute 'flat' indices for every data(i).array
data_number = cumsum([1 data_number(1:end-1)]);
% extract the data.number field
data_num = [data.number];
data_number = data_num(data_number);
toc;
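To make the index trick concrete, here is a short trace on the lengths from the question's example (len = [1 3 3 1 1]). The names lens, numbers, ends, marks, idx and repeated are illustrative only, and the sketch assumes every length is at least 1.

```matlab
% Trace of the cumsum trick on the example from the question.
% Assumes every length is >= 1; names here are illustrative only.
lens    = [1 3 3 1 1];            % data.len for the five structs
numbers = [1 2 3 4 5];            % data.number

ends  = cumsum(lens);             % last flat position of each array: [1 4 7 8 9]
marks = zeros(1, ends(end));
marks(ends) = 1;                  % 1 at the end of every array

% Shifting the marks right by one turns "end of array k" into
% "start of array k+1"; the running sum is then the struct index.
idx = cumsum([1 marks(1:end-1)]); % [1 2 2 2 3 3 3 4 5]

repeated = numbers(idx);          % [1 2 2 2 3 3 3 4 5]
```

Each element of idx says which struct the corresponding element of the flat array came from, so indexing [data.number] with it gives the aligned numbers.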
For a dataset with N=1e5, the timings for the loop and the cumsum version, in that order, are:
Elapsed time is 0.153539 seconds.
Elapsed time is 0.110694 seconds.
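Note that the cumsum version assumes every len is at least 1. With a zero-length entry, as in the question's test case, two arrays end at the same flat position and their marks collapse into one, which is why that test returns [11 12 12]. Below is a hedged sketch of two possible workarounds: counting start positions with accumarray, or calling repelem (available in MATLAB R2015a and later). The variable names are illustrative, and neither variant is covered by the timings above.

```matlab
% Sketch: repeat numbers(k) lens(k) times, allowing lens(k) == 0.
% lens and numbers mirror the question's failing test case.
lens    = [1 0 2];
numbers = [11 12 13];
total   = sum(lens);

% Flat start position of each array. A zero-length array shares its
% start with the next one; trailing empties land on total+1.
starts = cumsum([1 lens(1:end-1)]);

% Count how many arrays start at each position, then accumulate.
% The extra slot (total+1) absorbs trailing zero-length entries.
counts   = accumarray(starts(:), 1, [total+1, 1]).';
idx      = cumsum(counts(1:total));   % [1 3 3]
repeated = numbers(idx);              % [11 13 13]

% On R2015a or later, repelem does the same in one call.
repeated2 = repelem(numbers, lens);   % [11 13 13]
```

Which variant is faster for N=1e5 would have to be measured on the real data.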