Question

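If data has to be appended to arrays incrementally, individual vectors of elementary data types appear to be orders of magnitude faster than an array of structs (one array element per record). Even collecting the individual vectors into a single struct seems to roughly double the time. The test is: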

N=5e4;

fprintf('\nstruct array (array of structs):\n')
clear x y;
y=struct( 'a',[], 'b',[], 'c',[], 'd',[] );
tic
for iIns = 1 : N
   x.a=rand; x.b=rand; x.c=rand; x.d=rand;
   y(end+1)=x;
end % for iIns
toc

fprintf('\nSeparate arrays of scalars:\n')
clear a b c d;
a=[]; b=[]; c=[]; d=[];
tic
for iIns = 1 : N
   a(end+1) = rand;
   b(end+1) = rand;
   c(end+1) = rand;
   d(end+1) = rand;
end % for iIns
toc

fprintf('\nA struct with arrays of scalars for fields:\n')
clear a b c d x y
x.a=[]; x.b=[]; x.c=[]; x.d=[];
tic
for iIns = 1:N
   x.a(end+1)=rand;
   x.b(end+1)=rand;
   x.c(end+1)=rand;
   x.d(end+1)=rand;
end % for iIns
toc

Results:

struct array (array of structs):
Elapsed time is 24.127274 seconds.

Separate arrays of scalars:
Elapsed time is 0.048190 seconds.

A struct with arrays of scalars for fields:
Elapsed time is 0.084624 seconds.

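Even though collecting the individual vectors of elementary data types into a struct (the third scenario above) incurs this penalty, it can be preferable to plain individual vectors (the second scenario above) because the variables are better organized. The variable namespace is not cluttered with many variables that are in fact conceptually grouped.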

However, a sizable price is paid for that organization. I don't suppose there is a way to avoid it?

Answer 1

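There are two ways to avoid the performance penalty: (1) pre-allocate, and (2) rethink your stance on "organizing" variables. I suggest both. Oh, and if you can help it, do not use arrays of structs with only a scalar in each field. If your application suddenly has to handle a few orders of magnitude more data, the memory overhead will force you to rewrite everything.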

Pre-allocation

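You often know how many elements the array will end up with. In that case, initialize it as s = struct('a',NaN(1,N),'b',NaN(1,N)); (note NaN(1,N), not NaN(1:N), which would request an N-dimensional array). If you do not know the number of entries beforehand but can estimate an upper bound, initialize with the upper bound, and either remove the unused elements afterwards or use functions (e.g. nanmean) that do not care whether the array has some extra NaNs at the end. If you really know nothing about the final size (other than that N is significantly large), pre-allocate with a decent number (e.g. N=1337) and expand the array in chunks. MathWorks has sped up dynamic growth of numeric arrays in a recent release, but as you have shown in your question, that optimization has not yet been applied to structs. Don't count on MathWorks' optimization team to fix your code.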
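The two pre-allocation patterns described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the field names are taken from the question, while `chunkSize`, `numIncoming`, `count`, and `v` are hypothetical names introduced only for this example, and the `while` condition stands in for whatever "is there more data?" test your application has.

```matlab
% Case 1: the final size N is known, so pre-allocate every field once.
N = 5e4;
s = struct('a', NaN(1,N), 'b', NaN(1,N), 'c', NaN(1,N), 'd', NaN(1,N));
for k = 1:N
    s.a(k) = rand;  s.b(k) = rand;  s.c(k) = rand;  s.d(k) = rand;
end

% Case 2: the final size is unknown, so grow in chunks and trim at the end.
chunkSize   = 1000;              % hypothetical chunk size
numIncoming = 2500;              % stand-in for an unknown amount of data
v     = NaN(1, chunkSize);       % first chunk
count = 0;                       % number of slots actually filled
while count < numIncoming        % stand-in for "more data available?"
    count = count + 1;
    if count > numel(v)
        v = [v, NaN(1, chunkSize)]; %#ok<AGROW> one reallocation per chunk
    end
    v(count) = rand;
end
v = v(1:count);                  % throw away the unused tail
```

With chunked growth the array is reallocated once per `chunkSize` insertions rather than once per insertion, which is the point of the pattern.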

Organizing variables

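Why worry about your variable space? As long as you useExplicitVariableNames, your code stays readable and you can easily pick out the right variable. But okay, say you want to clean up. The first way to keep the number of active variables low is to call clear or keep at key points in the code, so that only what is needed stays around. The second (assuming you want to optimize performance) is to put contextually linked vectors into the same array: objectDimensions = [lengthOfObject, widthOfObject, heightOfObject]. This keeps everything as numeric arrays (which is fastest) and allows easy vectorization, e.g. objectVolume = prod(objectDimensions,2);.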
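As a small worked instance of the grouped-array idea, assuming one object per row and made-up dimension values:

```matlab
% Each row is one object; the columns are [length, width, height].
% The numbers are invented for illustration only.
objectDimensions = [2 3 4;
                    1 1 1;
                    5 2 0.5];

% Multiply along dimension 2 (across the columns) to get one volume per row.
objectVolume = prod(objectDimensions, 2);   % -> [24; 1; 5]
```

Because everything lives in one numeric matrix, adding a fourth object is a single row append, and any per-object computation stays vectorized.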

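/aside: I should disclose that I used to use structs a lot to assemble results (so that I could return lots of information in a single variable, with the field names serving as part of the documentation). I have since switched to object-oriented programming (usually handle objects), which collects not only related variables but also related functionality, and which makes code reuse easier. I do take a performance hit, but the coding time it saves me more than makes up for it. Note that I still pre-allocate where possible (unless the array only grows from, say, one to three elements).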

Example

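Suppose you have a function getObjectDimensions that reads the dimensions (length, width, height) of an object. Sometimes the object is 2D, sometimes it is 3D. You therefore want to fill the following variables: twoD.length, twoD.width, threeD.length, threeD.width, threeD.height, ideally as arrays of structs, so that each element of the struct array corresponds to one object. You do not know beforehand how many objects there are. All you can do is poll the function thereAreMoreObjects, which returns true or false, until no objects are left.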

Here is how you can do this with reasonable efficiency, growing the array in chunks:

%// preassign the temporary variable, and some others
chunkSize = 1000;
numObjects = 0;
idAndDimensions = zeros(chunkSize,4);
while thereAreMoreObjects()
    objectId = getCurrentObjectId();
    %// hi==-1 if it's flat
    [len,wid,hi] = getObjectDimensions(objectId);
    %// allocate more, if needed
    numObjects = numObjects + 1;
    if numObjects > size(idAndDimensions,1)
        %// grow array by chunkSize zero-filled rows
        idAndDimensions(end+chunkSize,1) = 0;
    end
    idAndDimensions(numObjects,:) = [objectId, len, wid, hi];
end
%// throw away excess
idAndDimensions = idAndDimensions(1:numObjects,:);
%// split into 2D and 3D objects
isTwoD = idAndDimensions(:,end) == -1;
%// assign twoD struct
twoD = struct('id',num2cell(idAndDimensions(isTwoD,1)),...
    'length',num2cell(idAndDimensions(isTwoD,2)),...
    'width',num2cell(idAndDimensions(isTwoD,3)));
%// assign threeD struct (analogous to twoD, plus the height column)
threeD = struct('id',num2cell(idAndDimensions(~isTwoD,1)),...
    'length',num2cell(idAndDimensions(~isTwoD,2)),...
    'width',num2cell(idAndDimensions(~isTwoD,3)),...
    'height',num2cell(idAndDimensions(~isTwoD,4)));
%// clean up - we need only the two structs
%// I use keep from the File Exchange instead of clearvars
clearvars -except twoD threeD

Incremental appending: how to avoid the performance penalty of struct arrays
