Question

在我的模型中，要完成的最重复的任务之一是计算数组中每个元素的数量。计数来自一个闭集，所以我知道有X种类型的元素，并且全部或部分元素填充数组，以及代表“空”单元格的零。数组没有以任何方式排序，并且可能相当长（大约1M个元素），并且在一次模拟期间（这也是数百次模拟的一部分），该任务完成了数千次。结果应该是大小为r的向量X，因此r(k)是数组中k的数量。

实施例

对于X = 9，如果我有以下输入向量：

v = [0 7 8 3 0 4 4 5 3 4 4 8 3 0 6 8 5 5 0 3]

我想得到这个结果：

r = [0 0 4 4 3 1 1 3 0]

请注意，我不希望计数零，并且数组中未出现的元素（如2）在结果向量的相应位置具有0（ r(2) == 0）。

实现这一目标的最快方式是什么？

Answer 1

tl; dr：最快的方法取决于数组的大小。对于小于2 ¹⁴的数组，下面的方法3（accumarray）更快。对于大于该方法的数组2（histcounts）更好。

更新：我也用implicit broadcasting对此进行了测试，这是在2016b中引入的，结果几乎等于bsxfun方法，这种方法没有显着差异（相对于其他方法））。

让我们看看执行此任务的可用方法有哪些。对于以下示例，我们假设X具有n元素，从1到n，我们感兴趣的数组是M，这是一个可以在尺寸。我们的结果向量为spp ¹，因此spp(k)是k中M的数量。虽然我在这里写了X，但在下面的代码中没有明确的实现，我只是定义n = 500而X隐含1:500。

天真的`for`循环

处理此任务的最简单，最简单的方法是使用for循环遍历X中的元素，并计算M中等于它的元素数：

function spp = loop(M,n)
spp = zeros(n,1);
for k = 1:size(spp,1);
    spp(k) = sum(M==k); 
end
end

这当然不是那么聪明，特别是如果只有X中的一小部分元素填充M，那么我们最好先查看那些已经在M中的元素：< / p>

function spp = uloop(M,n)
u = unique(M); % finds which elements to count
spp = zeros(n,1);
for k = u(u>0).';
    spp(k) = sum(M==k); 
end
end

通常，在MATLAB中，建议尽可能利用内置函数，因为大多数时候它们都要快得多。我想到了5个选项：

1。函数`tabulate`

函数tabulate返回一个非常方便的频率表，乍一看似乎是完成此任务的完美解决方案：

function tab = tabi(M)
tab = tabulate(M);
if tab(1)==0
    tab(1,:) = [];
end
end

要做的唯一解决方法是删除表的第一行（如果它计算0元素（可能是M中没有零）。

2。函数`histcounts`

另一个可以轻松调整的选项histcounts：

function spp = histci(M,n)
spp = histcounts(M,1:n+1);
end

这里，为了分别计算1到n之间的所有不同元素，我们将边缘定义为1:n+1，因此X中的每个元素都拥有它完事。我们也可以写histcounts(M(M>0),'BinMethod','integers')，但我已经测试了它，并且需要更多时间（尽管它使函数独立于n）。

3。函数`accumarray`

我将在此处提供的下一个选项是使用函数accumarray：

function spp = accumi(M)
spp = accumarray(M(M>0),1);
end

此处我们将函数M(M>0)作为输入，跳过零，并使用1作为vals输入来计算所有唯一元素。

4。函数`bsxfun`

我们甚至可以使用二元操作@eq（即==）来查找每种类型的所有元素：

function spp = bsxi(M,n)
spp = bsxfun(@eq,M,1:n);
spp = sum(spp,1);
end

如果我们将第一个输入M和第二个1:n保持在不同的维度中，那么一个是列向量，另一个是行向量，那么函数会比较{{1}中的每个元素} M中的每个元素，并创建一个1:n - by - length(M)逻辑矩阵，而不是我们可以求和以得到所需的结果。

5。函数`ndgrid`

另一个类似于n的选项是使用bsxfun函数显式创建所有可能性的两个矩阵：

ndgrid

然后我们比较它们并对列进行求和，得到最终结果。

基准

我做了一点测试，找到了上面提到的最快的方法，我为所有路径定义了function spp = gridi(M,n) [Mx,nx] = ndgrid(M,1:n); spp = sum(Mx==nx); end。对于某些人（尤其是幼稚的n = 500），for对执行时间有很大的影响，但这不是问题，因为我们想针对给定的n测试它

以下是结果：

我们可以注意到几件事：

有趣的是，最快的方法有所改变。对于小于2的数组¹⁴ n是最快的。对于大于2的数组¹⁴ accumarray是最快的。
正如预期的那样，两个版本中的幼稚histcounts循环都是最慢的，但对于小于2 ⁸的数组，＆＃34; unique＆amp;对于＆＃34;选项较慢。 for成为大于2 ¹¹的数组中最慢的，可能是因为需要在内存中存储非常大的矩阵。
ndgrid对数组大小小于2 ⁹的方式有一些不规则性。在我进行的所有试验中，这个结果是一致的（模式有一些变化）。

（tabulate和bsxfun曲线被截断，因为它会使我的计算机卡在更高的值中，并且趋势已经很明显了）

另外，请注意y轴在log ₁₀中，因此单位减少（对于大小为2的数组¹⁹，在ndgrid之间而accumarray）意味着操作速度提高了10倍。

我很高兴听到评论中有关此测试的改进，如果您有另一种概念上不同的方法，欢迎您将其作为答案。

代码

以下是计时功能中包含的所有功能：

histcounts

以下是运行此代码并生成图表的脚本：

function out = timing_hist(N,n)
M = randi([0 n],N,1);
func_times = {'for','unique & for','tabulate','histcounts','accumarray','bsxfun','ndgrid';
    timeit(@() loop(M,n)),...
    timeit(@() uloop(M,n)),...
    timeit(@() tabi(M)),...
    timeit(@() histci(M,n)),...
    timeit(@() accumi(M)),...
    timeit(@() bsxi(M,n)),...
    timeit(@() gridi(M,n))};
out = cell2mat(func_times(2,:));
end

function spp = loop(M,n)
spp = zeros(n,1);
for k = 1:size(spp,1);
    spp(k) = sum(M==k); 
end
end

function spp = uloop(M,n)
u = unique(M);
spp = zeros(n,1);
for k = u(u>0).';
    spp(k) = sum(M==k); 
end
end

function tab = tabi(M)
tab = tabulate(M);
if tab(1)==0
    tab(1,:) = [];
end
end

function spp = histci(M,n)
spp = histcounts(M,1:n+1);
end

function spp = accumi(M)
spp = accumarray(M(M>0),1);
end

function spp = bsxi(M,n)
spp = bsxfun(@eq,M,1:n);
spp = sum(spp,1);
end

function spp = gridi(M,n)
[Mx,nx] = ndgrid(M,1:n);
spp = sum(Mx==nx);
end

¹ _{这个奇怪名字的原因来自我的领域，Ecology。我的模型是细胞自动机，通常模拟虚拟空间中的个体生物（上面的N = 25; % it is not recommended to run this with N>19 for the `bsxfun` and `ndgrid` functions.
func_times = zeros(N,5);
for n = 1:N
func_times(n,:) = timing_hist(2^n,500);
end
% plotting:
hold on
mark = 'xo*^dsp';
for k = 1:size(func_times,2)
plot(1:size(func_times,1),log10(func_times(:,k).*1000),['-' mark(k)],...
'MarkerEdgeColor','k','LineWidth',1.5);
end
hold off
xlabel('Log_2(Array size)','FontSize',16)
ylabel('Log_{10}(Execution time) (ms)','FontSize',16)
legend({'for','unique & for','tabulate','histcounts','accumarray','bsxfun','ndgrid'},...
'Location','NorthWest','FontSize',14)
grid on
）。个体属于不同的物种（因此M）并且一起形成所谓的生态群落＆＃34;。＆＃34;州＆＃34;社区的数量来自每个物种的个体数量，这是本答案中的spp向量。在这个模型中，我们首先为要抽取的个体定义一个物种库（上面spp），社区国家考虑物种库中的所有物种，而不仅仅是X中存在的物种。}

Answer 2

我们知道输入向量总是包含整数，所以为什么不用它来＆＃34;挤压＆＃34;从算法中获得更多性能？

我一直在尝试对两种最佳分箱方法suggested by the OP进行一些优化，这就是我提出的方法：

问题中的唯一值（X或示例中的n）的数量应显式转换为（无符号）整数类型。
计算额外的垃圾箱然后丢弃它比计算更快，而且只需处理＆＃34;有效值（请参阅下面的accumi_new函数）。

此功能在我的机器上运行大约需要30秒。我使用的是MATLAB R2016a。

function q38941694
datestr(now)
N = 25;
func_times = zeros(N,4);
for n = 1:N
    func_times(n,:) = timing_hist(2^n,500);
end
% Plotting:
figure('Position',[572 362 758 608]);
hP = plot(1:n,log10(func_times.*1000),'-o','MarkerEdgeColor','k','LineWidth',2);
xlabel('Log_2(Array size)'); ylabel('Log_{10}(Execution time) (ms)')
legend({'histcounts (double)','histcounts (uint)','accumarray (old)',...
  'accumarray (new)'},'FontSize',12,'Location','NorthWest')
grid on; grid minor;
set(hP([2,4]),'Marker','s'); set(gca,'Fontsize',16);
datestr(now)
end

function out = timing_hist(N,n)
% Convert n into an appropriate integer class:
if n < intmax('uint8')
  classname = 'uint8';
  n = uint8(n);
elseif n < intmax('uint16')
  classname = 'uint16';
  n = uint16(n);
elseif n < intmax('uint32')
  classname = 'uint32';
  n = uint32(n);
else % n < intmax('uint64')  
  classname = 'uint64';
  n = uint64(n);
end
% Generate an input:
M = randi([0 n],N,1,classname);
% Time different options:
warning off 'MATLAB:timeit:HighOverhead'
func_times = {'histcounts (double)','histcounts (uint)','accumarray (old)',...
  'accumarray (new)';
    timeit(@() histci(double(M),double(n))),...
    timeit(@() histci(M,n)),...
    timeit(@() accumi(M)),...
    timeit(@() accumi_new(M))
    };
out = cell2mat(func_times(2,:));
end

function spp = histci(M,n)
  spp = histcounts(M,1:n+1);
end

function spp = accumi(M)
  spp = accumarray(M(M>0),1);
end

function spp = accumi_new(M)
  spp = accumarray(M+1,1);
  spp = spp(2:end);
end

计算数组中元素的最快方法是什么？

实施例

2 个答案:

天真的`for`循环

1。函数`tabulate`

2。函数`histcounts`

3。函数`accumarray`

4。函数`bsxfun`

5。函数`ndgrid`

基准

代码

计算数组中元素的最快方法是什么？

实施例

2 个答案:

天真的for循环

1。函数tabulate

2。函数histcounts

3。函数accumarray

4。函数bsxfun

5。函数ndgrid

基准

代码

天真的`for`循环

1。函数`tabulate`

2。函数`histcounts`

3。函数`accumarray`

4。函数`bsxfun`

5。函数`ndgrid`