Repeating vector elements

Question

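I have a value vector A with elements indexed by i, for example:

A = [0.1 0.2 0.3 0.4 0.5]; and say r = [5 2 3 2 1];

Now I would like to create a new vector Anew containing r(i) repetitions of the i-th value of A, such that the first r(1)=5 items of Anew have the value A(1), and the length of the new vector is sum(r). Thus: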

Anew = [0.1 0.1 0.1 0.1 0.1 0.2 0.2 0.3 0.3 0.3 0.4 0.4 0.5]

I am sure this can be done with an elaborate for loop combined with e.g. repmat, but does anyone know how to do this in a smoother way?

Answer 1

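As far as I know, there is no equivalent function in MATLAB that does this, although R has rep, which does it for you... so jealous.

Anyway, the only way I can suggest is to run a for loop with repmat, as you suggested. However, you can use arrayfun instead if you want a one-liner... well, technically a two-liner, because post-processing is needed to turn the result into a single vector. So you can try this: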

Anew = arrayfun(@(x) repmat(A(x), r(x), 1), 1:numel(A), 'uni', 0);
Anew = vertcat(Anew{:});

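This essentially performs the for loop and the concatenation of the replicated vectors with less code. We go through each pair of values in A and r and spit out the replicated vector for each. Each of those ends up in a cell of a cell array, which is why vertcat is needed to put everything into one vector.

The result holds the values of the desired Anew as a 13-by-1 column vector, since repmat replicates each value down the rows; transpose it if you need a row vector.

Note that someone else has tried something similar to what you are doing, in this post: A similar function to R's rep in Matlab. That post essentially mimics the way R does rep, which is what you want to do!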
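For readers on newer releases: MATLAB R2015a and later include repelem, which covers this case directly. A minimal sketch, assuming repelem is available in your version:

```matlab
A = [0.1 0.2 0.3 0.4 0.5];
r = [5 2 3 2 1];

% Assumption: MATLAB R2015a or later, where repelem exists.
% For a vector A and a vector r of the same length,
% repelem repeats A(k) exactly r(k) times.
Anew = repelem(A, r);   % 1-by-13 row vector, same orientation as A
```

On older releases the approaches in this thread still apply.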

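Alternative - using a `for` loop

Because of @Divakar's benchmarking, I was curious to see how pre-allocating the array and then using an actual for loop to iterate over A and r and fill the array by indexing would perform. The code equivalent to the above, using a for loop and indexing, would be: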

Anew = zeros(sum(r), 1);
counter = 1;
for idx = 1 : numel(r)
    Anew(counter : counter + r(idx) - 1) = A(idx);
    counter = counter + r(idx);
end

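We need a variable that keeps track of where in the array the next elements have to be inserted; this is stored in counter. After each element of A is written, we advance counter by the number of copies just written, which is the corresponding value of r.

As such, this method avoids repmat entirely and uses only indexing to build the replicated vector.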
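To make the role of counter concrete, here is a small trace of the same loop on the example from the question. The ranges matrix and the variable last are hypothetical additions for illustration only; they are not part of the original answer.

```matlab
A = [0.1 0.2 0.3 0.4 0.5];
r = [5 2 3 2 1];

Anew = zeros(sum(r), 1);        % preallocate the 13-by-1 result
ranges = zeros(numel(r), 2);    % illustration only: [first last] slot per element
counter = 1;
for idx = 1 : numel(r)
    last = counter + r(idx) - 1;      % last slot written for A(idx)
    ranges(idx, :) = [counter last];  % record the slice that gets filled
    Anew(counter : last) = A(idx);    % scalar is assigned to the whole slice
    counter = last + 1;               % next free slot
end
disp(ranges)   % rows: [1 5], [6 7], [8 10], [11 12], [13 13]
```

Each row of ranges is the slice of Anew filled by one element of A, and the slices tile 1:sum(r) without gaps or overlap.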

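Benchmarking (à la Divakar)

Building on top of Divakar's benchmarking code, I tried running all of the tests on my machine, together with the for loop approach. I simply used his benchmarking code with the same test cases.

These are the timing results I get per algorithm:

Case #1 - `N = 4000`, `max_repeat = 4000`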

-------------------  With arrayfun
Elapsed time is 1.202805 seconds.
-------------------  With cumsum
Elapsed time is 1.691591 seconds.
-------------------  With bsxfun
Elapsed time is 0.835201 seconds.
-------------------  With for loop
Elapsed time is 0.136628 seconds.

Case #2 - `N = 10000`, `max_repeat = 1000`

-------------------  With arrayfun
Elapsed time is 2.117631 seconds.
-------------------  With cumsum
Elapsed time is 1.080247 seconds.
-------------------  With bsxfun
Elapsed time is 0.540892 seconds.
-------------------  With for loop
Elapsed time is 0.127728 seconds.

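In Case #2, cumsum actually beats arrayfun, which is what I originally expected; in Case #1, arrayfun is still ahead of cumsum. bsxfun beats everything else, except for the for loop. My guess is that the difference in arrayfun timings between Divakar and me comes from running our code on different architectures. I am currently running my tests with MATLAB R2013a on a MacBook Pro with Mac OS X 10.9.5.

As we can see, the for loop is much faster. I know that when a for loop performs indexing operations, the JIT kicks in and gives you better performance.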

Answer 2

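First consider forming the index vector [1 1 1 1 1 2 2 3 3 3 4 4 5]. Noticing the regular increments there makes me think of cumsum: we can get these steps by putting ones at the right locations in a vector of zeros: [1 0 0 0 0 1 0 1 0 0 1 0 1]. Those locations, in turn, can be obtained by running another cumsum on the input list r. After adjusting for the end conditions and 1-based indexing, we get this: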

B(cumsum(r) + 1) = 1;
idx = cumsum(B) + 1;
idx(end) = [];
A(idx)
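The intermediate values for the example from the question can be traced as follows. This is a sketch of the same idea, not the answer's exact code: the name ends and the explicit zeros allocation are my additions, the latter so that a leftover B in the workspace cannot affect the result. Note that the code marks the slot after each run rather than the start of each run, which is why it adds 1 afterwards and drops the last index.

```matlab
A = [0.1 0.2 0.3 0.4 0.5];
r = [5 2 3 2 1];

ends = cumsum(r);            % [5 7 10 12 13]: last position of each run
B = zeros(1, sum(r) + 1);    % explicit allocation (addition to the original code)
B(ends + 1) = 1;             % mark the slot right after each run ends
idx = cumsum(B) + 1;         % [1 1 1 1 1 2 2 3 3 3 4 4 5 6]
idx(end) = [];               % drop the extra trailing index
Anew = A(idx);               % 1-by-13 row vector
```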

Answer 3

A bsxfun based approach -

A = [0.1 0.2 0.3 0.4 0.5]
r = [5 2 3 2 1]

repeats = bsxfun(@le,[1:max(r)]',r) %//' logical 2D array with ones in each column 
                                    %// same as the repeats for each entry
A1 = A(ones(1,max(r)),:) %// 2D matrix of all entries repeated maximum r times
                         %// and this resembles your repmat 
out = A1(repeats) %// desired output with repeated entries

It can essentially become a two-liner -

A1 = A(ones(1,max(r)),:);
out = A1(bsxfun(@le,[1:max(r)]',r));

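Output - out is a 13-by-1 column vector holding the same values as the desired Anew, because logical indexing reads A1 column by column.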
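To see why the mask selects the right entries, it helps to print it for the example from the question. A minimal sketch; the column-sum check at the end is my addition for illustration.

```matlab
A = [0.1 0.2 0.3 0.4 0.5];
r = [5 2 3 2 1];

% repeats(k, j) is true when k <= r(j), so column j has r(j) leading ones:
%   1 1 1 1 1
%   1 1 1 1 0
%   1 0 1 0 0
%   1 0 0 0 0
%   1 0 0 0 0
repeats = bsxfun(@le, (1:max(r))', r);

A1 = A(ones(1, max(r)), :);   % every row is a copy of A (5-by-5 here)
out = A1(repeats);            % column-major read: 5 x 0.1, 2 x 0.2, ...

colCounts = sum(repeats, 1);  % equals r, one count per column
```

The temporary matrices are max(r)-by-numel(A), so memory use grows with the largest repeat count, not with sum(r).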

Benchmarking

Some benchmark results for the solutions presented here so far.

Benchmarking code - Case I

%// Parameters and input data
N = 4000;
max_repeat = 4000;
A = rand(1,N);
r = randi(max_repeat,1,N);
num_runs = 10; %// no. of times each solution is repeated for better benchmarking

disp('-------------------  With arrayfun')
tic
for k1 = 1:num_runs
    Anew = arrayfun(@(x) repmat(A(x), r(x), 1), 1:numel(A), 'uni', 0);
    Anew = vertcat(Anew{:});
end
toc, clear Anew

disp('-------------------  With cumsum')
tic
for k1 = 1:num_runs
    B(cumsum(r) + 1) = 1;
    idx = cumsum(B) + 1;
    idx(end) = [];
    out1 = A(idx);
end
toc,clear B idx out1

disp('-------------------  With bsxfun')
tic
for k1 = 1:num_runs
    A1 = A(ones(1,max(r)),:);
    out2 = A1(bsxfun(@le,[1:max(r)]',r));
end
toc
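Before reading too much into the timings, it may be worth confirming that the benchmarked approaches return the same values. A small sanity check under assumed sizes: N = 50, max_repeat = 20 and the fixed seed are arbitrary choices for illustration, and B is allocated explicitly so the check does not depend on workspace state.

```matlab
rng(0);                          % fixed seed so the check is repeatable
N = 50; max_repeat = 20;         % small, arbitrary sizes for a quick check
A = rand(1, N);
r = randi(max_repeat, 1, N);

% arrayfun + repmat (column vector)
pieces = arrayfun(@(x) repmat(A(x), r(x), 1), 1:numel(A), 'uni', 0);
out_arrayfun = vertcat(pieces{:});

% cumsum (row vector)
B = zeros(1, sum(r) + 1);
B(cumsum(r) + 1) = 1;
idx = cumsum(B) + 1;
idx(end) = [];
out_cumsum = A(idx);

% bsxfun (column vector)
A1 = A(ones(1, max(r)), :);
out_bsxfun = A1(bsxfun(@le, (1:max(r))', r));

% Compare as columns; isequal accepts more than two inputs.
same = isequal(out_arrayfun, out_cumsum(:), out_bsxfun);
```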

Results

-------------------  With arrayfun
Elapsed time is 2.198521 seconds.
-------------------  With cumsum
Elapsed time is 5.360725 seconds.
-------------------  With bsxfun
Elapsed time is 2.896414 seconds.

Benchmarking code - Case II [larger data size but smaller maximum value of r]

%// Parameters and input data
N = 10000;
max_repeat = 1000;

Results

-------------------  With arrayfun
Elapsed time is 2.641980 seconds.
-------------------  With cumsum
Elapsed time is 3.426921 seconds.
-------------------  With bsxfun
Elapsed time is 1.858007 seconds.

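Conclusions from the benchmarks

For Case I, arrayfun seems to be the way to go, while for Case II, bsxfun might be the weapon of choice. So the kind of data you are dealing with really seems to decide which approach to use.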
