Question

I decided to get a little crazy tonight and party with histogram bins on some financial data I'm analyzing.

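It seems, though, that the party has hit a snag: the way I want to apply my "intra-bin" operations is not obvious, either from research or from playing around.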

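Desire: I want to use the "binning" index in one column to perform a row-wise "intra-bin" operation, where the operation makes a relative reference to the first element of its own bin. Consider the following single-bin example, where the operation takes a difference: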

A =

1   10.4
1   10.6
1   10.3
1   10.2

The relative operation takes the difference between every element of column 2 and the first element of column 2, so that

bin_differencing_function(A) =

1   10.4    0.0
1   10.6    0.2
1   10.3    -0.1
1   10.2    -0.2

Now, it would be more convenient to be able to feed bin_differencing_function(A) a two-column matrix with an arbitrary number of bins, so that if

A =

1   10.4
1   10.6
1   10.3
1   10.2
2   10.2
2   10.6
2   10.8
2   10.8
3   11.0
3   10.8
3   10.8
3   10.8

better_bin_differencing_function(A) =

1   10.4    0.0
1   10.6    0.2
1   10.3    -0.1
1   10.2    -0.2
2   10.2    0.0
2   10.6    0.4
2   10.8    0.6
2   10.8    0.6
3   11.0    0.0
3   10.8    -0.2
3   10.8    -0.2
3   10.8    -0.2

Most convenient of all would be to feed best_bin_differencing_function(A) a two-column matrix with an arbitrary number of bins whose lengths may not be constant, so that if

A =

1   10.4
1   10.6
1   10.3
2   10.2
2   10.6
2   10.8
2   10.8
2   10.7
3   11.0
3   10.8

best_bin_differencing_function(A) =

1   10.4    0.0
1   10.6    0.2
1   10.3    -0.1
2   10.2    0.0
2   10.6    0.4
2   10.8    0.6
2   10.8    0.6
2   10.7    0.5
3   11.0    0.0
3   10.8    -0.2

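The ultimate wish is a piece of code that uses vectorization (if possible) to operate over many bins whose lengths vary between 1 and 200. I was thinking that a play on accumarray might do it: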

accumarray(A(:,1),A(:,2),[],@(x) fun(x))

where fun(x) is a function containing a for loop.

I'm running MATLAB 7.10.0.499 (R2010a) on Windows 7. Sorry that the examples made this question so long.

Answer 1

Approach #1

This is a bsxfun-based approach -

%// Get the first column IDs from A and positions of the elements from
%// each ID/bin
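%// 'first' is passed explicitly because releases before R2013a return the
%// last occurrence by default
[A_id,first_idx] = unique(A(:,1),'first')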

%// First elements from each ID/bin
first_ele = A(first_idx,2)

%// Get a 2D logical array s.t. the ones in each column represent the
%// presence of all element corresponding to each ID/bin
match_ind = bsxfun(@eq,A(:,1),A_id') %//'

%// Create the base matrix with the logical array, s.t. the ones are
%// replaced by the actual elements
base_mat = bsxfun(@times,match_ind,first_ele.') %//'

%// Finally sum each row (exactly one nonzero entry per row, so this also
%// works for unsorted IDs) and subtract from the second column of A to
%// form the new column for the output
out = [A A(:,2) - sum(base_mat,2)]
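
To make the intermediate arrays concrete, here is a small trace of the bsxfun idea on the variable-length example from the question. The matrix A is copied from the question; the row-wise sum used for the final reduction and the variable name ref are choices made for this sketch, so treat it as an illustration rather than the answer's exact code.

```matlab
% Example matrix from the question: column 1 = bin ID, column 2 = value
A = [1 10.4; 1 10.6; 1 10.3; 2 10.2; 2 10.6; 2 10.8; 2 10.8; 2 10.7; 3 11.0; 3 10.8];

% 'first' is requested explicitly so that releases before R2013a (where
% unique returned the last occurrence by default) behave the same way
[A_id, first_idx] = unique(A(:,1), 'first');   % A_id = [1;2;3], first_idx = [1;4;9]
first_ele = A(first_idx, 2);                   % [10.4; 10.2; 11.0]

% 10x3 logical mask: entry (r,c) is true when row r belongs to bin c
match_ind = bsxfun(@eq, A(:,1), A_id.');

% Each row of the mask has exactly one true entry, so a row-wise sum of the
% masked first elements picks out the first element of that row's own bin
ref = sum(bsxfun(@times, match_ind, first_ele.'), 2);

out = [A, A(:,2) - ref];
disp(out)
```

The mask has one column per bin, so memory grows with (number of rows) x (number of bins); that is the cost the loop-based approach avoids.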

Approach #2

bsxfun-based approaches can be resource-hungry, so here is a for-loop-based approach that assumes the bins/IDs in the input data are sorted -

[A_id,first_idx] = unique(A(:,1),'first');
[A_id,last_idx] = unique(A(:,1),'last');
out = [A A(:,2)];
for k1 = 1:numel(first_idx)
    first_id = first_idx(k1);
    last_id = last_idx(k1);
    out(first_id:last_id,3) = out(first_id:last_id,3) - out(first_id,3);
end

Approach #3

This could be an interesting approach to test out -

[~,first_id] = max(bsxfun(@eq,A(:,1),A(:,1)')) %//'
out = [A A(:,2) - A(first_id,2)]
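
A short trace may help show why the max trick works and that it does not need sorted IDs. The input below is a hypothetical unsorted example (not from the question), and same_bin is a name introduced only for this sketch.

```matlab
% Hypothetical unsorted input: bins 2 and 1 are interleaved
A = [2 10.2; 1 10.4; 2 10.6; 1 10.6; 1 10.3];

% N-by-N logical matrix: entry (r,c) is true when rows r and c share a bin.
% Memory grows as N^2, so this is only practical for modest N.
same_bin = bsxfun(@eq, A(:,1), A(:,1).');

% max along each column returns the index of the first maximal entry, i.e.
% the first row that shares a bin with that column's row
[~, first_id] = max(same_bin, [], 1);          % first_id = [1 2 1 2 2]

out = [A, A(:,2) - A(first_id, 2)];
disp(out)
```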

Approach #4

Again assuming sorted bins/IDs, this is a diff + cumsum based approach, and it appears to be the fastest when that assumption holds -

%// "~= 0" marks bin starts even when consecutive IDs differ by more than 1
first_match = double([true; diff(A(:,1)) ~= 0])
first_match(first_match==1) = [1 ; diff(find(first_match))]
out = [A A(:,2) - A(cumsum(first_match),2)]
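
The diff + cumsum trick is compact, so here is a step-by-step sketch on the question's variable-length example with the intermediate vectors written out. The names is_start, start_pos, step and first_row are introduced for this illustration only, and sorted bin IDs are assumed.

```matlab
% Example matrix from the question (bin IDs already sorted)
A = [1 10.4; 1 10.6; 1 10.3; 2 10.2; 2 10.6; 2 10.8; 2 10.8; 2 10.7; 3 11.0; 3 10.8];

% Mark the first row of every bin; "~= 0" tolerates gaps in the ID sequence
is_start  = [true; diff(A(:,1)) ~= 0];         % 1 0 0 1 0 0 0 0 1 0
start_pos = find(is_start);                    % [1; 4; 9]

% At each start row, store the jump from the previous bin's start row
step = zeros(size(A,1), 1);
step(is_start) = [1; diff(start_pos)];         % 1 0 0 3 0 0 0 0 5 0

% The running sum is then, for every row, the index of its bin's first row
first_row = cumsum(step);                      % 1 1 1 4 4 4 4 4 9 9

out = [A, A(:,2) - A(first_row, 2)];
disp(out)
```

Everything here is a single pass over a length-N vector, which is consistent with the answer's remark that this variant tends to be the fastest on sorted input.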

Note that if the data is not sorted, you can use sortrows as shown here -

[A,sorted_ind] = sortrows(A,1)
first_match = double([true; diff(A(:,1)) ~= 0])
first_match(first_match==1) = [1 ; diff(find(first_match))]
out(sorted_ind,:) = [A A(:,2) - A(cumsum(first_match),2)]

You can use the same technique to generalize all the other approaches that assume sorted data.
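
As a sketch of that generalization, the snippet below sorts, applies the diff + cumsum step, and scatters the result back to the original row order. It uses sort on the ID column instead of sortrows; that substitution is my own choice, made because sort is documented to keep equal elements in their original order, so "first element of the bin" keeps its meaning. The input is a hypothetical unsorted example.

```matlab
% Hypothetical unsorted input: bins 2 and 1 are interleaved
A = [2 10.2; 1 10.4; 2 10.6; 1 10.6; 1 10.3];

% Stable sort on the bin ID: rows of the same bin keep their relative order
[~, sorted_ind] = sort(A(:,1));
As = A(sorted_ind, :);

% diff + cumsum on the sorted copy, as in Approach #4
is_start = [true; diff(As(:,1)) ~= 0];
step = zeros(size(As,1), 1);
step(is_start) = [1; diff(find(is_start))];
first_row = cumsum(step);

% Scatter the rows back to their original positions
out = zeros(size(A,1), 3);
out(sorted_ind, :) = [As, As(:,2) - As(first_row, 2)];
disp(out)
```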

Answer 2

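Well, Stack Overflow, I figured it out! It turns out I was on the right track with accumarray.

Matrices B, C, and A are defined inside the function only for ease of verification. Matrix A would otherwise be passed in as follows: best_bin_differencing_function(A)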

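function differenced_bins=best_bin_differencing_function()
% B holds the bin IDs and C the values; both are hard-coded for verification
B=[1 1 1 2 2 2 2 2 3 3]';
C=[10.4 10.6 10.3 10.2 10.6 10.8 10.8 10.7 11.0 10.8]';
A=[B,C];
% accumarray returns one cell per bin; cell2mat stacks them into one column
differenced_bins=cell2mat(accumarray(A(:,1),A(:,2),[],@(x) {fun(x)}));
end

function y=fun(var)
    % Difference between every element of the bin and the bin's first element
    y=zeros(length(var),1);
    for i=1:length(var)
        y(i)=var(i)-var(1);
    end
end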
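
The loop inside fun can be collapsed into one vectorized subtraction. The following is a minimal sketch of that variant, not a replacement endorsed by either poster; it assumes the bin IDs are sorted, because accumarray only guarantees the order of the values it hands to the function when the subscripts are sorted. The name per_bin is introduced for illustration.

```matlab
% Same verification data as above: B = bin IDs, C = values
B = [1 1 1 2 2 2 2 2 3 3]';
C = [10.4 10.6 10.3 10.2 10.6 10.8 10.8 10.7 11.0 10.8]';

% The anonymous function returns one cell per bin holding x - x(1);
% wrapping the result in a cell lets each bin have a different length
per_bin = accumarray(B, C, [], @(x) {x - x(1)});

% Stack the per-bin column vectors back into a single column
differenced_bins = cell2mat(per_bin);
disp([B, C, differenced_bins])
```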

I'll stress-test this against @Divakar's answer and vote accordingly. Thanks, everyone, for taking a look!

Fancy binning operations - how to vectorize relative intra-bin operations?
