Cumulative count of unique values in Matlab

Time: 2014-01-09 20:31:48

Tags: matlab statistics unique union

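Suppose you are collecting cards - your album consists of n_cards cards. Each pack you buy contains cards_in_pack cards, and every card is equally likely to be drawn. If you cannot trade your duplicates, how many packs do you need to buy to collect all the cards? Suppose you want to simulate the process. Here is an obvious way to do it: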

n_cards = 100; n_experiments = 1e4; cards_in_pack = 5;
cards = randi([1 n_cards], ceil(sqrt(n_cards)) * n_experiments * n_cards, 1, 'uint16');

tic
n_packs = zeros(n_experiments, 1);
ctrl1 = 1;
i_f = 0;
n = 0;
while ctrl1
  ctrl2 = 1;
  i1 = 0;
  while ctrl2
    i1 = i1 + 1;
    ctrl2 = numel(unique(cards((cards_in_pack * i_f + 1):(cards_in_pack * (i_f + i1))))) ~= n_cards;
  end
  n = n + 1;
  n_packs(n) = i1;
  i_f = i_f + i1;
  ctrl1 = n ~= n_experiments;
end
toc

% Average number of packs: 
mean(n_packs)
% Distribution of the number of packs necessary to complete the album
hist(n_packs, 50)

% Number of cards needed in the experiments: 
sum(n_packs) * cards_in_pack

This is slow - is there a faster way? Specifically: is there a fast way to compute the cumulative count of unique values in Matlab?

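3 answers:

Answer 0 (score: 2)

The simulation can be vectorized across experiments. This removes the loop over experiments and greatly reduces the simulation time.

Since each experiment may finish at a different time (it may need a different number of packs), an experiment is in one of two states: ongoing or finished. The code maintains a vector of the ongoing experiments (exps_ongoing) and a 0-1 matrix of the cards obtained in each experiment (cards_obtained), with one row per card and one column per experiment.

For each ongoing experiment, a new pack is generated, and the cards in that pack are (over)written into cards_obtained. Once all cards have been obtained for an ongoing experiment, that experiment is removed from exps_ongoing. The code ends when all experiments have finished.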

n_cards = 100;
cards_in_pack = 5;
n_experiments = 1e4;

cards_obtained = zeros(n_cards,n_experiments);
%// will contain cards obtained in each experiment
exps_ongoing = 1:n_experiments; %// list of which experiments are ongoing
n_packs = zeros(1,n_experiments); %// will record how many packs have been used
while ~isempty(exps_ongoing)
    n_packs(exps_ongoing) = n_packs(exps_ongoing) + 1;
    %// pick one pack for each ongoing experiment
    new_cards = randi(n_cards,cards_in_pack,numel(exps_ongoing));
    %// generate pack contents for each ongoing experiment
    cards_obtained(new_cards + repmat(((exps_ongoing)-1)*n_cards,cards_in_pack,1)) = true;
    %// take note of obtained cards in each ongoing experiment. Linear indexing is used here
    exps_ongoing = setdiff(exps_ongoing,exps_ongoing(all(cards_obtained(:,exps_ongoing))));
    %// ongoing experiments for which all cards have been obtained are removed
end
disp(mean(n_packs))

For your input data, this reduces the running time on my computer by a factor of about 50 (104.36 seconds versus 1.89 seconds, measured with tic and toc).
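The linear-indexing step in the loop above can be checked on a tiny case. The following is a minimal sketch with hand-picked values (4 cards, 3 experiments, one of them already finished); the variable names lin_idx, offsets, cols and lin_idx_check are illustrative and not part of the answer's code. It compares the manual offset computation with sub2ind.

```matlab
% Tiny illustration of the linear-indexing step (values chosen by hand)
n_cards = 4; cards_in_pack = 2;
cards_obtained = false(n_cards, 3);      % 3 experiments, nothing collected yet
exps_ongoing = [1 3];                    % experiment 2 is assumed finished
new_cards = [2 4; 2 1];                  % column k is the pack for exps_ongoing(k)

% Element (r, c) of a matrix with n_cards rows has linear index r + (c-1)*n_cards
offsets = repmat((exps_ongoing - 1) * n_cards, cards_in_pack, 1);
lin_idx = new_cards + offsets;

% The same indices computed with sub2ind, for comparison
cols = repmat(exps_ongoing, cards_in_pack, 1);
lin_idx_check = sub2ind(size(cards_obtained), new_cards, cols);

% Mark the drawn cards; the duplicate card 2 in the first pack is set only once
cards_obtained(lin_idx) = true;
disp(isequal(lin_idx, lin_idx_check))
```

Column 2 of cards_obtained is left untouched because experiment 2 is not in exps_ongoing, which is the point of offsetting by the experiment number rather than by the position within exps_ongoing.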

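Answer 1 (score: 1)

OK, in this case the simulation is quite simple, because the constraints work in our favor - all we need to know is when there are no cards left to collect. So we can drop the explicit uniqueness test and just count...

I would do it like this: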

n_packs = zeros(n_experiments, 1, 'uint32');
for i=1:n_experiments
    collection = zeros(n_cards, 1, 'uint32');
    while nnz(collection) < n_cards
        n_packs(i) = n_packs(i) + 1;
        pack = randi(n_cards, cards_in_pack, 1, 'uint32');
        collection(pack) = collection(pack) + 1;
    end
end

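Now, I can't guarantee this will be faster (I don't have Matlab with me to test it, and there may be a bug or two), but it is about the simplest algorithm I can think of, and simple code tends to be fast code. To tune for maximum speed, you can play with the data types - uint32 may not be the best choice for Matlab's internals.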
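As one possible data-type variation on the loop above, the collection can be stored as a logical vector, since only "owned or not" matters for the stopping test. This is an untimed sketch, not a claim that it is faster; the small n_experiments is chosen only to keep the illustration quick.

```matlab
% Sketch of the same loop with a logical collection (untimed variant)
n_cards = 100; cards_in_pack = 5;
n_experiments = 100;                      % small run, for illustration only
n_packs = zeros(n_experiments, 1);
for i = 1:n_experiments
    collection = false(n_cards, 1);       % collection(k) is true once card k is owned
    while ~all(collection)
        n_packs(i) = n_packs(i) + 1;
        pack = randi(n_cards, cards_in_pack, 1);
        collection(pack) = true;          % duplicates within a pack are harmless
    end
end
disp(mean(n_packs))
```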

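Answer 2 (score: 0)

If all cards have the same probability, you can use this simple approach, which I adapted from a similar problem: open a lot of packs at once, then check how many of them you actually needed to open.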

n_cards = 100; n_experiments = 1e4; cards_in_pack = 5;
%# column vector; zeros(n_experiments) would allocate an n-by-n matrix
nPacks = zeros(n_experiments, 1);
for i = 1:n_experiments
   %# assume it is never going to take you >1500000 cards
   r = randi(n_cards, 1500000, 1);
   %# x holds the index of the first occurrence of each card.
   %# 'first' is the default since R2013a; earlier versions return
   %# the last occurrence unless it is requested explicitly
   [~, x] = unique(r, 'first');
   %# the album is complete at the latest first occurrence
   nPacks(i) = ceil(max(x) / cards_in_pack);
end
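
The first-occurrence indices returned by unique can also be turned into the cumulative count of unique values that the question asks about. The following is a minimal sketch on a short hand-written vector; the names first_idx, is_new, cum_unique and pos_complete are illustrative.

```matlab
% Sketch: cumulative count of unique values in a vector
x = [3 1 3 2 1 4 2 4];                 % example draws, in order

% Index of the first occurrence of each distinct value
[~, first_idx] = unique(x, 'first');

% Flag the positions where a previously unseen value appears
is_new = false(size(x));
is_new(first_idx) = true;

% Running number of distinct values seen so far: [1 2 2 3 3 4 4 4]
cum_unique = cumsum(is_new);

% Position at which all n_values distinct values have been seen (here 6)
n_values = 4;
pos_complete = find(cum_unique == n_values, 1);
```

Applied to a long stream of draws, pos_complete equals max(first_idx), which is the quantity the loop above divides by cards_in_pack.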