Question

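Suppose you are collecting cards: your album holds n_cards cards. Each pack you buy contains cards_in_pack cards, and every card is equally likely to be drawn. If you cannot trade your duplicates, how many packs do you need to buy to collect all the cards? Suppose you want to simulate the process. Here is an obvious way to do it: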

n_cards = 100; n_experiments = 1e4; cards_in_pack = 5;
cards = randi([1 n_cards], ceil(sqrt(n_cards)) * n_experiments * n_cards, 1, 'uint16');

tic
n_packs = zeros(n_experiments, 1);
ctrl1 = 1;
i_f = 0;
n = 0;
while ctrl1
  ctrl2 = 1;
  i1 = 0;
  while ctrl2
    i1 = i1 + 1;
    ctrl2 = numel(unique(cards((cards_in_pack * i_f + 1):(cards_in_pack * (i_f + i1))))) ~= n_cards;
  end
  n = n + 1;
  n_packs(n) = i1;
  i_f = i_f + i1;
  ctrl1 = n ~= n_experiments;
end
toc

% Average number of packs: 
mean(n_packs)
% Distribution of the number of packs necessary to complete the album
hist(n_packs, 50)

% Number of cards needed in the experiments: 
sum(n_packs) * cards_in_pack

This is slow. Is there a faster way? Specifically: is there a fast way to compute the cumulative count of unique values in Matlab?

Answer 1

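The simulation can be vectorized across experiments. The loop over experiments is thereby removed, and the simulation time is greatly reduced.

Since each experiment may finish at a different time (it may need a different number of packs), an experiment is in one of two states: ongoing or finished. The code maintains a vector of the ongoing experiments (exps_ongoing) and a 0-1 matrix of the cards obtained in each experiment (cards_obtained).

For each ongoing experiment a new pack is generated, and the cards in that pack are written (or overwritten) in cards_obtained. Once all cards have been obtained for an ongoing experiment, that experiment is removed from exps_ongoing. The code ends when all experiments have finished.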

n_cards = 100;
cards_in_pack = 5;
n_experiments = 1e4;

cards_obtained = zeros(n_cards,n_experiments);
%// will contain cards obtained in each experiment
exps_ongoing = 1:n_experiments; %// list of which experiments are ongoing
n_packs = zeros(1,n_experiments); %// will record how many packs have been used
while ~isempty(exps_ongoing)
    n_packs(exps_ongoing) = n_packs(exps_ongoing) + 1;
    %// pick one pack for each ongoing experiment
    new_cards = randi(n_cards,cards_in_pack,numel(exps_ongoing));
    %// generate pack contents for each ongoing experiment
    cards_obtained(new_cards + repmat(((exps_ongoing)-1)*n_cards,cards_in_pack,1)) = true;
    %// take note of obtained cards in each ongoing experiment. Linear indexing is used here
    exps_ongoing = setdiff(exps_ongoing,exps_ongoing(all(cards_obtained(:,exps_ongoing))));
    %// ongoing experiments for which all cards have been obtained are removed
end
disp(mean(n_packs))

For your input data, this reduces the running time on my computer by a factor of about 50 (104.36 s versus 1.89 s, measured with tic, toc).
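The linear-indexing step inside the loop may be easier to follow on a tiny case. What follows is only an illustrative sketch with made-up sizes (3 cards, 2 experiments, packs of 2) and fixed pack contents in place of randi; the variable names mirror the code above, and offsets is a helper name introduced here for readability.

```matlab
n_cards = 3; cards_in_pack = 2;
cards_obtained = false(n_cards, 2);       % rows: cards, columns: experiments
exps_ongoing = [1 2];                     % both experiments are still running
new_cards = [1 3; 1 2];                   % fixed packs: exp. 1 draws {1,1}, exp. 2 draws {3,2}
% (e-1)*n_cards shifts a card number into column e of cards_obtained
offsets = repmat((exps_ongoing - 1) * n_cards, cards_in_pack, 1);
cards_obtained(new_cards + offsets) = true;
disp(cards_obtained)
```

Experiment 1 ends up with only card 1 marked (the duplicate is simply written twice), while experiment 2 has cards 2 and 3 marked.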

Answer 2

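OK, in this case the simulation is quite simple, because the constraints work in our favor: all we need to know is when there are no cards left to find. So we can drop the explicit uniqueness test and just count...

I would do it like this: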

n_packs = zeros(n_experiments, 1, 'uint32');
for i=1:n_experiments
    collection = zeros(n_cards, 1, 'uint32');
    while nnz(collection) < n_cards
        n_packs(i) = n_packs(i) + 1;
        pack = randi(n_cards, cards_in_pack, 1, 'uint32');
        collection(pack) = collection(pack) + 1;
    end
end

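I can't guarantee this will be faster (I don't have Matlab at hand to test it, and there may be a bug or two), but it is about the simplest algorithm I can think of, and simple code tends to be fast code. For maximum speed you could experiment with the data types; uint32 may not be the best choice for Matlab's internals.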
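As one possible take on the data-type remark, here is a minimal sketch of the same loop with a logical collection vector, so that completion can be checked with all instead of nnz. Whether this is actually faster has not been measured; n_experiments is reduced to 200 here (an arbitrary choice) only so that the sketch runs quickly.

```matlab
n_cards = 100; cards_in_pack = 5;
n_experiments = 200;                      % kept small here; the question uses 1e4
n_packs = zeros(n_experiments, 1);
for i = 1:n_experiments
    collection = false(n_cards, 1);       % true once a card has been seen
    while ~all(collection)
        n_packs(i) = n_packs(i) + 1;      % open one more pack
        pack = randi(n_cards, cards_in_pack, 1);
        collection(pack) = true;          % duplicates within a pack are harmless
    end
end
disp(mean(n_packs))
```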

Answer 3

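If all cards have the same probability, you can use this simple approach, which I adapted for a similar problem: open a lot of packs at once, then check how many of them you actually needed to open.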

n_cards = 100; n_experiments = 1e4; cards_in_pack = 5;
nPacks=zeros(n_experiments,1);
for i=1:n_experiments,
   %# assume it is never going to take you >1500000 cards
   r=randi(n_cards,1500000,1);
   %# 'first' requests the index of the first occurrence of each card
   %# (the default since R2013a; earlier versions return the last
   %# occurrence unless 'first' is given)
   [~,x]=unique(r,'first');
   nPacks(i,1)=ceil(max(x)/cards_in_pack);
end
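The first-occurrence indices returned by unique can also be turned into the cumulative count of unique values that the question asks about. The following is a minimal sketch on a small fixed vector rather than randi output; is_new and cum_unique are names introduced here for illustration.

```matlab
r = [3 1 3 2 1 4 2];                      % small fixed example instead of randi output
[~, x] = unique(r, 'first');              % x: index of the first occurrence of each value
is_new = false(size(r));
is_new(x) = true;                         % marks draws that show a value for the first time
cum_unique = cumsum(is_new);              % number of distinct values seen up to each draw
disp(cum_unique)
```

For this vector cum_unique is [1 2 2 3 3 4 4]. The collection is complete at the first position where cum_unique reaches the number of distinct cards, which is the same position as max(x).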
