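Suppose you are collecting cards: your album consists of n_cards cards. Each pack you buy contains cards_in_pack cards, and every card has the same probability of being drawn. If you cannot trade your duplicates, how many packs do you need to buy to collect all the cards? Suppose you want to simulate the process. Here is an obvious approach: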
n_cards = 100; n_experiments = 1e4; cards_in_pack = 5;
cards = randi([1 n_cards], ceil(sqrt(n_cards)) * n_experiments * n_cards, 1, 'uint16');
tic
n_packs = zeros(n_experiments, 1);
ctrl1 = 1;
i_f = 0;
n = 0;
while ctrl1
    ctrl2 = 1;
    i1 = 0;
    while ctrl2
        i1 = i1 + 1;
        ctrl2 = numel(unique(cards((cards_in_pack * i_f + 1):(cards_in_pack * (i_f + i1))))) ~= n_cards;
    end
    n = n + 1;
    n_packs(n) = i1;
    i_f = i_f + i1;
    ctrl1 = n ~= n_experiments;
end
toc
% Average number of packs:
mean(n_packs)
% Distribution of the number of packs necessary to complete the album
hist(n_packs, 50)
% Number of cards needed in the experiments:
sum(n_packs) * cards_in_pack
This is slow. Is there a faster way? Specifically: is there a fast way to compute the cumulative count of unique values in Matlab?
Answer 0 (score: 2)
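The simulation can be vectorized across experiments. The loop over experiments is thereby removed, and the simulation time drops considerably.
Since each experiment may finish at a different time (it may need a different number of packs), an experiment is in one of two states: ongoing or finished. The code maintains a vector of the ongoing experiments (exps_ongoing) and a 0-1 matrix of the cards obtained in each experiment (cards_obtained).
For each ongoing experiment a new pack is generated, and the cards it contains are written (or overwritten) in cards_obtained. Once an ongoing experiment has obtained all the cards, it is removed from exps_ongoing. The code ends when all experiments have finished.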
n_cards = 100;
cards_in_pack = 5;
n_experiments = 1e4;
cards_obtained = zeros(n_cards,n_experiments); %// will contain cards obtained in each experiment
exps_ongoing = 1:n_experiments; %// list of which experiments are ongoing
n_packs = zeros(1,n_experiments); %// will record how many packs have been used
while ~isempty(exps_ongoing)
    %// pick one pack for each ongoing experiment
    n_packs(exps_ongoing) = n_packs(exps_ongoing) + 1;
    %// generate pack contents for each ongoing experiment
    new_cards = randi(n_cards,cards_in_pack,numel(exps_ongoing));
    %// take note of obtained cards in each ongoing experiment. Linear indexing is used here
    cards_obtained(new_cards + repmat(((exps_ongoing)-1)*n_cards,cards_in_pack,1)) = true;
    %// ongoing experiments for which all cards have been obtained are removed
    exps_ongoing = setdiff(exps_ongoing,exps_ongoing(all(cards_obtained(:,exps_ongoing))));
end
disp(mean(n_packs))
For your input data, this cuts the running time on my computer by a factor of roughly 50 (104.36 seconds versus 1.89 seconds, measured with tic, toc).
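The linear-indexing step is the least obvious line in the loop, so here is a minimal sketch of it on a toy case. The sizes and pack contents are made up for illustration, and the sub2ind cross-check is my addition, not part of the code above. In a matrix with n_cards rows, the entry at row r, column e has linear index (e-1)*n_cards + r, which is why adding (exps_ongoing-1)*n_cards to the card numbers addresses the right column for each experiment.

```matlab
% Toy setup (hypothetical values): 4 cards, 3 experiments, 2 cards per pack
n_cards = 4;
cards_in_pack = 2;
cards_obtained = zeros(n_cards, 3);

% Only experiments 1 and 3 are still running
exps_ongoing = [1 3];

% One pack per ongoing experiment: column 1 goes to experiment 1,
% column 2 goes to experiment 3
new_cards = [2 4;
             2 1];

% Column offset of each ongoing experiment in linear-index terms
offsets = repmat((exps_ongoing - 1) * n_cards, cards_in_pack, 1);
cards_obtained(new_cards + offsets) = true;

% Cross-check with sub2ind, which builds the same linear indices
cols = repmat(exps_ongoing, cards_in_pack, 1);
same_indices = isequal(new_cards + offsets, ...
                       sub2ind(size(cards_obtained), new_cards, cols));
```

Experiment 1 drew card 2 twice, so only one entry is set in its column; duplicates within a pack simply overwrite the same entry.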
Answer 1 (score: 1)
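OK, in this case the simulation is quite simple because the constraints work in our favor: all we need to know is when there are no cards left to collect. So we can drop the explicit uniqueness test and just count.
I would do it like this: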
n_packs = zeros(n_experiments, 1, 'uint32');
for i=1:n_experiments
    collection = zeros(n_cards, 1, 'uint32');
    while nnz(collection) < n_cards
        n_packs(i) = n_packs(i) + 1;
        pack = randi(n_cards, cards_in_pack, 1, 'uint32');
        collection(pack) = collection(pack) + 1;
    end
end
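Now, I can't guarantee that this will be faster (I don't have Matlab with me to test it, and there may be a bug or two), but it is about the simplest algorithm I can think of, and simple code tends to be fast code. For maximum speed you could experiment with the data types: uint32 may not be the best choice for Matlab's internals.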
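One way to check an untested simulation like this is to compare its mean with the classical coupon-collector result. The following is only a sketch, under the assumption (true of the randi calls used in this thread) that every card is drawn independently and uniformly, so a pack may contain duplicates. The symbols are mine: T is the number of cards drawn until the album is complete, N the number of packs, n = n_cards and c = cards_in_pack.

```latex
\mathbb{E}[T] = n \sum_{k=1}^{n} \frac{1}{k} = n H_n,
\qquad
N = \left\lceil \frac{T}{c} \right\rceil
\;\Longrightarrow\;
\frac{n H_n}{c} \;\le\; \mathbb{E}[N] \;<\; \frac{n H_n}{c} + 1 .
```

For n = 100 and c = 5, H_100 is about 5.187, so E[T] is about 518.7 cards and mean(n_packs) should land between roughly 103.7 and 104.8.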
Answer 2 (score: 0)
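If all cards have the same probability, you can use this simple approach, which I adapted from a similar problem: open a lot of packs at once, then check how many of them you actually needed to open.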
n_cards = 100; n_experiments = 1e4; cards_in_pack = 5;
%# column vector; zeros(n_experiments) would allocate an n_experiments-by-n_experiments matrix
nPacks=zeros(n_experiments,1);
for i=1:n_experiments,
    %# assume it is never going to take you >1500000 cards
    r=randi(n_cards,1500000,1);
    %# since R2013a, unique returns the first occurrence
    %# for earlier versions, take the minimum of x
    %# and subtract it from the total array length
    [~,x]=unique(r);
    %# the last card to make its first appearance completes the album
    nPacks(i,1)=ceil(max(x)/cards_in_pack);
end
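The first-occurrence indices returned by unique also give a direct answer to the question about a cumulative count of unique values. A minimal sketch follows, on a made-up toy sequence; the variable names are mine, and the explicit 'first' flag is used so that the result does not depend on the Matlab version.

```matlab
% Toy sequence of drawn cards (hypothetical data, for illustration only)
cards = [3 1 3 2 1 4 2];
n_cards = 4;

% Position of the first occurrence of each distinct value
[~, first_idx] = unique(cards, 'first');

% Flag the draws that revealed a card not seen before
is_new = false(size(cards));
is_new(first_idx) = true;

% Running count of distinct cards seen after each draw
n_distinct = cumsum(is_new);

% First draw at which the album is complete
draws_needed = find(n_distinct == n_cards, 1);
```

Here n_distinct is [1 2 2 3 3 4 4] and draws_needed is 6, which equals max(first_idx), the quantity the loop above uses.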