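Simulating finding a randomly chosen number by asking binary questions

Time: 2017-03-29 13:35:03

Tags: matlab random octave probability entropy

As a homework problem, I was asked to write an Octave function that simulates 1000 experiments of finding a random variable X with alphabet {0,1,2,3} and pmf: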

Px(0)= 1/8

Px(1)= 1/4

Px(2)= 1/2

Px(3)= 1/8

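by asking a sequence of binary "yes" or "no" questions.

I have determined that the optimal sequence of binary questions for finding the value of X is simply to ask "Is X = p?", where p runs over the possible values in order of decreasing probability.

So the optimal order is:

  1. Is X = 2?

    If not:

  2. Is X = 1?

    If not:

  3. Is X = 0?

    If not, then X = 3

Here is the function I wrote: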

    function x = guessing_experiment(probabilities, n)
      % generates n simulations of finding a random number in an alphabet by asking binary questions,
      % where 'probabilities' is a list of the probabilities per number in the order the questions will be asked
    
      num_Qs = zeros(1,n);                            % allocate array of size n for number of questions asked per experiment
      [num_col, alphabet_size] = size(probabilities); % get size of alphabet
    
      for i = 1:n                                     % generate n experiments
        Qs = 0;                                       % number of questions asked in this experiment
        for j = 1:alphabet_size - 1                   % iterate through questions
          question = rand;                            % draw a uniform random number in (0, 1)
          Qs++;                                       % increment number of questions asked
          if (question <= probabilities(j))           % if question produces a "yes" answer
            break;
          endif
        endfor
        num_Qs(i) = Qs;                               % store number of questions asked for this experiment
      endfor
    
      x = mean(num_Qs);                               % calculate mean number of questions asked over the n experiments 
    
    end
    

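It is called as guessing_experiment([1/2, 1/4, 1/8, 1/8], 1000), where the array holds the probability of each question producing a "yes" answer, in the order the questions are asked, and n is the number of experiments.

Asking these questions should give an average of 1.75 questions, but my program always produces an average of ~1.87. Where is the error in my script?

I assume it has something to do with generating a new random number to simulate each of the 3 questions asked.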

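1 Answer:

Answer 0: (score: 0)

I deleted my previous, incorrect answer, which said that your script was right and your calculation was wrong. I thought about it again, and your calculation is correct. I tried it myself with the following MATLAB script: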

% probabilities for each number
p = [1/8,1/4,1/2,1/8];
% sort them from higher to lower
p = sort(p,'descend');
% number of questions per probability
nq = 1:length(p)-1;
% the last question can distinguish between two variables
nq(end+1) = nq(end);
% number of trials
n = 100000;
% random sample number of questions
q = randsample(nq,n,true,p);
% mean number of questions
avgQ = mean(q)

and the average I obtained is 1.75, just as you calculated. I will look at your code again to find the error.
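As a hedged aside: in MATLAB, randsample comes from the Statistics and Machine Learning Toolbox, so the script above may not run everywhere. The sketch below is one possible toolbox-free way to do the same check, assuming the pmf is listed in question order. It draws each X once by inverse-CDF sampling and then counts how many questions are needed to identify it. The names edges, u, idx and q are illustrative and do not come from the original scripts.

```octave
% pmf values in the order the questions are asked (most likely value first)
p = [1/2, 1/4, 1/8, 1/8];
n = 100000;                          % number of simulated experiments

% Draw each X once by inverse-CDF sampling: idx(i) is the position of the
% drawn value in the question order (1 = asked about first, 4 = never asked)
edges = cumsum(p(1:end-1));          % upper boundaries of the first three bins
u = rand(n, 1);
idx = ones(n, 1);
for k = 1:numel(edges)
  idx = idx + (u > edges(k));        % move past every boundary that u exceeds
end

% The last two values are both resolved by the final question
q = min(idx, numel(p) - 1);
avgQ = mean(q)
```

Because X is drawn once per experiment, each later question is automatically consistent with the earlier "no" answers, and the mean should land close to 1.75.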

Edit

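The problem with your script is that you ignore conditional probability: when you ask a question about the variable, you discard the information you have already learned. For example, when you ask the third question, the probability that the value is 0 is not p=1/8 but p=1/2, because you already know it is neither 1 nor 2. The fix is to divide each probability by the total probability of the events that are still possible: probabilities(j)/sum(probabilities(j:end))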
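To see where the ~1.87 comes from, the expected number of questions can be worked out exactly for both versions. This is a small sketch rather than part of either script: yes_orig, yes_cond and stop are illustrative names, and the only assumption is the question order 2, 1, 0 from the post.

```octave
% Yes-probabilities in question order, as passed to guessing_experiment
p = [1/2, 1/4, 1/8, 1/8];

% Original script: question j says "yes" with the unconditional probability p(j)
yes_orig = p(1:3);
% Corrected: condition on all earlier answers having been "no"
yes_cond = p(1:3) ./ [1, 1 - cumsum(p(1:2))];

% Probability that an experiment stops after exactly k questions (k = 1..3);
% the third question always ends it, whatever its answer
stop = @(y) [y(1), (1 - y(1)) * y(2), (1 - y(1)) * (1 - y(2))];

E_orig = sum((1:3) .* stop(yes_orig))   % 1.875
E_cond = sum((1:3) .* stop(yes_cond))   % 1.75
```

With the unconditional probabilities, the second question answers "yes" only a quarter of the time instead of half the time, so too many experiments go on to a third question. That accounts exactly for the gap between 1.75 and 1.875.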

n = 10000;
p = [1/8,1/4,1/2,1/8];
% sort them from higher to lower
probabilities = sort(p,'descend');
probabilities(end-1) = probabilities(end-1) + probabilities(end);
probabilities(end) = [];
alphabet_size = numel(probabilities);
num_Qs = zeros(1,n);                            % allocate array of size n for number of questions asked per experiment

for i = 1:n                                     % generate n experiments
    Qs = 0;                                       % number of questions asked in this experiment
    for j = 1:alphabet_size                   % iterate through questions
        question = rand;                            % draw a uniform random number in (0, 1)
        Qs = Qs + 1;                                % increment number of questions asked
        if question < probabilities(j)/sum(probabilities(j:end))           % if question produces a "yes" answer
            break;
        end
    end
    num_Qs(i) = Qs;                               % store number of questions asked for this experiment
end

x = mean(num_Qs)

x~1.75

The vector of conditional probabilities in this scenario is:

p = [1/8,1/4,1/2,1/8];
p = sort(p,'descend');
cond_p = p./cumsum(p,'reverse')

cond_p =

    0.5000    0.5000    0.5000    1.0000
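The conditional-probability vector can also be plugged back into a function with the same interface as the one in the question. The following is a minimal sketch, assuming probabilities is a row vector given in question order. The name guessing_experiment_cond is hypothetical, and fliplr is used instead of cumsum(p,'reverse') in case that option is not available in the Octave version at hand.

```octave
function x = guessing_experiment_cond(probabilities, n)
  % probabilities: row vector of pmf values in the order the questions are asked
  % n: number of simulated experiments

  % P(answer j is "yes" | all earlier answers were "no")
  cond_p = probabilities ./ fliplr(cumsum(fliplr(probabilities)));
  num_questions = numel(probabilities) - 1;   % the last value follows by elimination
  num_Qs = zeros(1, n);                       % questions asked per experiment

  for i = 1:n
    Qs = 0;
    for j = 1:num_questions
      Qs = Qs + 1;                            % ask question j
      if rand < cond_p(j)                     % "yes": the value has been found
        break;
      end
    end
    num_Qs(i) = Qs;
  end

  x = mean(num_Qs);                           % average number of questions
end
```

Called as guessing_experiment_cond([1/2, 1/4, 1/8, 1/8], 1000), it should return a value close to 1.75. The exact figure varies from run to run.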