Question

On Twitter, I came across a puzzle along these lines:

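There are ten coins, each blank on one side and numbered on the other, with the numbers running from 1 to 10. All ten coins are tossed and the numbers landing face up are summed. What is the probability that this sum is at least 45?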

I wanted to build a simulation that reproduces the analytical solution, which is 43/1024, up to some fairly small simulation error.

My first attempt:

# Create a list of all possible values
values <- c(1:10, rep(0,10))

# Number of trials
nt <- 5e5

# Container vector to store the results
output <- vector(mode = "numeric", length = nt)

# set seed for reproducible result
ns <- 42

# Loop
for (i in 1:nt) {
  set.seed(ns)
  temp <- sample(x = values, size = 10, replace = F)
  output[i] <- sum(temp)
  ns <- ns + 1
}

length(output[output > 44]) / nt


# [1] 0.013736

Second attempt:

rm(list = ls())

values <- 1:10
nt <- 5e5
output <- vector(mode = "numeric", length = nt)
ns <- 42

for (i in 1:nt) {
  set.seed(ns)
  temp <- sample(x = c(0,1), size = 10, replace = T)
  output[i] <- sum(temp * values)
  ns <- ns + 1
}

length(output[output > 44]) / nt

# [1] 0.042038

# Find the fraction (X / 1024) that gives the minimal error

sim.res <- length(output[output > 44]) / nt
fractions <- (1:2^10)/2^10
difference <- abs(sim.res - fractions)
which(difference == min(difference))

# [1] 43

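Clearly, the only difference between the two approaches is how the sample space for the simulation is constructed. I have stared at the code and cannot figure out why the first one is wrong. As far as I can tell, both should give the correct result.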

Answer 1

Your first (incorrect) simulation simply picks 10 numbers, without replacement, from this vector:

c(1:10, rep(0, 10))
#>  [1]  1  2  3  4  5  6  7  8  9 10  0  0  0  0  0  0  0  0  0  0

(An erroneous calculation of the probability of getting ten 0s has been removed.)

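Suppose the first of your 10 picks is not a 0. Say it comes up 7. That leaves 19 values, 10 of which are 0, so the probability of picking a 0 on the second draw is now 10/19.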

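But in the actual scenario described, it does not matter whether the coin labeled 0/7 comes up 7: the probability of any other coin coming up 0 is still 1/2, because the coin outcomes are independent.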
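One way to see the difference, as a minimal sketch: count how many blanks turn up under each scheme. Drawing 10 of the 20 values without replacement makes the number of zeros hypergeometric, whereas ten independent coins make it binomial. The variable names below are illustrative and do not come from the question.

```r
# The pool used by the first attempt: ten numbered faces plus ten zeros.
values <- c(1:10, rep(0, 10))

# After a 7 has been drawn, 19 values remain and 10 of them are zeros.
remaining   <- values[-match(7, values)]
p_zero_next <- mean(remaining == 0)   # 10/19, not 1/2

# Distribution of the number of zeros among the 10 results.
n_zeros       <- 0:10
p_zeros_pool  <- dhyper(n_zeros, m = 10, n = 10, k = 10)  # without replacement
p_zeros_coins <- dbinom(n_zeros, size = 10, prob = 0.5)   # independent coins

round(rbind(zeros = n_zeros, pool = p_zeros_pool, coins = p_zeros_coins), 4)
```

Reaching 45 requires very few blanks, and those outcomes are much rarer when drawing from the pool. For example, zero blanks has probability 1/184756 under the pool scheme but 1/1024 with independent coins.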

By the way, the replicate function is practically built for simulation. Here is a more R-like way to write this simulation:

nt <- 1e5
sims <- replicate(nt, sample(0:1, 10, replace = TRUE) * (1:10))

The probability in question is then:

p_gte_45 <- mean(apply(sims, 2, function(x) sum(x) >= 45))
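Because there are only 2^10 = 1024 heads/blank patterns, the simulated value can also be checked by exhaustive enumeration. The sketch below is an illustration rather than part of the original answer. It counts the patterns whose sum is at least 45, then reweights the same patterns by the probabilities they would have under the pool scheme of the first attempt. The reweighting assumes that, given the number of blanks, every set of numbered faces is equally likely under both schemes.

```r
# Every heads/blank pattern for the ten coins: 2^10 = 1024 rows of 0/1.
patterns <- as.matrix(expand.grid(rep(list(0:1), 10)))
totals   <- as.vector(patterns %*% 1:10)   # face-up sum for each pattern
hit      <- totals >= 45

# Independent coins: every pattern has probability 1/1024.
n_hit   <- sum(hit)
p_coins <- n_hit / nrow(patterns)

# Pool scheme from the first attempt: a pattern with z blanks has
# probability dhyper(z, 10, 10, 10) / choose(10, z).
n_blank <- 10 - rowSums(patterns)
p_pool  <- sum(dhyper(n_blank[hit], 10, 10, 10) / choose(10, n_blank[hit]))

c(coins = p_coins, pool = p_pool)
```

The coin value is exactly 43/1024 (about 0.0420), and the pool value is 2531/184756 (about 0.0137). These agree with the 0.042038 and 0.013736 reported by the two simulations in the question.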

Wrong sample space in a simulation