How do I find the average longest run across 10,000 simulations?

Question

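I am trying to find the average length of the longest run across 10,000 simulations of flipping a coin 30 times. I need to simulate the experiment described above 10,000 times in R, recording the length of the longest run each time.

Here is my code so far: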

coin <- sample(c("H", "T"), 10000, replace = TRUE)
table(coin) 
head(coin, n = 30)
rle(c("H", "T", "T", "H", "H", "H", "H", "H", "T", "H"))
coin.rle <- rle(coin)
str(coin.rle)

How do I find the average of the longest run across the 10,000 simulations?

Answer 1

I think the following is what you are after.

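n_runs <- 10000                  # number of simulated experiments
max_runs <- numeric(n_runs)      # longest run from each experiment
for (j in seq_len(n_runs)) {
  coin <- sample(c("H", "T"), 30, replace = TRUE)  # one experiment: 30 flips
  max_runs[j] <- max(rle(coin)$lengths)            # longest run in this experiment
}
mean(max_runs)                   # average longest run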

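To understand the code, it helps to look at a small portion of coin (for example coin[1:20]) and its rle (rle(coin[1:20])). rle computes the length of each run, so max(rle(coin)$lengths) gives the longest run.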
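As a small illustration of what rle returns, here is a sketch using the ten-flip vector from the question's own rle() call. The variable names flips, flips_rle and longest are made up for this example.

```r
# Ten flips taken from the rle() call in the question
flips <- c("H", "T", "T", "H", "H", "H", "H", "H", "T", "H")

# rle() collapses consecutive equal values into runs
flips_rle <- rle(flips)

flips_rle$lengths  # length of each run, in order of appearance
flips_rle$values   # the face that each run consists of

# The longest run is the largest entry in lengths
longest <- max(flips_rle$lengths)
longest
```

Here the runs are H, TT, HHHHH, T, H, so the longest run is the five heads in the middle.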

Edit: the following may be faster.

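len <- 30       # flips per simulation
times <- 10000  # number of simulations

# draw every flip in a single call, then split into blocks of `len` flips
flips <- sample(c("H", "T"), len * times, replace = TRUE)
runs <- sapply(split(flips, ceiling(seq_along(flips) / len)),
               function(x) max(rle(x)$lengths))
mean(runs)              # average of the longest runs
sum(runs >= 7) / times  # proportion of simulations whose longest run is >= 7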
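Whether the second version is actually faster depends on the machine and the R version, so it is worth timing both. The following is a rough sketch using base R's system.time; the wrapper names loop_version and split_version are hypothetical and exist only for this comparison.

```r
len <- 30
times <- 10000

# Approach 1: one sample() call per simulation, inside a loop
loop_version <- function() {
  out <- numeric(times)
  for (j in seq_len(times)) {
    out[j] <- max(rle(sample(c("H", "T"), len, replace = TRUE))$lengths)
  }
  out
}

# Approach 2: one large sample() call, then split into blocks of `len` flips
split_version <- function() {
  flips <- sample(c("H", "T"), len * times, replace = TRUE)
  sapply(split(flips, ceiling(seq_along(flips) / len)),
         function(x) max(rle(x)$lengths))
}

# Elapsed times will differ from run to run and machine to machine
system.time(loop_res <- loop_version())
system.time(split_res <- split_version())
```

A single timing is noisy, so repeating each call a few times gives a fairer picture.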

Answer 2

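All coin flips are independent of one another (that is, the outcome of one flip does not affect any other). We can therefore flip all the coins for all the simulations at once and then arrange them so that each 30-flip trial is easy to summarize. Here is how I would do it.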

# do all of the flips at once, this is okay because each flip
# is independent
coin_flips <- sample(c("heads", "tails"), 30 * 10000, replace = TRUE)

# put them into a 10000 by 30 matrix, each row
# indicates one 'simulation'
coin_matrix <- matrix(coin_flips, ncol = 30, nrow = 10000)

# we now want to iterate through each row using apply,
# to do so we need to make a function to apply to each
# row. This gets us the longest run over a single
# simulation
get_long_run <- function(x) {
  max(rle(x)$lengths)
}

# apply this function to each row
longest_runs <- apply(coin_matrix, 1, get_long_run)

# get the number of simulations that had a max run >= 7. Divide this
# by the number of simulations to get the probability of this occurring.
sum(longest_runs >= 7)/nrow(coin_matrix)

You should get something between 18% and 19%, but this will vary each time you run the simulation.
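If you want the same numbers on every run, one option is to fix the random seed first. The sketch below assumes an arbitrary seed of 42 and hypothetical names n_sims and n_flips. It also reports the mean of the longest runs, which is the quantity the question asks about.

```r
set.seed(42)  # arbitrary seed, chosen only so the run is reproducible

n_sims  <- 10000  # number of simulated experiments
n_flips <- 30     # flips per experiment

# One row per simulation. The fill order does not matter here because
# every flip is independent and identically distributed.
flips <- matrix(sample(c("heads", "tails"), n_sims * n_flips, replace = TRUE),
                nrow = n_sims, ncol = n_flips)

# Longest run in each row
longest_runs <- apply(flips, 1, function(x) max(rle(x)$lengths))

mean(longest_runs)       # average longest run across the simulations
mean(longest_runs >= 7)  # share of simulations whose longest run is at least 7
```

Changing or removing the set.seed call brings back the run-to-run variation described above.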
