Here is some quick background on the problem:
There are two boxes with balls of different colors. Box 1 contains two blue
balls, and one red ball. Box 2 contains two blue balls, three red balls, and
one white ball.
The random experiment consists of generating a random number that follows a
uniform distribution (min = 0, max = 1). If the number is greater than 0.5,
then a sample with replacement of size 4 is drawn from box 1. If the random
number is less than or equal to 0.5, then a sample without replacement of
size 4 is drawn from box 2. The goal is to find the probabilities of getting
either 0, 1, 2, 3, or 4 blue balls.
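As a reference point for any simulation, the exact probabilities can be worked out directly from the description above. The following is only a sketch: it assumes each box is chosen with probability 0.5 (the uniform draw), models box 1 with a binomial distribution (4 draws with replacement, 2 of 3 balls blue) and box 2 with a hypergeometric distribution (4 draws without replacement, 2 blue and 4 non-blue balls). The names `k` and `p_blue` are illustrative and not from the original post.

```r
# Number of blue balls we want probabilities for
k <- 0:4

# Box 1: sampling WITH replacement, P(blue) = 2/3 on each of 4 draws
p_box1 <- dbinom(k, size = 4, prob = 2 / 3)

# Box 2: sampling WITHOUT replacement, 2 blue and 4 non-blue balls, 4 draws
p_box2 <- dhyper(k, m = 2, n = 4, k = 4)

# Each box is used with probability 0.5, so mix the two distributions
p_blue <- 0.5 * p_box1 + 0.5 * p_box2
names(p_blue) <- k
print(round(p_blue, 4))
```

Relative frequencies from a long simulation should settle near these values, which makes them a handy check on the simulated matrix.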
I have created a matrix that simulates this experiment over 1000 trials, as shown below:
     [,1]   [,2]    [,3]   [,4]
[1,] "red"  "blue"  "blue" "blue"
[2,] "blue" "blue"  "red"  "blue"
[3,] "blue" "blue"  "blue" "blue"
[4,] "red"  "white" "red"  "blue"
[5,] "blue" "blue"  "red"  "red"
[6,] "red"  "blue"  "blue" "red"
I also have a vector of 1000 elements, where the i-th element is the number of times "blue" occurs in the i-th row of the matrix.
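For readers who want to reproduce a matrix and count vector of this shape, here is one possible sketch. It is not the code used in the post: the helper `simulate_trial`, the seed, and the names `box1`, `box2`, `your_data`, and `blue_count` are all assumptions chosen for illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

box1 <- c("blue", "blue", "red")
box2 <- c("blue", "blue", "red", "red", "red", "white")
n_trials <- 1000

# One trial: a uniform draw picks the box, then 4 balls are sampled
simulate_trial <- function() {
  if (runif(1) > 0.5) {
    sample(box1, size = 4, replace = TRUE)   # box 1, with replacement
  } else {
    sample(box2, size = 4, replace = FALSE)  # box 2, without replacement
  }
}

# replicate() returns a 4 x n_trials matrix, so transpose to one row per trial
your_data <- t(replicate(n_trials, simulate_trial()))

# Count "blue" per row; element i is the count for trial i
blue_count <- rowSums(your_data == "blue")
```

Comparing the matrix against `"blue"` gives a logical matrix, so `rowSums` does the per-row counting without an explicit loop.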
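Now my goal is to somehow put the relative frequencies of the number of blue balls into a data frame, so that I can create a plot like the one below:
However, I don't know how to do this in R. I have tried some for
loops but didn't get anywhere. Thanks for your help.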
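Answer 0 (score: 0)
This is not a complete or tested solution, since I'm on a mobile device, but the steps should be useful to you:
Paste the matrix columns together and count how many times "blue" appears. Put this count in a data frame, along with the rep number and a value column that we will use in the pivot: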
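library(stringr)
# Paste each row's four draws into one string, then count "blue" in it
result_df <- data.frame(rep = seq_len(nrow(your_data)),
                        blue_count = str_count(paste0(your_data[, 1], your_data[, 2], your_data[, 3], your_data[, 4]), "blue"),
                        value = 1)  # indicator that cast() spreads across the outcome columns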
Pivot the "blue_count" column so that there is one column per outcome, with a 1 marking the outcome of that rep.
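library(reshape)
# One row per rep, one column per observed blue_count; cells are taken from "value"
pivot_df <- cast(result_df, rep ~ blue_count)
# Outcomes that did not occur in a rep come back as NA; treat them as 0
pivot_df[is.na(pivot_df)] <- 0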
Cumulatively sum each column and divide by rep to get the running relative frequency of each outcome.
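# Columns 2 to 6 are assumed to hold outcomes 0 to 4; this only holds if every outcome occurred at least once
freq_df <- data.frame(rep = pivot_df[, 1],
                      outcome_0 = cumsum(pivot_df[, 2]) / pivot_df[, 1],
                      outcome_1 = cumsum(pivot_df[, 3]) / pivot_df[, 1],
                      outcome_2 = cumsum(pivot_df[, 4]) / pivot_df[, 1],
                      outcome_3 = cumsum(pivot_df[, 5]) / pivot_df[, 1],
                      outcome_4 = cumsum(pivot_df[, 6]) / pivot_df[, 1])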
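The pivot step relies on column positions, which shift if some outcome never shows up. A possible alternative, sketched here in base R only, builds the same kind of running-frequency table straight from the count vector. It is a swapped-in approach, not the method above; `blue_count` is filled with the counts from the six matrix rows shown in the question, and `outcomes` and `freq_mat` are illustrative names.

```r
# Blue counts for the six example rows shown in the question
blue_count <- c(3, 3, 4, 1, 2, 2)

outcomes <- 0:4
trial <- seq_along(blue_count)

# For each outcome k: running count of trials with exactly k blue balls,
# divided by the number of trials so far
freq_mat <- sapply(outcomes, function(k) cumsum(blue_count == k) / trial)
colnames(freq_mat) <- paste0("outcome_", outcomes)

freq_df <- data.frame(rep = trial, freq_mat)
print(freq_df)
```

Because the loop runs over a fixed `0:4`, an outcome that never occurs simply gets a column of zeros instead of changing the table layout.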
Plot with ggplot2; you will probably need to unpivot the data.frame first.
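The unpivot-and-plot step might look like the sketch below. Since the target figure is not shown, this assumes it is a line chart of running relative frequency against trial number, one line per outcome. To stay self-contained, the sketch simulates its own `blue_count` with `rbinom` and `rhyper` rather than using the matrix from the question; `n_trials`, `long_df`, and `p` are illustrative names.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility
n_trials <- 1000

# Stand-in for the real count vector: box 1 (binomial) when the uniform
# draw exceeds 0.5, otherwise box 2 (hypergeometric: 2 blue, 4 non-blue)
use_box1 <- runif(n_trials) > 0.5
blue_count <- ifelse(use_box1,
                     rbinom(n_trials, size = 4, prob = 2 / 3),
                     rhyper(n_trials, 2, 4, 4))

# Long ("unpivoted") layout: one row per trial and outcome
long_df <- do.call(rbind, lapply(0:4, function(k) {
  data.frame(rep = seq_len(n_trials),
             outcome = factor(k, levels = 0:4),
             rel_freq = cumsum(blue_count == k) / seq_len(n_trials))
}))

p <- ggplot(long_df, aes(x = rep, y = rel_freq, colour = outcome)) +
  geom_line() +
  labs(x = "Trial", y = "Relative frequency", colour = "Blue balls")
print(p)
```

ggplot2 maps one column to each aesthetic, which is why the wide table with one column per outcome is reshaped into a single `outcome` column first.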