Computing the relative frequency of favorable outcomes in an experiment over many trials

Time: 2018-04-11 23:41:31

Tags: r dataframe

Here is some quick background on the problem:

There are two boxes with balls of different colors. Box 1 contains two blue 
balls, and one red ball. Box 2 contains two blue balls, three red balls, and 
one white ball.

The random experiment consists of generating a random number that follows a 
uniform distribution (min = 0, max = 1). If the number is greater than 0.5, 
then a sample with replacement of size 4 is drawn from box 1. If the random 
number is less than or equal to 0.5, then a sample without replacement of 
size 4 is drawn from box 2. The goal is to find the probabilities of getting 
either 0, 1, 2, 3, or 4 blue balls.
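
For reference, the exact probabilities can be worked out from the statement above, which gives something to compare simulated frequencies against. This is a sketch under the reading that each box is used with probability 0.5 (the uniform draw), that box 1 yields a binomial count of blue balls, and that box 2 yields a hypergeometric count. The variable names are illustrative only.

```r
# Possible numbers of blue balls in a sample of size 4
k <- 0:4

# Box 1: 2 blue out of 3 balls, 4 draws with replacement -> Binomial(4, 2/3)
p_box1 <- dbinom(k, size = 4, prob = 2 / 3)

# Box 2: 2 blue and 4 non-blue balls, 4 draws without replacement -> hypergeometric
# (3 or 4 blue balls are impossible here, so dhyper returns 0 for them)
p_box2 <- dhyper(k, m = 2, n = 4, k = 4)

# Each box is used with probability 0.5
p_blue <- 0.5 * p_box1 + 0.5 * p_box2
names(p_blue) <- k
print(round(p_blue, 4))
```

Under these assumptions the probabilities are 16/405, 128/405, 141/405, 80/405 and 40/405 for 0 to 4 blue balls, roughly 0.040, 0.316, 0.348, 0.198 and 0.099.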

I have created a matrix that simulates this experiment over 1000 trials, as shown below:

      [,1]   [,2]    [,3]   [,4]  
[1,] "red"  "blue"  "blue" "blue"
[2,] "blue" "blue"  "red"  "blue"
[3,] "blue" "blue"  "blue" "blue"
[4,] "red"  "white" "red"  "blue"
[5,] "blue" "blue"  "red"  "red" 
[6,] "red"  "blue"  "blue" "red"  
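
The question does not show how the matrix was generated. One way a matrix of this shape could be simulated is sketched below; the names box1, box2, one_trial and sim are hypothetical, and the seed is only there to make the run repeatable.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Contents of the two boxes, as described in the problem statement
box1 <- c("blue", "blue", "red")
box2 <- c("blue", "blue", "red", "red", "red", "white")

# One run of the random experiment: returns a character vector of 4 colors
one_trial <- function() {
  if (runif(1) > 0.5) {
    sample(box1, size = 4, replace = TRUE)   # box 1, with replacement
  } else {
    sample(box2, size = 4, replace = FALSE)  # box 2, without replacement
  }
}

# replicate() returns a 4 x 1000 matrix; transpose so each row is one trial
n_trials <- 1000
sim <- t(replicate(n_trials, one_trial()))
head(sim)
```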

I also have a vector of 1000 elements, where the i-th element is the number of occurrences of "blue" in the i-th row of the matrix.
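
As an illustration, such a count vector can be obtained with rowSums on a logical comparison. The sketch below uses only the six rows printed above; sim and blue_count are hypothetical names.

```r
# The six example rows shown above, one trial per row
sim <- matrix(c("red",  "blue",  "blue", "blue",
                "blue", "blue",  "red",  "blue",
                "blue", "blue",  "blue", "blue",
                "red",  "white", "red",  "blue",
                "blue", "blue",  "red",  "red",
                "red",  "blue",  "blue", "red"),
              ncol = 4, byrow = TRUE)

# sim == "blue" is a logical matrix; rowSums counts the TRUEs in each row
blue_count <- rowSums(sim == "blue")
print(blue_count)
```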

Now my goal is to somehow put the relative frequencies of the number of blue balls into a data frame, so that I can create a graph like the following:

[image]

However, I don't know how to do this in R. I have tried some for loops but haven't gotten anywhere. Thanks for your help.

1 Answer:

Answer 0: (score: 0)

This is not a complete or verified solution, since I'm on a mobile device, but the steps should work for you:

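Paste the matrix columns together and count the number of times "blue" occurs. Put this count in a data frame together with the rep (the trial number) and a value column that we will use in the pivot: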

library(stringr)

# One row per trial (rep): paste the four draws into one string and count "blue"
result_df <- data.frame(rep = 1:nrow(your_data),
                        blue_count = str_count(paste0(your_data[,1], your_data[,2], your_data[,3], your_data[,4]), "blue"),
                        value = 1)

Pivot the "blue_count" column so that there is one column per outcome, with a 1 marking the outcome of that rep.

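library(reshape)

# One column per observed blue_count value; a cell is 1 for that rep's outcome, NA otherwise
pivot_df <- cast(result_df, rep ~ blue_count)
# Replace the NAs with 0 so the columns can be summed
pivot_df[is.na(pivot_df)] <- 0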
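
Note that a pivot only creates columns for outcomes that actually occur, so a rare outcome (for example 0 blue balls) could be missing in a short simulation. A minimal base-R sketch that always produces all five indicator columns is shown below; blue_count is a hypothetical stand-in for the real vector of counts, and the column names are illustrative.

```r
# Hypothetical counts of blue balls for six trials
blue_count <- c(3, 3, 4, 1, 2, 2)

# For each possible outcome 0..4, mark the trials where it occurred with a 1
indicator <- sapply(0:4, function(k) as.integer(blue_count == k))
colnames(indicator) <- paste0("outcome_", 0:4)
print(indicator)
```

Each row contains exactly one 1, and the outcome_0 column is present (all zeros) even though no trial in this small example produced 0 blue balls.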

Cumulatively sum the columns and divide by rep to get the running relative frequency of each outcome.

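# Column 1 is rep; columns 2 to 6 are assumed to be the outcomes 0 to 4,
# which requires every outcome to occur at least once in the simulation
freq_df <- data.frame(rep = pivot_df[,1],
                      outcome_0 = cumsum(pivot_df[,2]) / pivot_df[,1],
                      outcome_1 = cumsum(pivot_df[,3]) / pivot_df[,1],
                      outcome_2 = cumsum(pivot_df[,4]) / pivot_df[,1],
                      outcome_3 = cumsum(pivot_df[,5]) / pivot_df[,1],
                      outcome_4 = cumsum(pivot_df[,6]) / pivot_df[,1])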
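
The cumulative-sum step can also be sketched in base R without any pivoting. This is an illustration on a hypothetical six-trial vector, not a tested replacement for the code above; the names blue_count, indicator and running are made up for the example.

```r
# Hypothetical counts of blue balls for six trials
blue_count <- c(3, 3, 4, 1, 2, 2)
n <- length(blue_count)

# Indicator matrix: one column per outcome 0..4, one row per trial
indicator <- sapply(0:4, function(k) as.integer(blue_count == k))

# Running count of each outcome, divided by the number of trials so far.
# The division recycles seq_len(n) down each column, so row i is divided by i.
running <- apply(indicator, 2, cumsum) / seq_len(n)
colnames(running) <- paste0("outcome_", 0:4)

freq_df <- data.frame(rep = seq_len(n), running)
print(freq_df)
```

The last row holds the overall relative frequencies, which for this example are 0, 1/6, 2/6, 2/6 and 1/6.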

To plot this with ggplot2, you may need to unpivot (melt) the data.frame first.
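
A rough sketch of that last step follows, assuming the target plot shows one line per outcome against the trial number (the original image is not available, so this is a guess). The long-format data frame is built by hand in base R rather than with melt, the plot is only drawn if ggplot2 is installed, and all names are hypothetical.

```r
# Hypothetical counts of blue balls for six trials
blue_count <- c(3, 3, 4, 1, 2, 2)
n <- length(blue_count)

# Running relative frequency of each outcome 0..4 (n rows, 5 columns)
indicator <- sapply(0:4, function(k) as.integer(blue_count == k))
running <- apply(indicator, 2, cumsum) / seq_len(n)

# Long format: one row per (trial, outcome) pair.
# as.vector() reads the matrix column by column, matching rep/outcome below.
long_df <- data.frame(
  rep     = rep(seq_len(n), times = 5),
  outcome = factor(rep(0:4, each = n)),
  freq    = as.vector(running)
)

# Draw one line per outcome, if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long_df, ggplot2::aes(x = rep, y = freq, colour = outcome)) +
    ggplot2::geom_line() +
    ggplot2::labs(x = "Trial", y = "Relative frequency", colour = "Blue balls")
  print(p)
}
```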