Matching elements in a list

Date: 2012-02-15 06:03:40

Tags: r

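I've just started programming in R and I'm stumped by this one, maybe because I don't know where to begin.

Define a random variable as the number of trials until a match occurs. So if you have a list of numbers (4, 5, 7, 11, 3, 11, 12, 8, 8, 1, ...), the first value of the random variable is 6, because by that point there are two 11s (4, 5, 7, 11, 3, 11). The second value is 3, because by then you have two 8s (12, 8, 8). The code below creates the list of numbers u by simulating from a uniform distribution.

Thanks for any help or pointers. In case anyone is interested (I'm trying to learn by working through a statistics text), I've included the full description of the problem I'm trying to solve below.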

set.seed(1); u = matrix(runif(1000), nrow=1000)
u[u > 0    & u <= 1/12]   <- 1
u[u > 1/12 & u <= 2/12]   <- 2
u[u > 2/12 & u <= 3/12]   <- 3
u[u > 3/12 & u <= 4/12]   <- 4
u[u > 4/12 & u <= 5/12]   <- 5
u[u > 5/12 & u <= 6/12]   <- 6
u[u > 6/12 & u <= 7/12]   <- 7
u[u > 7/12 & u <= 8/12]   <- 8
u[u > 8/12 & u <= 9/12]   <- 9
u[u > 9/12 & u <= 10/12]  <- 10
u[u > 10/12 & u <= 11/12] <- 11
u[u > 11/12 & u < 12/12] <- 12
table(u); u[1:10,]
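
The rule described above (count draws until a value repeats, then start counting afresh) can be sketched as a small loop. This is only one possible reading of the question; `waits_until_match` is a hypothetical helper name, not something from the original post or from base R.

```r
# Hypothetical helper (not part of the original post): walk along the stream,
# count draws since the last match, and reset once a value repeats.
waits_until_match <- function(x) {
  waits <- integer(0)   # completed realisations of the random variable
  seen  <- c()          # values drawn since the last match
  count <- 0L           # number of draws since the last match
  for (v in x) {
    count <- count + 1L
    if (v %in% seen) {
      # v repeats an earlier value: record the count and start over
      waits <- c(waits, count)
      seen  <- c()
      count <- 0L
    } else {
      seen <- c(seen, v)
    }
  }
  waits
}

waits_until_match(c(4, 5, 7, 11, 3, 11, 12, 8, 8, 1))
# → 6 3  (the trailing 1 belongs to an unfinished run and is dropped)
```

Calling `waits_until_match(u)` on the simulated matrix `u` should work the same way, since a `for` loop over a one-column matrix visits its elements in order.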

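Example 2.6-3, Concepts in Probability and Stochastic Modeling, Higgins. Suppose we ask people at random the month in which they were born. Let the random variable X denote the number of people we need to ask before we find two people born in the same month. The possible values of X are 2, 3, ..., 13. That is, at least two people must be asked in order to get a match, and no more than 13 need to be asked. Under the simplifying assumption that each month is an equally likely response, use computer simulation to estimate the probability mass function of X. The simulation generates birth months until a match is found. Based on 1000 repetitions of this experiment, the following empirical distribution and sample statistics were obtained...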
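
As a sanity check for any simulation, the exact distribution under the equal-likelihood assumption can be worked out directly: X = k requires the first k - 1 answers to be all different and the k-th answer to repeat one of them. The sketch below is not from the textbook; the names `k`, `p_distinct` and `pmf` are illustrative.

```r
k <- 2:13

# P(first k - 1 answers are all different) = 12/12 * 11/12 * ... * (12 - k + 2)/12
p_distinct <- sapply(k, function(j) prod((12 - 0:(j - 2)) / 12))

# Given k - 1 distinct months so far, the k-th answer matches with prob (k - 1)/12
pmf <- p_distinct * (k - 1) / 12
names(pmf) <- k

round(pmf, 4)   # exact P(X = k) for k = 2, ..., 13
sum(pmf)        # the probabilities add up to 1
```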

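1 answer:

Answer 0 (score: 4)

R has a steep initial learning curve. I don't think it's fair to assume this is your homework, and yes, it's possible to find solutions if you know what you're looking for. However, I remember it sometimes being hard to research problems online simply because I didn't know what to search for (I wasn't familiar enough with the terminology).

Below is an explanation of one approach to solving the problem in R. Read the commented code and try to work out exactly what it's doing. Still, I'd recommend working through a good beginner's resource. From memory, a good one to get you up and running is icebreakeR, but there are many out there...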

# set the number of simulations
nsim <- 10000

# Create a matrix, with nsim columns, and fill it with something. 
#  The something with which you'll populate it is a random sample, 
#  with replacement, of month names (held in a built-in vector called
#  'month.abb'). We're telling the sample function that it should take 
#  13*nsim samples, and these will be used to fill the matrix, which 
#  has nsim columns (and hence 13 rows). We've chosen to take samples 
#  of length 13, because as your textbook states, 13 is the maximum
#  number of month names necessary for a month name to be duplicated.
mat <- matrix(sample(month.abb, 13*nsim, replace=TRUE), ncol=nsim)

# If you like, take a look at the first 10 columns
mat[, 1:10]

# We want to find the position of the first duplicated value for each column. 
#  Here's one way to do this, but it might be a bit confusing if you're just 
#  starting out. The 'apply' family of functions is very useful for
#  repeatedly applying a function to columns/rows/elements of an object.
#  Here, 'apply(mat, 2, foo)' means that for each column (2 represents columns,
#  1 would apply to rows, and 1:2 would apply to every cell), do 'foo' to that
#  column. Our function below extends this a little with a custom function. It
#  says: for each column of mat in turn, call that column 'x' and perform 
#  'match(1, duplicated(x))'. This match function will return the position
#  of the first '1' in the vector 'duplicated(x)'. The vector 'duplicated(x)'
#  is a logical (boolean) vector that indicates, for each element of x,
#  whether that element has already occurred earlier in the vector (i.e. if 
#  the month name has already occurred earlier in x, the corresponding element
#  of duplicated(x) will be TRUE (which equals 1), else it will be FALSE (0)).
#  So the match function returns the position of the first duplicated month 
#  name (well, actually the second instance of that month name). e.g. if 
#  x consists of 'Jan', 'Feb', 'Jan', 'Mar', then duplicated(x) will be 
#  FALSE, FALSE, TRUE, FALSE, and match(1, duplicated(x)) will return 3. 
#  Referring back to your textbook problem, this is x, a realisation of the 
#  random variable X.
# Because we've used the apply function, the object 'res' will end up with
#  nsim realisations of X, and these can be plotted as a histogram.
res <- apply(mat, 2, function(x) match(1, duplicated(x)))
hist(res, breaks=seq(0.5, 13.5, 1))

[Image: histogram of results]
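
Since the textbook asks for an estimate of the probability mass function rather than a plot, the simulated values can also be tabulated. The following is a minimal sketch along the same lines as the code above; the seed and the names `x`, `dup` and `pmf_hat` are assumptions added for illustration, and `match(TRUE, ...)` is used in place of `match(1, ...)`, which finds the same position.

```r
# Small demonstration of duplicated() and match() on the example from the comments
x <- c("Jan", "Feb", "Jan", "Mar")
dup <- duplicated(x)     # FALSE FALSE TRUE FALSE
match(TRUE, dup)         # → 3, the position of the first repeat

# Same simulation as above, with an assumed seed for reproducibility
set.seed(1)
nsim <- 10000
mat <- matrix(sample(month.abb, 13 * nsim, replace = TRUE), ncol = nsim)
res <- apply(mat, 2, function(col) match(TRUE, duplicated(col)))

# Empirical pmf over the possible values 2..13; factor() keeps any value
# that never occurred in the table with a proportion of 0
pmf_hat <- prop.table(table(factor(res, levels = 2:13)))
round(pmf_hat, 3)

# Sample statistics of the simulated X
mean(res)
sd(res)
```

Because every column holds 13 draws from only 12 month names, each column must contain at least one repeat, so `res` should never contain `NA`.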