R:产生混合物分布的功能

时间:2014-05-05 19:46:02

标签: r probability markov-chains mcmc mixture-model

我需要从混合分发中生成样本

  • 40%的样本来自高斯(平均值= 2,sd = 8)

  • 20%的样本来自Cauchy(位置= 25,比例= 2)

  • 40%样本来自高斯(平均值= 10,sd = 6)

为此,我编写了以下函数:

dmix <- function(x){
prob <- (0.4 * dnorm(x,mean=2,sd=8)) + (0.2 * dcauchy(x,location=25,scale=2)) + (0.4 * dnorm(x,mean=10,sd=6))
return (prob)
}

然后测试:

foo = seq(-5,5,by = 0.01)
vector = NULL
for (i in 1:1000){
vector[i] <- dmix(foo[i])
}
hist(vector)

我得到这样的直方图(我知道这是错的) - histogram of the supposed distribution

我做错了什么?有人可以指点一下吗?

2 个答案:

答案 0 :(得分:7)

当然还有其他方法可以做到这一点,但 distr 包让它非常简单。 (See also this answer获取另一个示例以及有关 distr 和朋友的更多详情。

library(distr)

## Construct the distribution object.
myMix <- UnivarMixingDistribution(Norm(mean=2, sd=8), 
                                  Cauchy(location=25, scale=2),
                                  Norm(mean=10, sd=6),
                                  mixCoeff=c(0.4, 0.2, 0.4))
## ... and then a function for sampling random variates from it
rmyMix <- r(myMix)

## Sample a million random variates, and plot (part of) their histogram
x <- rmyMix(1e6)
hist(x[x>-100 & x<100], breaks=100, col="grey", main="")

enter image description here

如果您只想直接查看混合物分发的pdf,请执行以下操作:

plot(myMix, to.draw.arg="d") 

enter image description here

答案 1 :(得分:1)

如果可以,请始终使用R矢量化。 即使实际上丢弃了许多值,它通常也会更有效。 (至少比以前的解决方案更快,并避免额外的库)

rmy_ve = function(n){

##generation of (n x 3) matrix. 
##Each column is a random sample of size n from a single component of the mixture
temp = cbind(rnorm(n,2,8),rcauchy(n,25,2),rnorm(n,10,6))

##random generation of the indices
id = sample(1:3,n,rep = T,prob = c(.4,.2,.4))  
id = cbind(1:n,id)
temp[id]
}


> microbenchmark(rmy_ve(1e6),rmyMix(1e6))
Unit: milliseconds
       expr     min       lq     mean   median       uq      max    neval
rmy_ve(1e+06) 241.904 248.7528 272.9119 260.8752 298.1126 322.7429   100
rmyMix(1e+06) 270.917 322.3627 341.4779 329.1706 364.3947 561.2608   100