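I would like to compute the convolution of two probability distributions in R and need some help. For simplicity, suppose x is normally distributed with mean = 1.0 and stdev = 0.5, and y is log-normally distributed with mean = 1.5 and stdev = 0.75. I want to determine z = x + y. I understand that the distribution of z is not known a priori.
Also, the real-world example I am working on requires adding two random variables that follow a number of different distributions.
Does anyone know how to add two random variables by convolving the probability density functions of x and y?
I have tried generating n normally distributed random values (with the parameters above) and adding them to n log-normally distributed random values. However, I would like to know whether I can use the convolution method instead. Any help would be greatly appreciated.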
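Edit
Thank you for the answers. I defined a pdf and tried to evaluate the convolution integral, but R complains at the integration step. My pdf is Log Pearson 3, as follows: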
dlp3 <- function(x, a, b, g) {
p1 <- 1/(x*abs(b) * gamma(a))
p2 <- ((log(x)-g)/b)^(a-1)
p3 <- exp(-1* (log(x)-g) / b)
d <- p1 * p2 * p3
return(d)
}
f.m <- function(x) dlp3(x,3.2594,-0.18218,0.53441)
f.s <- function(x) dlp3(x,9.5645,-0.07676,1.184)
f.t <- function(z) integrate(function(x,z) f.s(z-x)*f.m(x),-Inf,Inf,z)$value
f.t <- Vectorize(f.t)
integrate(f.t, lower = 0, upper = 3.6)
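R complains at the last step, since the f.t function is bounded and my integration limits are probably incorrect. Any ideas on how to fix this?
Answer 0 (score: 13)
Here is one way.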
f.X <- function(x) dnorm(x,1,0.5) # normal (mu=1.0, sigma=0.5)
f.Y <- function(y) dlnorm(y,1.5, 0.75) # log-normal (meanlog=1.5, sdlog=0.75)
# convolution integral
f.Z <- function(z) integrate(function(x,z) f.Y(z-x)*f.X(x),-Inf,Inf,z)$value
f.Z <- Vectorize(f.Z) # need to vectorize the resulting fn.
set.seed(1) # for reproducible example
X <- rnorm(1000,1,0.5)
Y <- rlnorm(1000,1.5,0.75)
Z <- X + Y
# compare the methods
hist(Z,freq=F,breaks=50, xlim=c(0,30))
z <- seq(0,50,0.01)
lines(z,f.Z(z),lty=2,col="red")
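The convolution integral above is f_Z(z) = ∫ f_X(x) f_Y(z − x) dx. As a rough sanity check, the sketch below evaluates it on a grid and confirms that the result integrates to about 1 and has a mean close to E[X] + E[Y]. It assumes that restricting the inner integral to mean ± 8 sd of the normal term loses nothing of practical size; that restriction is a choice made here, not part of the code above, and `f.Z.finite`, the grid limits, and the step are illustrative.

```r
# Densities as in the answer above
f.X <- function(x) dnorm(x, 1, 0.5)
f.Y <- function(y) dlnorm(y, 1.5, 0.75)

# Hypothetical variant of f.Z: integrate over a finite window
# (mean +/- 8 sd of X) instead of (-Inf, Inf)
f.Z.finite <- function(z) {
  integrate(function(x) f.Y(z - x) * f.X(x),
            lower = 1 - 8 * 0.5, upper = 1 + 8 * 0.5)$value
}
f.Z.finite <- Vectorize(f.Z.finite)

# Evaluate on a grid wide enough to hold almost all of the mass
step <- 0.05
z <- seq(-5, 150, by = step)
dens <- f.Z.finite(z)

# Riemann sums: total probability and mean of Z
area   <- sum(dens) * step
mean.Z <- sum(z * dens) * step

# Theoretical mean: E[X] + E[Y] = 1 + exp(meanlog + sdlog^2 / 2)
mean.theory <- 1 + exp(1.5 + 0.75^2 / 2)

print(c(area = area, mean.Z = mean.Z, mean.theory = mean.theory))
```

If the area comes out noticeably below 1, the grid or the integration window is probably too narrow for the chosen parameters.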
Using the distr package
The same thing, using distr.
library(distr)
N <- Norm(mean=1, sd=0.5) # N is a normal distribution object
L <- Lnorm(meanlog=1.5,sdlog=0.75) # same for log-normal
conv <- convpow(L+N,1) # L+N is the distribution of the sum; object of class AbscontDistribution
f.Z <- d(conv) # density function (pdf) of the sum
hist(Z,freq=F,breaks=50, xlim=c(0,30))
z <- seq(0,50,0.01)
lines(z,f.Z(z),lty=2,col="red")
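For the Log Pearson 3 densities in the question's edit, a likely cause of the integrate() complaint is that `dlp3` is only defined on a bounded support. With b < 0 the density is positive only for 0 < x < exp(g). Outside that range, `log(x)` or the fractional power returns NaN. The sketch below is one possible workaround, not a definitive fix: it redefines `dlp3` to return 0 outside the support and integrates over finite limits. The upper limits `ub.m` and `ub.s` and the grid step are assumptions introduced here.

```r
# Log Pearson 3 density that returns 0 outside its support.
# Written on the log scale to avoid NaN from fractional powers.
dlp3 <- function(x, a, b, g) {
  d <- numeric(length(x))
  pos <- x > 0
  u <- rep(NA_real_, length(x))
  u[pos] <- (log(x[pos]) - g) / b        # gamma-distributed quantity
  ok <- pos & u > 0                      # inside the support
  d[ok] <- exp((a - 1) * log(u[ok]) - u[ok] - lgamma(a)) /
    (x[ok] * abs(b))
  d
}

f.m <- function(x) dlp3(x, 3.2594, -0.18218, 0.53441)
f.s <- function(x) dlp3(x, 9.5645, -0.07676, 1.184)

# Upper ends of the supports (valid because b < 0 in both cases)
ub.m <- exp(0.53441)
ub.s <- exp(1.184)

# Convolution: x only needs to range over the support of f.m
f.t <- function(z) {
  integrate(function(x) f.s(z - x) * f.m(x), lower = 0, upper = ub.m)$value
}
f.t <- Vectorize(f.t)

# Riemann sums on a grid covering the support of the sum
step <- 0.01
z <- seq(0, ub.m + ub.s, by = step)
dens <- f.t(z)
total <- sum(dens) * step              # should be close to 1
p36   <- sum(dens[z <= 3.6]) * step    # approximate P(Z <= 3.6)
print(c(total = total, p36 = p36))
```

The grid sum stands in for the outer `integrate(f.t, lower = 0, upper = 3.6)` call. Nesting integrate() may also work once the density is guarded, but that has not been checked here.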
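Answer 1 (score: 0)
I had trouble getting integrate() to work for different density parameters, so I came up with an alternative to @jlhoward's approach that uses a Riemann approximation: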
set.seed(1)
#densities to be convolved. could also put these in the function below
d1 <- function(x) dnorm(x,1,0.5)
d2 <- function(y) dlnorm(y,1.5, 0.75)
#Riemann approximation of convolution
conv <- function(t, a, b, d) { # [a, b] needs to cover the range of the densities above; d needs to be small for an accurate approximation
  z <- numeric(length(t)) # preallocate the result
  x <- seq(a, b, d)
  for (i in seq_along(t)) {
    z[i] <- sum(d1(x)*d2(t[i]-x)*d) # Riemann sum of the convolution integral at t[i]
  }
  return(z)
}
#check against sampled convolution
X <- rnorm(1000, 1, 0.5)
Y <- rlnorm(1000, 1.5, 0.75)
Z <- X + Y
t <- seq(0, 50, 0.05) #range to evaluate t, smaller increment -> smoother curve
hist(Z, breaks = 50, freq = F, xlim = c(0,30))
lines(t, conv(t, -100, 100, 0.1), type = "s", col = "red")
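The loop above can also be written without an explicit loop. The sketch below computes the same Riemann sum with `outer()` and a matrix product. `conv_vec` is a hypothetical name, and the grid limits are illustrative values chosen to cover both densities. It builds a length(t) by length(x) matrix, so it trades memory for speed.

```r
# Densities as in the answer above
d1 <- function(x) dnorm(x, 1, 0.5)
d2 <- function(y) dlnorm(y, 1.5, 0.75)

# Vectorized Riemann approximation of the convolution
conv_vec <- function(t, a, b, d) {
  x <- seq(a, b, by = d)
  # m[i, j] = d2(t[i] - x[j])
  m <- outer(t, x, function(ti, xj) d2(ti - xj))
  # row i of the product is sum_j d2(t[i] - x[j]) * d1(x[j]); scale by step d
  as.vector(m %*% d1(x)) * d
}

step.t <- 0.05
t <- seq(-5, 100, by = step.t)
dens <- conv_vec(t, -10, 10, 0.1)

# The approximated density should integrate to roughly 1
area <- sum(dens) * step.t
print(area)
```

Here [-10, 10] is enough for `a` and `b` because the integration variable only runs over the normal term, which is negligible outside that range.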