Question

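I am computing the density of a bivariate normal distribution in code. I use two formulas, which should return the same result.

The first uses dmvnorm from the mvtnorm package; the second uses the formula from Wikipedia (https://en.wikipedia.org/wiki/Multivariate_normal_distribution).

When the standard deviations of both distributions equal 1 (the covariance matrix has ones on the main diagonal and zeros elsewhere), the results are identical. But when I change the two diagonal entries of the covariance matrix to two or one third, the results are no longer the same.

(I hope) I have read the help correctly, as well as this document (https://cran.r-project.org/web/packages/mvtnorm/vignettes/MVT_Rnews.pdf).

I also found this on Stack Overflow (How to calculate multivariate normal distribution function in R), which I looked at because my covariance matrix might be defined incorrectly.

But so far I have not found an answer...

So my question: why does my code return different results when the standard deviation is not equal to 1?

I hope I have provided enough information. If anything is missing, please leave a comment and I will edit my question.

Many thanks in advance!

Now my code: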

    library(mvtnorm)  # for loading the package if necessary

    mu=c(0,0)
    rho=0
    sigma=c(1,1)  # the standard deviation which should be changed to two or one third or… to see the different results
    S=matrix(c(sigma[1],0,0,sigma[2]),ncol=2,byrow=TRUE)

    x=rmvnorm(n=100,mean=mu,sigma=S)
    dim(x)  # for control
    x[1:5,]  # for visualization

    # defining a function
    Comparison=function(Points=x,mean=mu,sigma=S,quantity=4) {
        for (i in 1:quantity) {
            print(paste0("The ",i," random point"))
            print(Points[i,])
            print("The following two results should be the same")
            print("Result from the function 'dmvnorm' out of package 'mvtnorm'")
            print(dmvnorm(Points[i,],mean=mu,sigma=sigma,log=FALSE))
            print("Result from equation out of wikipedia")
            print(1/(2*pi*S[1,1]*S[2,2]*(1-rho^2)^(1/2))*exp((-1)/(2*(1-rho^2))*(Points[i,1]^2/S[1,1]^2+Points[i,2]^2/S[2,2]^2-(2*rho*Points[i,1]*Points[i,2])/(S[1,1]*S[2,2]))))
            print("----")
            print("----")
        } # end for-loop
    } # end function

    # execute the function and compare the results
    Comparison(Points=x,mean=mu,sigma=S,quantity=4)

Answer 1

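Keep in mind that S is the variance-covariance matrix. The formula you took from Wikipedia uses standard deviations, not variances, so you need to plug the square roots of the diagonal entries of S into it. This is also why it works when you choose 1 for the diagonal entries: the variance and the SD are then both 1.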
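For reference, here is the bivariate density in the form given on Wikipedia, next to the covariance matrix that dmvnorm expects as its sigma argument. The symbols $\sigma_X$, $\sigma_Y$, $\mu_X$, $\mu_Y$ follow Wikipedia's notation and are not names from the code above.

```latex
f(x, y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1-\rho^2}}
  \exp\!\left( -\frac{1}{2(1-\rho^2)} \left[
    \frac{(x-\mu_X)^2}{\sigma_X^2}
    + \frac{(y-\mu_Y)^2}{\sigma_Y^2}
    - \frac{2\rho\,(x-\mu_X)(y-\mu_Y)}{\sigma_X \sigma_Y}
  \right] \right),
\qquad
\Sigma = \begin{pmatrix}
  \sigma_X^2 & \rho\,\sigma_X \sigma_Y \\
  \rho\,\sigma_X \sigma_Y & \sigma_Y^2
\end{pmatrix}
```

The diagonal of $\Sigma$ holds $\sigma_X^2$ and $\sigma_Y^2$, so with S as the covariance matrix the formula needs $\sigma_X = \sqrt{S_{11}}$ and $\sigma_Y = \sqrt{S_{22}}$. Plugging in S[1,1] = 2 directly evaluates the density of a distribution with variance 4 instead of 2.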

See the modified code below:

    library(mvtnorm)  # provides rmvnorm() and dmvnorm()

    mu=c(0,0)
    rho=0
    sigma=c(2,1)  # the variances (diagonal of S), not the standard deviations; change to two or one third to compare
    S=matrix(c(sigma[1],0,0,sigma[2]),ncol=2,byrow=TRUE)  # variance-covariance matrix

    x=rmvnorm(n=100,mean=mu,sigma=S)
    dim(x)  # for control
    x[1:5,]  # for visualization

    # defining a function
    Comparison=function(Points=x,mean=mu,sigma=S,quantity=4) {
        SS <- sqrt(diag(sigma))  # standard deviations = square roots of the variances
        for (i in 1:quantity) {
            print(paste0("The ",i," random point"))
            print(Points[i,])
            print("The following two results should be the same")
            print("Result from the function 'dmvnorm' out of package 'mvtnorm'")
            print(dmvnorm(Points[i,],mean=mean,sigma=sigma,log=FALSE))
            print("Result from equation out of wikipedia")
            # the means are omitted from the formula, which is valid here because mu is c(0,0)
            print(1/(2*pi*SS[1]*SS[2]*(1-rho^2)^(1/2))*exp((-1)/(2*(1-rho^2))*(Points[i,1]^2/SS[1]^2+Points[i,2]^2/SS[2]^2-(2*rho*Points[i,1]*Points[i,2])/(SS[1]*SS[2]))))
            print("----")
            print("----")
        } # end for-loop
    } # end function

    # execute the function and compare the results
    Comparison(Points=x,mean=mu,sigma=S,quantity=4)

So your comment where you define sigma is incorrect. Judging by how you build S, sigma in your code holds the variances, not the standard deviations.
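If you would rather think in standard deviations, one option is to build the covariance matrix from them explicitly. The following is a minimal base-R sketch, not code from the question: the helpers cov_from_sd, dbinorm and dmvn_matrix are hypothetical names introduced here for illustration, and the values of sd_vec, rho_ex and pt are arbitrary. It checks the closed-form bivariate formula against the general matrix form of the density.

```r
# Hypothetical helper: covariance matrix from standard deviations and correlation
cov_from_sd <- function(sd, rho) {
  off <- rho * sd[1] * sd[2]               # covariance between the two components
  matrix(c(sd[1]^2, off,
           off,     sd[2]^2), ncol = 2, byrow = TRUE)
}

# Closed-form bivariate normal density, parameterised by standard deviations
dbinorm <- function(x, mu, sd, rho) {
  z1 <- (x[1] - mu[1]) / sd[1]
  z2 <- (x[2] - mu[2]) / sd[2]
  q <- (z1^2 + z2^2 - 2 * rho * z1 * z2) / (1 - rho^2)
  exp(-q / 2) / (2 * pi * sd[1] * sd[2] * sqrt(1 - rho^2))
}

# General multivariate normal density, parameterised by the covariance matrix
dmvn_matrix <- function(x, mu, S) {
  d <- x - mu
  k <- length(mu)
  as.numeric(exp(-0.5 * t(d) %*% solve(S) %*% d) / sqrt((2 * pi)^k * det(S)))
}

mu_ex  <- c(0, 0)
sd_vec <- c(2, 1/3)                        # standard deviations (arbitrary example values)
rho_ex <- 0.5
S_ex   <- cov_from_sd(sd_vec, rho_ex)      # variances sd^2 end up on the diagonal
pt     <- c(0.7, -0.2)

# Both parameterisations describe the same distribution, so the densities agree
stopifnot(isTRUE(all.equal(dbinorm(pt, mu_ex, sd_vec, rho_ex),
                           dmvn_matrix(pt, mu_ex, S_ex))))
```

With mvtnorm loaded, dmvnorm(pt, mean = mu_ex, sigma = S_ex) should agree with both values, since its sigma argument is the covariance matrix.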
