Finding coverage with a bivariate ellipse

Date: 2013-08-26 22:53:07

Tags: r statistics

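I'm struggling with an estimation problem. Following several examples, I compute bivariate ellipses from vectors of points generated from a bivariate normal distribution. The code runs fine, but the coverage seems very low. By coverage I mean the proportion of times the generated (true) ps (p1, p2) fall inside the estimated ellipse. Older versions of R also give quite different results from newer ones; I'm currently using R 3.0.1. Here is code that reproduces the problem: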

  library(MASS)
  set.seed(1234)
  x1<-NULL
  x2<-NULL
  k<-1
  Sigma2 <- matrix(c(.72,.57,.57,.46),2,2)   # covariance of the logits
  Sigma2
  rho <- Sigma2[1,2]/sqrt(Sigma2[1,1]*Sigma2[2,2])   # implied correlation
  eta<-replicate(300,mvrnorm(k, mu=c(-1.01,-2.39), Sigma2))   # 2 x 300 matrix of logits
  p1<-exp(eta)/(1+exp(eta)) # true p's (inverse logit)
  n<-60
  x1<-replicate(300,rbinom(k,n,p1[1,])) 
  x2<-replicate(300,rbinom(k,n,p1[2,]))

  rate1<-x1/60  # Estimated p's
  rate2<-x2/60
  library(car)
  ell <- dataEllipse(rate1, rate2, levels=c(0.05, 0.95))
  library(sp)
  within<-point.in.polygon(p1[1,], p1[2,], ell$`0.95`[,1], ell$`0.95`[,2])
  mean(within)    # coverage
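For reference, the same coverage can be checked without going through polygons. The sketch below uses a hypothetical helper, `ellipse_coverage`, which is not from the question or any package. It assumes the 95% ellipse is the normal-theory data ellipse: centred at the sample mean, shaped by the sample covariance, with squared radius 2 * qf(level, 2, n - 1). That is meant to mirror car's default dataEllipse construction, but treat it as an assumption.

```r
# Hypothetical helper: proportion of true points inside the normal-theory
# data ellipse fitted to the estimated points.
# Assumption: centre = colMeans(est), shape = cov(est),
# squared radius = 2 * qf(level, 2, n - 1) (intended to mirror car::dataEllipse).
ellipse_coverage <- function(est, truth, level = 0.95) {
  est <- as.matrix(est)
  truth <- as.matrix(truth)
  n <- nrow(est)
  radius2 <- 2 * qf(level, 2, n - 1)                   # squared ellipse radius
  d2 <- mahalanobis(truth, colMeans(est), cov(est))    # squared Mahalanobis distances
  mean(d2 <= radius2)                                  # share of true points covered
}
```

With the question's objects, `ellipse_coverage(cbind(rate1, rate2), t(p1))` should land close to `mean(within)`. The two need not match exactly, because the plotted ellipse is only a polygon approximation.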


Answer (score: 3):

The error is in these lines:

  x1<-replicate(300,rbinom(k,n,p1[1,]))
  x2<-replicate(300,rbinom(k,n,p1[2,]))

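Because k=1, the call rbinom(k,n,p1[1,]) generates a single random deviate, and only the first probability in p1[1,] is used. You replicate this call 300 times, so every deviate is drawn with that same probability. As a result, rate1 and rate2 cover a much smaller region of the parameter space than p1 does. You can see this by plotting p1 over the data ellipse: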

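  x1<-replicate(300,rbinom(k,n,p1[1,]))   # still the buggy draws
  x2<-replicate(300,rbinom(k,n,p1[2,]))

  rate1<-x1/n  # Estimated p's
  rate2<-x2/n
  library(car)
  plot.new()
  # 5% and 95% data ellipses of the estimated rates (points hidden with pch=NA)
  ell <- dataEllipse(rate1, rate2, levels=c(0.05, 0.95), plot.points=TRUE, pch=NA)
  library(sp)
  # point.in.polygon() returns 0 (outside), 1 (inside), 2 (on edge) or 3 (vertex)
  within<-point.in.polygon(p1[1,], p1[2,], ell$`0.95`[,1], ell$`0.95`[,2])
  mean(within > 0)   # coverage

  # True p's: blue if inside the 95% ellipse, green otherwise
  plot(p1[1,within > 0], p1[2,within > 0], col="blue", ylim=c(0,1), xlim=c(0,1))
  points(p1[1,within == 0], p1[2,within == 0], col="green")

  # Overlay the same ellipses on this plot
  ell <- dataEllipse(rate1, rate2, levels=c(0.05, 0.95), plot.points=TRUE, pch=NA, add=TRUE)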
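The root cause is that rbinom(n, size, prob) recycles prob only across the n draws it is asked for. Here is a minimal base-R illustration. It uses the degenerate probabilities 0 and 1, so the result does not depend on the random seed:

```r
# rbinom() recycles `prob` over the n draws; with n = 1 only prob[1] is used.
probs <- c(0, 1, 0, 1)   # degenerate probabilities give deterministic draws

# Pattern from the question: one draw per call, replicated.
bad <- replicate(4, rbinom(1, 60, probs))
print(bad)    # all zeros: every draw used probs[1] = 0

# Corrected pattern: one call with n equal to length(probs).
good <- rbinom(length(probs), 60, probs)
print(good)   # 0 60 0 60: draw i used probs[i]
```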

The corrected code gives the expected coverage (about 95%):

  x1<-rbinom(300,n,p1[1,])   # draw i uses p1[1,i]
  x2<-rbinom(300,n,p1[2,])   # draw i uses p1[2,i]
  rate1<-x1/n  # Estimated p's
  rate2<-x2/n
  library(car)
  plot.new()
  ell <- dataEllipse(rate1, rate2, levels=c(0.05, 0.95), plot.points=TRUE, pch=NA)
  library(sp)
  within<-point.in.polygon(p1[1,], p1[2,], ell$`0.95`[,1], ell$`0.95`[,2])
  mean(within > 0)   # coverage; points on the boundary count as inside

  plot(p1[1,within > 0], p1[2,within > 0], col="blue", ylim=c(0,1), xlim=c(0,1))
  points(p1[1,within == 0], p1[2,within == 0], col="green")
  ell <- dataEllipse(rate1, rate2, levels=c(0.05, 0.95), plot.points=TRUE, pch=NA, add=TRUE)
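To compare the buggy and corrected versions without car, sp or MASS, here is a base-R sketch of the whole simulation. It swaps in two substitutes that are not the answer's method. First, bivariate normal draws come from a Cholesky factor instead of mvrnorm(). Second, coverage is a Mahalanobis-distance test against the assumed normal-theory radius 2 * qf(0.95, 2, m - 1) instead of point.in.polygon() on the plotted ellipse. `coverage` is a hypothetical helper name.

```r
# Base-R sketch of the simulation (no car, sp or MASS needed).
set.seed(1234)
m <- 300                                   # number of simulated (p1, p2) pairs
n <- 60                                    # binomial trials per proportion
Sigma2 <- matrix(c(.72, .57, .57, .46), 2, 2)
mu <- c(-1.01, -2.39)

# Rows of Z %*% chol(Sigma2) are N(0, Sigma2); then shift by mu (logit scale)
eta <- sweep(matrix(rnorm(2 * m), m, 2) %*% chol(Sigma2), 2, mu, "+")
p_true <- plogis(eta)                      # m x 2 matrix of true probabilities

# Hypothetical helper; assumes squared radius 2 * qf(level, 2, m - 1)
coverage <- function(rates, truth, level = 0.95) {
  d2 <- mahalanobis(truth, colMeans(rates), cov(rates))
  mean(d2 <= 2 * qf(level, 2, nrow(rates) - 1))
}

# Buggy: every draw uses only the first pair of probabilities
buggy <- cbind(replicate(m, rbinom(1, n, p_true[, 1])),
               replicate(m, rbinom(1, n, p_true[, 2]))) / n
# Corrected: draw i uses p_true[i, ]
fixed <- cbind(rbinom(m, n, p_true[, 1]),
               rbinom(m, n, p_true[, 2])) / n

cov_buggy <- coverage(buggy, p_true)
cov_fixed <- coverage(fixed, p_true)
print(c(buggy = cov_buggy, fixed = cov_fixed))
```

The buggy draws should give far lower coverage than the corrected ones. The exact values will differ from the car/sp version because the random numbers are generated differently.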