I have been trying to assess the false positive rate of difference-in-means estimates using the following code.
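library(randomizr)  # block_ra()
library(estimatr)   # difference_in_means()
library(dplyr)      # group_by(), mutate()
library(magrittr)   # %<>%
sm_n <- 10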
dat <- data.frame(b=rep(c("a","b"),c(4,6)),
x = c(0,1,0,0,0,1,0,0,1,1),
nb=rep(c(4,6),c(4,6)))
dat$y0 <- 1000*dat$x+c(0,0,0,0,1,2,3,400,5000,60000)
## summary(lm(y0~x,data=dat))$r.squared
## dat$Z <- ifelse(dat$b=="a",complete_ra(N=4), complete_ra(N=6,m=2))
set.seed(12345)
dat$Z <- block_ra(blocks = dat$b, block_m = c(2,2))
dat$tau <- c(2500,0,2500,0,25000,25000,0,0,0,50000)
dat$y1 <- dat$y0 + dat$tau
trueATE <- with(dat,mean(y1-y0))
trueRankATE <- with(dat,mean(rank(y1)-rank(y0)))
dat$Y <- with(dat,Z*y1 + (1-Z)*y0)
dat$ZF <- factor(dat$Z)
dat %<>% group_by(b) %>% mutate(pi=mean(Z), #prob treated
nbwt=Z/pi + (1-Z)/(1-pi))
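The estimate itself works just fine, and so does the function that generates a new random assignment with no systematic relationship between Z and anything else. However, as soon as I try to run the function that extracts the p-values, I get the following error:
Error in match(x, table, nomatch = 0L) : 'match' requires vector arguments
The problem seems to be in difference_in_means(Y~newexp(b), blocks=b,data=dat)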
est3 <- difference_in_means(Y~Z,blocks=b,data=dat)
summary(est3)
newexp <- function(b){
## A new random assignment with no systematic relationship between Z and anything else
Z <- block_ra(blocks = dat$b, block_m = c(2,2))
return(Z)
}
get_p_val_from_difference_in_means<-function(){
## First, shuffle to break the relationship, to make the truth zero
## Then, test
theest <- difference_in_means(Y~newexp(b), blocks=b,data=dat)
thep <- coef(summary(theest))[1,4]
thep
}
est3ps <- replicate(1000, get_p_val_from_difference_in_means())
mean(est3ps < .05)
plot(ecdf(est3ps ), ylim = c(0,1), xlim = c(0,1))
abline(0,1)
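
One thing that may be worth trying, offered as a sketch under assumptions rather than a confirmed fix: the error might come from the right-hand side of the formula being a function call (`newexp(b)`) instead of a column in `dat`. The version below draws the placebo assignment first, stores it as a column, and then passes a plain column name to `difference_in_means()`. The names `shuffle_within_blocks()`, `placebo_p_value()` and `Znew` are hypothetical and not part of the code above. The base-R within-block shuffle is a stand-in for `block_ra()`; it keeps the number of treated units per block fixed, which matches `block_m = c(2,2)` only because `dat$Z` was drawn that way.

```r
## Hypothetical helper: permute an assignment vector separately inside each
## block, so every block keeps its original number of treated units.
shuffle_within_blocks <- function(z, blocks) {
  out <- z
  for (lev in unique(blocks)) {
    idx <- which(blocks == lev)                    # rows belonging to this block
    out[idx] <- z[idx][sample.int(length(idx))]    # reorder z within the block
  }
  out
}

## Hypothetical helper: one placebo draw and its p-value.
## Assumes the estimatr package is installed and that `data` has the
## columns Y, Z and b, as in the code above.
placebo_p_value <- function(data) {
  data$Znew <- shuffle_within_blocks(data$Z, data$b)  # assignment stored as a column
  fit <- estimatr::difference_in_means(Y ~ Znew, blocks = b, data = data)
  coef(summary(fit))[1, 4]                            # same extraction as in the question
}
```

If this works, the simulation would become `est3ps <- replicate(1000, placebo_p_value(dat))`, followed by the same `mean(est3ps < .05)` and `ecdf` plot as before.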