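I'm having some trouble computing SEs for a survey. Here is an example of what I'm trying to do; I'm trying to use the survey package in R. (In the example below, fpc equals the number of observations in each stratum.)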
id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
strata = c(6, 6, 6, 7, 7, 7, 8, 8, 8, 8, 8, 8)
weight = c(60, 75, 85, 140, 170, 175, 270, 310, 325, 785, 1450, 3920)
fpc = c(8, 8, 8, 7, 7, 7, 6, 6, 6, 6, 6, 6)
answer = c("2", "2", "3", "1", "2", NA, NA, "2", "3", NA, "1", NA)
df = data.frame(id, strata, weight, fpc, answer)
df <- df[complete.cases(df), ]
I then try to compute the mean and SE with the survey package:
dstrat<-svydesign(id=~1,strata=~strata, weights=~weight, data=df, fpc=~fpc)
svymean(~answer, dstrat)
           mean     SE
answer1 0.60803 0.2573
answer2 0.23518 0.1755
answer3 0.15679 0.1479
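My first question: how do I account for the weights of the observations that did not answer? In the example above I removed the NA observations before running the functions, but I would like to include that information. I assume the SE will be larger or smaller depending on whether I have answers from the observations with the largest weights?
My second question: how do I calculate a "net" value? Suppose: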
answer1 = good
answer2 = neutral
answer3 = bad
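I can calculate the "net" as answer1 - answer3 = 0.60803 - 0.15679 = 0.45124. How can I get the SE of the "net"?
Answer 0 (score: 3)
Your first question belongs on stats.stackexchange, but I think the answer is that you cannot compute the SE when the data are missing. Here is how to get the SE for your second question: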
library(survey)
id <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
strata <- c(6, 6, 6, 7, 7, 7, 8, 8, 8, 8, 8, 8)
weight <- c(60, 75, 85, 140, 170, 175, 270, 310, 325, 785, 1450, 3920)
fpc <- c(8, 8, 8, 7, 7, 7, 6, 6, 6, 6, 6, 6)
answer <- c("2", "2", "3", "1", "2", NA, NA, "2", "3", NA, "1", NA)
df <- data.frame(id=id, strata=strata, weight=weight, fpc=fpc, answer=answer)
# this is probably a mistake
df <- df[complete.cases(df), ]
# in most data sets, you should pass na.rm=TRUE later
# rather than tossing out observations before `svydesign` gets run
df$ones <- as.numeric( df$answer %in% 1 )
df$threes <- as.numeric( df$answer %in% 3 )
dstrat<-svydesign(id=~1,strata=~strata, weights=~weight, data=df, fpc=~fpc)
a <- svymean( ~ ones + threes , dstrat , na.rm = TRUE )
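# `diff` is the net (ones - threes) together with its SE;
# the all-zero `avg` weights always evaluate to 0
svycontrast(a, list(avg=c(0,0), diff=c(1,-1)))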
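For reference, here is a sketch of the variance a linear contrast such as `diff=c(1,-1)` corresponds to. The symbols are notation introduced here for illustration, not taken from the survey package: \hat{p}_1 and \hat{p}_3 are the estimated proportions of `ones` and `threes`, and V is their estimated covariance matrix (what `vcov(a)` would return).

```latex
\[
\widehat{\text{net}} = c^{\top}\hat{p}, \qquad
c = \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \quad
\hat{p} = \begin{pmatrix} \hat{p}_1 \\ \hat{p}_3 \end{pmatrix}
\]
\[
\operatorname{Var}\bigl(\widehat{\text{net}}\bigr)
  = c^{\top} V c
  = \operatorname{Var}(\hat{p}_1) + \operatorname{Var}(\hat{p}_3)
    - 2\,\operatorname{Cov}(\hat{p}_1, \hat{p}_3),
\qquad
\operatorname{SE}\bigl(\widehat{\text{net}}\bigr)
  = \sqrt{\operatorname{Var}\bigl(\widehat{\text{net}}\bigr)}
\]
```

The covariance term is why the two SEs printed by `svymean` are not enough on their own. Combining them as if the estimates were independent gives sqrt(0.2573^2 + 0.1479^2) ≈ 0.297, which drops the covariance. Proportions of mutually exclusive categories usually have negative covariance, so the actual SE of the net would typically be larger than that.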