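Testing the coverage probability of a confidence interval using simulation in R

Date: 2019-01-16 01:11:04

Tags: r confidence-interval

I have a homework problem that asks me to use simulation in R to test the coverage probability of a confidence interval (found as part of a previous question).

My code attempts to generate 1000 random samples (with replacement) from the sample data I have, effectively treating the original sample as the new population. The random samples are the same size as my population. I then want to find the 95% confidence interval for each random sample and see how many of them contain the "true mean" (given in the problem statement) versus the "population mean" (the mean of the original sample).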

set.seed(1987)

iq <- rnorm(1000,91.08065,14.40393)

pop_mean <- mean(iq) #the mean of my sample is now considered the population mean
true_mean <- 100 #the true mean is 100, specified in question

sampSEs <- numeric() #create an empty vector to put the sample SEs in
sampMeans <- numeric() #create an empty vector to put the sample means in

get_conf_interval <- function(sample_measurements) {
  iqSE_samp <- 15/sqrt(length(iq)) #find the SE using an sd of 15
  iqMean_samp <- mean(sample_measurements) #take the mean of each sample
  upper <- iqMean_samp + 1.96*iqSE_samp #find the upper bound for a 95% CI
  lower <- iqMean_samp - 1.96*iqSE_samp #find the lower bound for a 95% CI
  list(lower=lower, upper=upper)
}

interval_contains_true_mean <- function(interval) { #check if the interval contains the true mean
  true_mean >= interval$lower && true_mean <= interval$upper
}
interval_contains_population_mean <- function(interval) { #check if the interval contains the population mean
  pop_mean >= interval$lower && pop_mean <= interval$upper
}

samples <- replicate(1000, sample(iq, size = 124, replace = T)) #take 1000 samples with replacement from my iq data

for(i in 1:1000) { #for each sample taken
  sampMeans[i] <- mean(samples[i]) #put the mean of it in the vector created previously
  sampSEs[i] <- 15/sqrt(length(iq)) #put the SE in a vector... these are all the same bc not finding the sample sd
}

intervals <- apply(samples, FUN=get_conf_interval, MARGIN=2) #call the function to find the confidence intervals

sampMeans #just check if worked
#sampSEs #ditto

percent_intervals_with_true_mean <- mean(sapply(intervals, FUN=interval_contains_true_mean)) * 100
cat("% Intervals Containing True Mean: ", percent_intervals_with_true_mean, "%\n")

percent_intervals_with_pop_mean <- mean(sapply(intervals, FUN=interval_contains_population_mean)) * 100
cat("% Intervals Containing Population Mean: ", percent_intervals_with_pop_mean, "%")

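This code reports that 0% of the confidence intervals from my samples contain the true mean. That is not correct. I looked at the sample means, and some of them are the true mean.

1 Answer:

Answer 0: (score: 0)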

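1.- I have two solutions. The first is to add the missing comma, writing 'mean(samples[, i])' instead of 'mean(samples[i])', so that the mean is taken over the whole i-th column (one resample) rather than over a single element: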

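set.seed(1987)

sigma_M = 14.40393
mu_M = 91.08065
m = 10

iq <- rnorm(m, mu_M, sigma_M)

pop_mean <- mean(iq) # the mean of my sample is now considered the population mean

samples <- replicate(m, sample(iq, size = 4, replace = T)) # take m samples with replacement from my iq data

sampSEs <- numeric() # create an empty vector to put the sample SEs in
sampMeans <- numeric() # create an empty vector to put the sample means in

for(i in 1:m) { # for each sample taken
  sampMeans[i] <- mean(samples[, i]) # put the mean of it in the vector created previously
  sampSEs[i] <- 15/sqrt(length(iq)) # put the SE in a vector... these are all the same bc not finding the sample sd
}

get_conf_interval <- function(sample_measurements) {
  iqSE_samp <- 15/sqrt(length(iq)) # find the SE using an sd of 15
  iqMean_samp <- mean(sample_measurements) # take the mean of each sample
  upper <- iqMean_samp + 1.96*iqSE_samp # find the upper bound for a 95% CI
  lower <- iqMean_samp - 1.96*iqSE_samp # find the lower bound for a 95% CI
  list(lower=lower, upper=upper)
}

interval_contains_population_mean <- function(interval) { # check if the interval contains the population mean
  pop_mean >= interval$lower && pop_mean <= interval$upper
}

intervals <- apply(samples, FUN=get_conf_interval, MARGIN=2) # call the function to find the confidence intervals

sampMeans # just check if worked
sampSEs # ditto

percent_intervals_with_pop_mean <- mean(sapply(intervals, FUN=interval_contains_population_mean)) * 100
cat("% Intervals Containing Population Mean: ", percent_intervals_with_pop_mean, "%")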
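To see why the comma matters, here is a minimal sketch on a made-up matrix. The name 'toy' and its values are purely illustrative and do not come from the question. Without the comma, R indexes the matrix as one long vector and returns a single number; with the comma, it returns the whole column, which is one resample.

```r
# Hypothetical toy matrix: 4 draws per resample (rows), 3 resamples (columns)
toy <- matrix(1:12, nrow = 4, ncol = 3)

# Linear indexing: the 2nd element in column-major order, a single number
mean_of_element <- mean(toy[2])

# Matrix indexing: all 4 values of the 2nd column, i.e. one whole resample
mean_of_column <- mean(toy[, 2])

# All resample means at once, without a loop
col_means <- colMeans(toy)

print(mean_of_element)
print(mean_of_column)
print(col_means)
```

So in the question, 'sampMeans' held individual IQ values rather than sample means, which would explain why some of them looked equal to the true mean.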

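2.- The second solution is to change the code, but I only did it for 'pop_mean', the population mean (and for calculating the standard deviation):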

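set.seed(1987)

sigma_M = 14.40393
mu_M = 91.08065
m = 10

iq <- rnorm(m, mu_M, sigma_M)

pop_mean <- mean(iq) # the mean of my sample is now considered the population mean

samples <- replicate(m, sample(iq, size = 4, replace = T)) # take m samples with replacement from my iq data

sampMeans = apply(samples, 2, mean) # one mean per column (resample)

iqSE_samp <- 15/sqrt(length(iq)) # find the SE using an sd of 15
iqMean_samp <- sampMeans # take the mean of each sample
upper <- iqMean_samp + 1.96*iqSE_samp # find the upper bound for a 95% CI
lower <- iqMean_samp - 1.96*iqSE_samp # find the lower bound for a 95% CI

intervals = cbind(lower, upper)

# findInterval returns 1 when lower <= pop_mean < upper
percent_intervals_with_pop_mean = mean(apply(intervals, 1, findInterval, x = pop_mean) == 1) * 100

cat("% Intervals Containing Population Mean: ", percent_intervals_with_pop_mean, "%")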

The final result for me is 80%.
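For the full-size setup in the question (1000 resamples of 124 draws), the same idea can be sketched as below. This is only a sketch under stated assumptions, not the asker's exact method: the names 'n_draws', 'n_reps' and 'sigma' are mine, and I assume the standard error should use the size of each resample (124) rather than 'length(iq)', which is a change from the code above.

```r
set.seed(1987)

# Setup mirroring the question: 'iq' plays the role of the population
iq <- rnorm(1000, 91.08065, 14.40393)
pop_mean  <- mean(iq)
true_mean <- 100

n_draws <- 124    # size of each resample, as in the question
n_reps  <- 1000   # number of resamples
sigma   <- 15     # sd treated as known, as in the question

# Each column of 'samples' is one resample drawn with replacement
samples <- replicate(n_reps, sample(iq, size = n_draws, replace = TRUE))

samp_means <- colMeans(samples)
se <- sigma / sqrt(n_draws)   # assumption: SE based on the resample size

lower <- samp_means - 1.96 * se
upper <- samp_means + 1.96 * se

# Share of intervals (in percent) that contain each candidate mean
coverage_pop  <- mean(lower <= pop_mean  & pop_mean  <= upper) * 100
coverage_true <- mean(lower <= true_mean & true_mean <= upper) * 100

cat("% Intervals Containing Population Mean: ", coverage_pop, "%\n")
cat("% Intervals Containing True Mean: ", coverage_true, "%\n")
```

Under these assumptions each interval has a half-width of about 2.6 and is centred near 91, so very few, if any, can reach 100. A result of 0% for the true mean is therefore plausible for this data rather than necessarily a sign of a bug.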