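Hello, I have a question about R.
I have 200 employees, and I know the mean and standard deviation of the whole population (hours worked).
The following must be repeated 400 times:
1) Draw a small random sample of 6 people from the population.
2) Construct a 90% confidence interval for the mean (μ), assuming the population size is infinite.
3) Count how many of the 400 confidence intervals constructed in 2) do not contain the population mean (μ).
I have drawn the samples, but I cannot build the confidence intervals.
Here is what I have done so far: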
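population <- data$heur01        # hours worked (column header shown in the data below)
n <- 6
left <- right <- numeric(400)    # preallocate the interval bounds
for (i in 1:400) {
  ech <- sample(population, n)
  # 1.645 is the normal quantile, paired here with the sample standard deviation
  right[i] <- mean(ech) + 1.645 * sd(ech) / sqrt(n)
  left[i]  <- mean(ech) - 1.645 * sd(ech) / sqrt(n)
}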
Here is the data:
heur01
1411
1734
1048
2060
1983
1810
1387
1637
1419
1637
1185
1766
1484
1983
1217
1915
1846
1887
1742
988
1375
1193
2056
1919
1850
2076
1463
1113
1887
1919
1734
1157
1766
1951
1923
2173
1609
1895
1109
1028
1701
1875
1677
1653
1883
1677
1850
1738
1520
1415
1992
1919
1653
1625
1705
1742
1891
2108
1919
1911
1770
1834
1911
2060
1717
1943
1859
1738
1222
1709
2052
1141
1931
2068
2044
1725
1818
1798
1943
1939
1919
1790
2116
1750
2052
1605
1798
2169
1665
1673
1185
1717
1717
1657
1915
1778
2121
1786
1774
2056
1738
1883
1754
1790
1770
1947
1867
1794
1867
1790
1762
2080
1778
1903
1734
1838
1560
1592
1637
1467
1750
1653
1222
1709
1806
1334
1584
2052
1802
1774
1770
1258
1334
1322
1826
1600
2189
1907
1548
1617
1693
1020
992
1435
1613
1738
1419
1121
1629
1605
1455
1157
1717
1294
1359
1282
1758
1395
1129
1189
1790
1217
1133
1516
1516
1278
1072
911
1286
968
1076
1315
1221
1268
939
1879
986
1221
1456
1315
1785
1080
1362
1503
1127
1691
1174
1644
1691
939
1503
1080
1503
1832
1362
1691
1456
1879
1644
1033
Answer 0 (score: 1)
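You can write a function that computes a confidence interval, then apply it with replicate
to repeated samples to produce a matrix of confidence intervals that you can check against the population mean.
There is one possible complication: when the standard deviation is unknown, confidence intervals are calculated with the t distribution, but when it is known, the normal distribution is used. With a relatively large number of degrees of freedom the two differ very little, but each sample here has only 5 degrees of freedom (n = 6), so the difference matters.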
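To see how much the choice of distribution matters at this sample size, the two 90% critical values can be compared directly. This is a small sketch in base R; the variable names are illustrative, and the miss-rate figure assumes a normally distributed population, which the data here may not satisfy.

```r
n <- 6                          # sample size from the question
p <- 0.95                       # upper-tail probability for a two-sided 90% interval

z_crit <- qnorm(p)              # normal critical value, about 1.645
t_crit <- qt(p, df = n - 1)     # t critical value with 5 df, about 2.015

# how much wider the t interval is when the sample sd is used
t_crit / z_crit

# Miss rate if 1.645 is paired with the sample sd (normal population assumed):
# P(|T| > z_crit) for T following a t distribution with n - 1 degrees of freedom
miss_rate <- 2 * pt(-z_crit, df = n - 1)
miss_rate
```

Under that assumption the miss rate comes out above the nominal 10%, which is why the t quantile is the usual choice when the standard deviation is estimated from only six observations.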
So, to build a robust function for confidence intervals, you need something like

ci <- function(x, conf.level, sd = NULL){
    # convert the two-sided level to an upper-tail probability, e.g. .9 -> .95
    conf.level <- mean(c(conf.level, 1))
    mean.x <- mean(x)
    if (is.null(sd)) {                        # when standard deviation unknown,
        sd <- sd(x)                           # use sample standard deviation
        z <- qt(conf.level, length(x) - 1)    # and t distribution
    } else {
        z <- qnorm(conf.level)                # when known, use normal
    }
    int <- z * sd / sqrt(length(x))
    c(low = mean.x - int,
      high = mean.x + int)
}
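As a sanity check on the unknown-standard-deviation case, the same interval can be obtained from base R's t.test(). The sketch below does the arithmetic by hand so that it runs on its own; the vector x is simply the first six values of the data, chosen for illustration.

```r
x <- c(1411, 1734, 1048, 2060, 1983, 1810)   # illustrative sample of size 6
conf.level <- 0.9

# manual t interval: mean +/- t quantile * sd / sqrt(n)
upper_p    <- mean(c(conf.level, 1))         # .9 -> .95
half_width <- qt(upper_p, df = length(x) - 1) * sd(x) / sqrt(length(x))
manual     <- c(low = mean(x) - half_width, high = mean(x) + half_width)

# interval reported by the built-in one-sample t test
builtin <- t.test(x, conf.level = conf.level)$conf.int

# TRUE if both approaches agree (up to numerical tolerance)
all.equal(unname(manual), as.numeric(builtin))
```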
Trying it out,
set.seed(47) # make sampling reproducible
# make a matrix of confidence intervals
ints <- replicate(400, ci(sample(heur01, 6), .9, sd(heur01)))
ints[, 1:5]
#>          [,1]     [,2]     [,3]     [,4]     [,5]
#> low  1443.959 1441.625 1376.459 1486.625 1436.959
#> high 1865.041 1862.708 1797.541 1907.708 1858.041
# calculate number of intervals that don't contain mean
mean.x <- mean(heur01)
sum(mean.x < ints[1,] | mean.x > ints[2,])
#> [1] 37
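For context on that count: if each interval missed the mean independently with probability 0.10, the number of misses out of 400 would follow a binomial distribution. The sketch below shows the range to expect under that idealized assumption; the variable names are illustrative.

```r
n_intervals <- 400
miss_prob   <- 0.10                           # 1 - confidence level

expected <- n_intervals * miss_prob           # average number of misses
spread   <- sqrt(n_intervals * miss_prob * (1 - miss_prob))   # binomial standard deviation

# central 95% range of the miss count under the binomial model
likely_range <- qbinom(c(0.025, 0.975), size = n_intervals, prob = miss_prob)

observed <- 37                                # count reported above
observed >= likely_range[1] && observed <= likely_range[2]
```

The expected count is 40 with a standard deviation of 6, so a count of 37 is well within ordinary sampling variation.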
And indeed, the result does differ when the standard deviation is not specified:
set.seed(47)
with_sd <- replicate(100, {
    ints <- replicate(400, ci(sample(heur01, 6), .9, sd(heur01)))
    sum(mean.x < ints[1,] | mean.x > ints[2,])
})
summary(with_sd)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#>    27.0    34.0    37.0    37.5    41.0    50.0
set.seed(47)
no_sd <- replicate(100, {
    ints <- replicate(400, ci(sample(heur01, 6), .9))
    sum(mean.x < ints[1,] | mean.x > ints[2,])
})
summary(no_sd)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#>   29.00   43.00   46.00   47.07   52.00   66.00
t.test(with_sd, no_sd)
#>
#> Welch Two Sample t-test
#>
#> data: with_sd and no_sd
#> t = -11.472, df = 187.14, p-value < 2.2e-16
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -11.215668 -7.924332
#> sample estimates:
#> mean of x mean of y
#> 37.50 47.07
Data
heur01 <- c(1411L, 1734L, 1048L, 2060L, 1983L, 1810L, 1387L, 1637L, 1419L, 1637L, 1185L, 1766L, 1484L, 1983L,
1217L, 1915L, 1846L, 1887L, 1742L, 988L, 1375L, 1193L, 2056L, 1919L, 1850L, 2076L, 1463L, 1113L, 1887L,
1919L, 1734L, 1157L, 1766L, 1951L, 1923L, 2173L, 1609L, 1895L, 1109L, 1028L, 1701L, 1875L, 1677L, 1653L,
1883L, 1677L, 1850L, 1738L, 1520L, 1415L, 1992L, 1919L, 1653L, 1625L, 1705L, 1742L, 1891L, 2108L, 1919L,
1911L, 1770L, 1834L, 1911L, 2060L, 1717L, 1943L, 1859L, 1738L, 1222L, 1709L, 2052L, 1141L, 1931L, 2068L,
2044L, 1725L, 1818L, 1798L, 1943L, 1939L, 1919L, 1790L, 2116L, 1750L, 2052L, 1605L, 1798L, 2169L, 1665L,
1673L, 1185L, 1717L, 1717L, 1657L, 1915L, 1778L, 2121L, 1786L, 1774L, 2056L, 1738L, 1883L, 1754L, 1790L,
1770L, 1947L, 1867L, 1794L, 1867L, 1790L, 1762L, 2080L, 1778L, 1903L, 1734L, 1838L, 1560L, 1592L, 1637L,
1467L, 1750L, 1653L, 1222L, 1709L, 1806L, 1334L, 1584L, 2052L, 1802L, 1774L, 1770L, 1258L, 1334L, 1322L,
1826L, 1600L, 2189L, 1907L, 1548L, 1617L, 1693L, 1020L, 992L, 1435L, 1613L, 1738L, 1419L, 1121L, 1629L,
1605L, 1455L, 1157L, 1717L, 1294L, 1359L, 1282L, 1758L, 1395L, 1129L, 1189L, 1790L, 1217L, 1133L, 1516L,
1516L, 1278L, 1072L, 911L, 1286L, 968L, 1076L, 1315L, 1221L, 1268L, 939L, 1879L, 986L, 1221L, 1456L,
1315L, 1785L, 1080L, 1362L, 1503L, 1127L, 1691L, 1174L, 1644L, 1691L, 939L, 1503L, 1080L, 1503L, 1832L,
1362L, 1691L, 1456L, 1879L, 1644L, 1033L)