I am bootstrapping the median of a dataset, "file", that contains a single column, "Total". Here it is:
Total <-
c(2089, 1567, 1336, 1616, 1590, 1649, 1341, 1614, 1590, 1621,
1621, 1631, 1295, 107, 18, 195, 2059, 870, 2371, 787, 98, 2422,
655, 1277, 1336, 2109, 1811, 1337, 1290, 1308, 1359, 1600, 1296,
693, 107, 1359, 89, 89, 89, 89, 2411, 1639, 89, 89, 1283, 89,
89, 89, 2341, 1012, 1295, 1853, 1277, 1571, 1288, 1300, 1619,
107, 555, 1612, 1300, 1300, 2093, 133, 1674, 988, 132, 647, 606,
544, 873, 274, 120, 1620, 1601, 1601, 906, 1603, 1613, 1592,
1603, 1610, 1321, 2380, 1575, 1575, 1277, 2354, 1561, 1579, 2367,
2341, 876, 1612, 1588, 2087, 1612, 890, 1586, 1580, 611, 1797,
2079, 1937, 189, 171, 706, 1647, 1642, 1278, 1650, 1623, 1647,
1661, 1692, 1632, 1684, 2474, 403, 842, 593, 98, 2354, 1265,
866, 1483, 2379, 1650, 1875, 1655, 1632, 1691, 1329, 867, 1632,
1693, 1623, 829, 1659, 1685, 666, 1585, 1659, 2169, 1623, 1645,
1654, 1698, 2172, 789, 1698, 579, 2443, 335, 132, 1952, 1265,
978, 1624, 979, 1729, 607, 181, 752, 424, 386, 309, 998, 1435,
2476, 392, 1657, 348, 1652, 1646, 1345, 2445, 1655, 840, 1624,
1652, 1321, 1321, 2201, 957, 917, 2458, 4096, 2458, 1346, 2459,
1634, 2459, 2459, 2459, 2508, 714, 2457, 2457, 1703, 669, 976,
1634, 2459, 2491, 2393, 625, 1763, 879, 886, 1085, 731, 924,
1649, 1216, 1647, 2470, 668, 2326, 757, 215, 276, 186, 901, 1402,
429, 554, 2457, 1643, 986, 730, 1028, 971, 1952, 1584, 1023,
1352, 839, 2434, 430, 2462, 1327, 1004, 385, 1099, 1067, 758,
679, 1423, 2495, 1664, 2495, 2495, 1345, 2530, 1754, 1804, 2525,
1652, 2536, 1646, 2529, 1380, 1845, 963, 1339, 2482, 1417, 1729,
1384, 1648, 344, 1648, 955, 609, 485, 1822, 513, 223, 222, 193,
1410, 1159, 586, 585, 2671, 2702, 2529, 2212, 1658, 741, 2529,
861, 1758, 905, 2529, 597, 1049, 2529, 619, 2620, 2596, 1688,
2590, 2545, 2590, 883, 287, 723, 2565, 1835, 1738, 2243, 1693,
2565, 250, 2529, 1880, 1777, 701, 444, 927, 1127, 825, 2726,
1977, 235, 241, 269, 660, 1523, 420, 678, 213, 544, 940, 983,
605, 2716, 1848, 1848, 182, 1225, 365, 993, 224, 267, 309, 271,
324, 178, 2657, 1772, 546, 456, 2637, 1771, 677, 1409, 653, 2359,
690, 828, 2742, 1812, 2777, 552, 1572, 2742, 2792, 2819, 1753,
265, 1901, 1753, 2716, 2800, 2742, 453, 2742, 586, 1920, 929,
1897, 2742, 1859, 1899, 1106, 1135, 759, 730, 1838, 863, 1929,
2751, 2751, 2751, 2751, 713, 430, 2788, 1784, 966, 2483, 1784,
1786, 2727, 857, 1798, 1815, 730, 390, 593, 1489, 1448, 1784,
1510, 2788, 812, 856, 808, 941, 2797, 2757, 1852, 2757, 2412,
486, 1034, 615, 845, 974, 727, 969, 2916, 1841, 1926, 1926, 533,
446, 733, 696, 1214, 1857, 1907, 2824, 2631, 3556, 2496, 1617,
1000, 707, 936, 761, 960, 1936, 857, 423, 1130, 1165, 2453, 338,
988, 1869, 1951, 1932, 2820, 2742, 628, 447, 866, 637, 932, 2742,
1795, 2881, 695, 762, 2778, 427, 714, 2781, 1865, 1861, 678,
1465, 1770, 845, 356, 817, 385, 1820, 2692, 1787, 1510, 1814,
857, 2616, 204, 465, 1773, 2754, 1793, 1773, 1900, 185, 2706,
1162, 766, 2742, 1816, 2742, 1790, 1803, 1795, 1026, 334, 832,
478, 1849, 2679, 1773, 797, 2649, 1814, 1808, 99, 2037, 2616,
2719, 1813, 2637, 2648, 1813, 865, 1717, 2588, 2711, 2818, 1828,
2553, 2720, 1791, 1780, 2706, 2565, 1717, 1881, 1037, 329, 893,
723, 1821, 2692, 2586, 2729, 1755, 1793, 2670, 2602, 2638, 2684,
1813, 1755, 1755, 2626, 832, 739, 724, 1968, 2598, 2627, 851,
749, 684, 625, 2673, 2778, 1764, 2644, 1800, 1792, 511, 2776,
1890, 1764, 2776, 1040, 1049, 2699, 2061, 897, 1764, 274, 2755,
1912, 2581, 1780, 820, 1803, 2692, 2783, 572, 2751, 2699, 1830,
1875, 633, 1083)
Then I tried the boot function:
> boot (Total, median, 1000)
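ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = Total, statistic = median, R = 1000)

Bootstrap Statistics :
    original  bias    std. error
t1*     1603       0           0
There were 50 or more warnings (use warnings() to see the first 50)
The warning message is: the condition has length > 1 and only the first element will be used
Could you tell me how to run the bootstrap so that I get a 95% confidence interval for the median? I am a beginner and would really appreciate your help.
Thank you very much.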
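Answer 0: (score: 2)
The boot function in the boot package is a little unintuitive to use. However, if you read the documentation (or look at the examples in it), you will see a specific note about the statistic argument:
In all other cases statistic must take at least two arguments. The first argument passed will always be the original data. The second will be a vector of indices, frequencies or weights which define the bootstrap sample.
So instead of: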
x <- rnorm(10)
boot(data = x,statistic = median,R = 1000)
you want this:
boot(data = x,statistic = function(x,i) median(x[i]),R = 1000)
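To make the role of the second argument concrete, here is a minimal sketch that mimics by hand what boot does on each replicate. The name med_stat and the seed are illustrative assumptions, not part of the boot API. It also suggests why the original call reported a bias and standard error of 0: with statistic = median, the index vector ends up in median's second argument, na.rm, so every replicate returns the median of the full data (hence the warning about a condition of length > 1).

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
x <- rnorm(10)

# Hypothetical helper: a statistic in the (data, indices) form boot expects
med_stat <- function(data, i) median(data[i])

# One bootstrap replicate by hand: indices drawn with replacement
i <- sample(seq_along(x), replace = TRUE)
med_stat(x, i)

# With the identity indices the statistic is just the original median,
# which is the value boot reports in the "original" column (t0)
identical(med_stat(x, seq_along(x)), median(x))
```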
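From there, the function boot.ci() can be used to compute confidence intervals (I believe only some of the interval types are available in this particular example).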
b <- boot(data = x,statistic = function(x,i) median(x[i]),R = 1000)
boot.ci(b)
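If you only need one interval type, you can request it explicitly. The sketch below asks for the percentile interval and pulls out its two endpoints; the seed and the choice of type = "perc" are my assumptions for illustration. The studentized interval, for instance, needs a variance estimate for each replicate, which this statistic does not return, so it is one of the types that is not available here.

```r
library(boot)

set.seed(123)                     # arbitrary seed, only for reproducibility
x <- rnorm(10)

b <- boot(data = x, statistic = function(x, i) median(x[i]), R = 1000)

# Request only the percentile interval at the 95% level
ci <- boot.ci(b, conf = 0.95, type = "perc")

# ci$percent is a 1 x 5 matrix: level, two order-statistic indices,
# then the lower and upper endpoints
ci_bounds <- ci$percent[4:5]
ci_bounds
```

The same call should work with the Total vector from the question in place of x.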
Answer 1: (score: 0)
Although @joran's answer is correct, I tested the code together with the CI calculation, and it works.
Answer 2: (score: 0)
Here is how to "roll your own" bootstrap:
# number of bootstrap replicates
B <- 10000
# create empty storage container
result_vec <- numeric(B)
for(b in 1:B) {
  # draw a bootstrap sample
  this_sample <- sample(Total, size=length(Total), replace=TRUE)
  # calculate your statistic
  m <- median(this_sample)
  # save your calculated statistic
  result_vec[b] <- m
}
# then probably draw a histogram of your bootstrapped replicates
hist(result_vec)
# get 95% confidence interval
result_vec <- sort(result_vec)
lower_bound <- result_vec[round(0.025*B)]
upper_bound <- result_vec[round(0.975*B)]
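As a variation on the loop above, the same percentile interval can be sketched more compactly with replicate() and quantile(). The data here is a simulated stand-in (Total_demo is a hypothetical name, and the seed is arbitrary); substitute the real Total vector. Note that quantile() interpolates between order statistics by default, so the bounds can differ slightly from the manual indexing.

```r
set.seed(2024)                    # arbitrary seed, only for reproducibility

# Stand-in data, NOT the question's values; replace with Total
Total_demo <- round(runif(200, min = 18, max = 4096))

B <- 10000                        # number of bootstrap replicates

# Each replicate: resample with replacement, then take the median
result_vec <- replicate(B, median(sample(Total_demo, replace = TRUE)))

# 2.5% and 97.5% quantiles give a 95% percentile interval
ci <- quantile(result_vec, probs = c(0.025, 0.975))
ci
```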
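Answer 3: (score: 0)
I use the standard normal random generator in this code:
# i is the number of bootstrap replicates; fill in the "..." placeholders yourself
B <- i
bs.result <- matrix(NA, nrow=B, ncol=...)
for (b in 1:B) {
  # draw a parametric sample from the normal generator
  sample.n <- rnorm(n, mean=..., sd=...)
  # maximize the log-likelihood (fnscale=-1 makes optim maximize)
  optim.b <- optim(c(mu=0, sd=1), loglik, control=list(fnscale=-1), z=sample.n)
  # store the estimates and the convergence code in row b
  bs.result[b, ] <- c(optim.b$par, optim.b$convergence)
}
In the last column of the table you can check whether the optimization converged (optim returns a convergence code of 0 on success).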
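For a version that runs end to end, here is a minimal sketch with the placeholders filled in by assumption. The normal log-likelihood loglik, the values n = 50, B = 200, true_mean = 5 and true_sd = 2, and the column names are all hypothetical choices of mine, not part of the code above.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
n <- 50                           # assumed sample size
B <- 200                          # assumed number of replicates
true_mean <- 5                    # assumed generating mean
true_sd <- 2                      # assumed generating sd

# Hypothetical log-likelihood of a normal model; par = c(mu, sd)
loglik <- function(par, z) {
  if (par[2] <= 0) return(-1e10)  # penalize invalid standard deviations
  sum(dnorm(z, mean = par[1], sd = par[2], log = TRUE))
}

bs.result <- matrix(NA_real_, nrow = B, ncol = 3,
                    dimnames = list(NULL, c("mu", "sd", "convergence")))

for (b in seq_len(B)) {
  # parametric resample from the assumed normal model
  sample.n <- rnorm(n, mean = true_mean, sd = true_sd)
  # fnscale = -1 turns optim's minimizer into a maximizer
  optim.b <- optim(c(mu = 0, sd = 1), loglik,
                   control = list(fnscale = -1), z = sample.n)
  bs.result[b, ] <- c(optim.b$par, optim.b$convergence)
}

# How many runs converged (code 0), and percentile intervals for the estimates
table(bs.result[, "convergence"])
apply(bs.result[, c("mu", "sd")], 2, quantile, probs = c(0.025, 0.975))
```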