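I have a set of population figures, broken down by age group and area.
Using the sample data below, how can I calculate, for every year column, each group's share of its area's total?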
area sex agegrouping 2011 2012 2013
area1 F 0-4 637.4815661 626.6145185 596.7128164
area1 F 10-14 417.8041418 402.5041888 411.2180838
area1 F 15-19 360.6491372 361.5883403 364.5626384
area1 F 20-24 562.4887445 598.7190796 617.9790937
area1 M 0-4 581.08247 581.11732 556.4439468
area1 M 10-14 408.1015966 379.945334 377.7312704
area1 M 15-19 380.7336397 392.2732017 384.8757803
area1 M 20-24 1089.024655 983.1813181 874.3646633
area2 F 0-4 460.2959017 479.7512631 489.1076221
area2 F 10-14 357.2974721 378.9785589 410.7145251
area2 F 15-19 353.4763328 324.3975914 312.5421936
area2 F 20-24 674.8157905 627.0151556 568.8309423
area2 M 0-4 570.1424505 579.4558621 572.8858648
area2 M 10-14 366.9484728 365.0947588 370.726409
area2 M 15-19 382.3444468 365.0342791 343.5104
area2 M 20-24 645.3627281 624.4575313 577.5540519
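I know I could do this manually, one column at a time, but is there a way to do it all in one go? (The full dataset runs to 2050.)
The result should look like this (but with all the other year columns and areas included):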
area sex agegrouping 2011.percent
area1 F 0-4 14.36621575
area1 F 10-14 9.415589032
area1 F 15-19 8.127550019
area1 F 20-24 12.67618562
area1 M 0-4 13.09521181
area1 M 10-14 9.196933521
area1 M 15-19 8.5801722
area1 M 20-24 24.54214205
Answer 0 (score: 0)
We can use ave() to compute the per-area sums and then divide by them to get percentages.
sums <- ave(tt[-(1:3)], tt[,1], FUN=function(x) lapply(x, sum))
tt[-(1:3)] <- round(100*tt[-(1:3)]/sums, 3)
tt
# area sex agegrouping X2011 X2012 X2013
# 1 area1 F 0-4 14.367 14.485 14.262
# 2 area1 F 10-14 9.416 9.304 9.828
# 3 area1 F 15-19 8.127 8.359 8.714
# 4 area1 F 20-24 12.677 13.840 14.771
# 5 area1 M 0-4 13.096 13.433 13.299
# 6 area1 M 10-14 9.197 8.782 9.027
# 7 area1 M 15-19 8.580 9.069 9.200
# 8 area1 M 20-24 24.542 22.728 20.899
# 9 area2 F 0-4 12.079 12.814 13.415
# 10 area2 F 10-14 9.376 10.122 11.265
# 11 area2 F 15-19 9.277 8.664 8.572
# 12 area2 F 20-24 17.708 16.745 15.602
# 13 area2 M 0-4 14.961 15.477 15.714
# 14 area2 M 10-14 9.628 9.751 10.168
# 15 area2 M 15-19 10.033 9.748 9.422
# 16 area2 M 20-24 16.937 16.679 15.843
Data:
tt <- structure(list(area=structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label=c("area1",
"area2"), class="factor"), sex=structure(c(1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label=c("F",
"M"), class="factor"), agegrouping=structure(c(1L, 2L, 3L, 4L,
1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
.Label=c("0-4", "10-14", "15-19", "20-24"), class="factor"),
X2011=c(637.5, 417.8, 360.6, 562.5, 581.1, 408.1, 380.7, 1089,
460.3, 357.3, 353.5, 674.8, 570.1, 366.9, 382.3, 645.4),
X2012=c(626.6, 402.5, 361.6, 598.7, 581.1, 379.9, 392.3, 983.2,
479.8, 379, 324.4, 627, 579.5, 365.1, 365, 624.5),
X2013=c(596.7, 411.2, 364.6, 618, 556.4, 377.7, 384.9, 874.4,
489.1, 410.7, 312.5, 568.8, 572.9, 370.7, 343.5, 577.6)),
row.names=c(NA, -16L), class="data.frame")
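The same idea can also be written one column at a time, which may be easier to read. The following is only a minimal sketch, not the answer's own method: the data frame `pop`, the result `pct`, and the helper vector `year_cols` are made-up names for illustration, and the sketch assumes the year columns are named like `X2011`. For each year column, `ave(col, pop$area, FUN = sum)` returns the area total repeated on every row of that area, so dividing by it gives each row's share.

```r
# Toy data (hypothetical values, not the question's data)
pop <- data.frame(
  area  = c("area1", "area1", "area2", "area2"),
  sex   = c("F", "M", "F", "M"),
  X2011 = c(30, 70, 20, 60),
  X2012 = c(25, 75, 40, 40)
)

# Pick the year columns by name pattern (assumes names like X2011)
year_cols <- grep("^X[0-9]{4}$", names(pop), value = TRUE)

# For each year column, divide by the per-area total and scale to percent
pct <- pop
pct[year_cols] <- lapply(pop[year_cols], function(col) {
  100 * col / ave(col, pop$area, FUN = sum)
})

pct
```

Because the columns are selected by pattern, the same code should carry over unchanged to a dataset whose year columns run to 2050.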
Answer 1 (score: 0)
Here is a dplyr version:
library(dplyr)
dt = read.table(text = "
area sex agegrouping 2011 2012 2013
area1 F 0-4 637.4815661 626.6145185 596.7128164
area1 F 10-14 417.8041418 402.5041888 411.2180838
area1 F 15-19 360.6491372 361.5883403 364.5626384
area1 F 20-24 562.4887445 598.7190796 617.9790937
area1 M 0-4 581.08247 581.11732 556.4439468
area1 M 10-14 408.1015966 379.945334 377.7312704
area1 M 15-19 380.7336397 392.2732017 384.8757803
area1 M 20-24 1089.024655 983.1813181 874.3646633
area2 F 0-4 460.2959017 479.7512631 489.1076221
area2 F 10-14 357.2974721 378.9785589 410.7145251
area2 F 15-19 353.4763328 324.3975914 312.5421936
area2 F 20-24 674.8157905 627.0151556 568.8309423
area2 M 0-4 570.1424505 579.4558621 572.8858648
area2 M 10-14 366.9484728 365.0947588 370.726409
area2 M 15-19 382.3444468 365.0342791 343.5104
area2 M 20-24 645.3627281 624.4575313 577.5540519
", header=T)
dt %>%
  group_by(area) %>%                                # for each area
  mutate_if(is.numeric, ~./sum(.)) %>%              # each numeric column as a share of the area total (a proportion; multiply by 100 for percent)
  rename_if(is.numeric, ~gsub("X", "prc_", .)) %>%  # update the names of those columns
  ungroup()                                         # forget the grouping
# # A tibble: 16 x 6
# area sex agegrouping prc_2011 prc_2012 prc_2013
# <fct> <fct> <fct> <dbl> <dbl> <dbl>
# 1 area1 F 0-4 0.144 0.145 0.143
# 2 area1 F 10-14 0.0942 0.0930 0.0983
# 3 area1 F 15-19 0.0813 0.0836 0.0871
# 4 area1 F 20-24 0.127 0.138 0.148
# 5 area1 M 0-4 0.131 0.134 0.133
# 6 area1 M 10-14 0.0920 0.0878 0.0903
# 7 area1 M 15-19 0.0858 0.0907 0.0920
# 8 area1 M 20-24 0.245 0.227 0.209
# 9 area2 F 0-4 0.121 0.128 0.134
# 10 area2 F 10-14 0.0938 0.101 0.113
# 11 area2 F 15-19 0.0928 0.0866 0.0857
# 12 area2 F 20-24 0.177 0.167 0.156
# 13 area2 M 0-4 0.150 0.155 0.157
# 14 area2 M 10-14 0.0963 0.0975 0.102
# 15 area2 M 15-19 0.100 0.0975 0.0942
# 16 area2 M 20-24 0.169 0.167 0.158
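In dplyr 1.0.0 and later, `mutate_if()` is superseded by `across()`. As a hedged sketch of how the pipeline above might look in that style (the toy data frame `pop` and the result `res` are made-up names, and the `.percent` suffix is just one way to get close to the column naming asked for in the question), the `.names` argument adds new percentage columns next to the original counts instead of overwriting them:

```r
library(dplyr)

# Toy data (hypothetical values, not the question's data)
pop <- data.frame(
  area  = c("area1", "area1", "area2", "area2"),
  sex   = c("F", "M", "F", "M"),
  X2011 = c(30, 70, 20, 60),
  X2012 = c(25, 75, 40, 40)
)

res <- pop %>%
  group_by(area) %>%
  # For every numeric column, add a "<column>.percent" column holding
  # the row's share of its area total, in percent
  mutate(across(where(is.numeric),
                ~ 100 * .x / sum(.x),
                .names = "{.col}.percent")) %>%
  ungroup()

res
```

Dropping the `100 *` factor gives proportions like those in the output above rather than percentages.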