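Calculating percentages within groups

Time: 2019-06-13 09:46:12

Tags: r

I have a set of population figures. The data are split by age group and area.

Using the example data below, how can I calculate, for every year column, the share that each sex and age group makes up of its area's total?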

area    sex  agegrouping  2011         2012         2013
area1   F    0-4          637.4815661  626.6145185  596.7128164
area1   F    10-14        417.8041418  402.5041888  411.2180838
area1   F    15-19        360.6491372  361.5883403  364.5626384
area1   F    20-24        562.4887445  598.7190796  617.9790937
area1   M    0-4          581.08247    581.11732    556.4439468
area1   M    10-14        408.1015966  379.945334   377.7312704
area1   M    15-19        380.7336397  392.2732017  384.8757803
area1   M    20-24        1089.024655  983.1813181  874.3646633
area2   F    0-4          460.2959017  479.7512631  489.1076221
area2   F    10-14        357.2974721  378.9785589  410.7145251
area2   F    15-19        353.4763328  324.3975914  312.5421936
area2   F    20-24        674.8157905  627.0151556  568.8309423
area2   M    0-4          570.1424505  579.4558621  572.8858648
area2   M    10-14        366.9484728  365.0947588  370.726409
area2   M    15-19        382.3444468  365.0342791  343.5104
area2   M    20-24        645.3627281  624.4575313  577.5540519

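I know I could do this manually, column by column, but is there a way to do it all at once? The full dataset runs to 2050.

The result should look like this (but including all the other year columns and areas):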

area   sex  agegrouping  2011.percent
area1  F    0-4          14.36621575
area1  F    10-14        9.415589032
area1  F    15-19        8.127550019
area1  F    20-24        12.67618562
area1  M    0-4          13.09521181
area1  M    10-14        9.196933521
area1  M    15-19        8.5801722
area1  M    20-24        24.54214205
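
As a sanity check on the target column, the first value can be reproduced by hand: each figure is divided by the total of its area for that year and multiplied by 100. A minimal sketch using only the area1 rows of the 2011 column (the vector name `area1_2011` is just illustrative):

```r
# 2011 values for area1, copied from the sample data (F rows first, then M)
area1_2011 <- c(637.4815661, 417.8041418, 360.6491372, 562.4887445,
                581.08247, 408.1015966, 380.7336397, 1089.024655)

# Share of each group in the area total, as a percentage
pct <- 100 * area1_2011 / sum(area1_2011)

round(pct[1], 3)
# 14.366
```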

2 answers:

Answer 0 (score: 0)

We can use ave() to compute the per-area sums and then use them to calculate the percentages.

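# Column sums within each area (tt[,1]), aligned with the rows of tt
sums <- ave(tt[-(1:3)], tt[,1], FUN=function(x) lapply(x, sum))

# Divide each value by its area total and express it as a percentage
tt[-(1:3)] <- round(100*tt[-(1:3)]/sums, 3)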
tt
#     area sex agegrouping  X2011  X2012  X2013
# 1  area1   F         0-4 14.367 14.485 14.262
# 2  area1   F       10-14  9.416  9.304  9.828
# 3  area1   F       15-19  8.127  8.359  8.714
# 4  area1   F       20-24 12.677 13.840 14.771
# 5  area1   M         0-4 13.096 13.433 13.299
# 6  area1   M       10-14  9.197  8.782  9.027
# 7  area1   M       15-19  8.580  9.069  9.200
# 8  area1   M       20-24 24.542 22.728 20.899
# 9  area2   F         0-4 12.079 12.814 13.415
# 10 area2   F       10-14  9.376 10.122 11.265
# 11 area2   F       15-19  9.277  8.664  8.572
# 12 area2   F       20-24 17.708 16.745 15.602
# 13 area2   M         0-4 14.961 15.477 15.714
# 14 area2   M       10-14  9.628  9.751 10.168
# 15 area2   M       15-19 10.033  9.748  9.422
# 16 area2   M       20-24 16.937 16.679 15.843

Data:

tt <- structure(list(area=structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
  1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label=c("area1",
  "area2"), class="factor"), sex=structure(c(1L, 1L, 1L, 1L, 2L,
  2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label=c("F",
  "M"), class="factor"), agegrouping=structure(c(1L, 2L, 3L, 4L,
  1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
  .Label=c("0-4", "10-14", "15-19", "20-24"), class="factor"),
  X2011=c(637.5, 417.8, 360.6, 562.5, 581.1, 408.1, 380.7, 1089,
  460.3, 357.3, 353.5, 674.8, 570.1, 366.9, 382.3, 645.4),
  X2012=c(626.6, 402.5, 361.6, 598.7, 581.1, 379.9, 392.3, 983.2,
  479.8, 379, 324.4, 627, 579.5, 365.1, 365, 624.5),
  X2013=c(596.7, 411.2, 364.6, 618, 556.4, 377.7, 384.9, 874.4,
  489.1, 410.7, 312.5, 568.8, 572.9, 370.7, 343.5, 577.6)),
  row.names=c(NA, -16L), class="data.frame")
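
A variant of the same idea, as a rough sketch rather than part of the original answer: apply ave() to one year column at a time and store the results in new `.percent` columns, so the raw counts are kept. The small data frame and the `year_cols` helper are made up for illustration, and the pattern assumes the year columns are named like `X2011`.

```r
# Toy data: two rows per area (values rounded from the question's sample)
tt <- data.frame(
  area = c("area1", "area1", "area2", "area2"),
  sex = c("F", "M", "F", "M"),
  agegrouping = "0-4",
  X2011 = c(637.5, 581.1, 460.3, 570.1),
  X2012 = c(626.6, 581.1, 479.8, 579.5)
)

# Assumption: every year column is named "X" followed by four digits
year_cols <- grep("^X[0-9]{4}$", names(tt), value = TRUE)

# For each year column, divide by the per-area total returned by ave()
pct <- lapply(tt[year_cols], function(x) 100 * x / ave(x, tt$area, FUN = sum))

# Add the percentages as new columns next to the original counts
tt[paste0(year_cols, ".percent")] <- pct
tt
```

Because the year columns are picked up by name pattern, the same lines should carry over unchanged to a dataset whose columns run to 2050.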

Answer 1 (score: 0)

Here is a dplyr version:

library(dplyr)

dt = read.table(text = "
area    sex  agegrouping  2011         2012         2013
area1   F    0-4          637.4815661  626.6145185  596.7128164
area1   F    10-14        417.8041418  402.5041888  411.2180838
area1   F    15-19        360.6491372  361.5883403  364.5626384
area1   F    20-24        562.4887445  598.7190796  617.9790937
area1   M    0-4          581.08247    581.11732    556.4439468
area1   M    10-14        408.1015966  379.945334   377.7312704
area1   M    15-19        380.7336397  392.2732017  384.8757803
area1   M    20-24        1089.024655  983.1813181  874.3646633
area2   F    0-4          460.2959017  479.7512631  489.1076221
area2   F    10-14        357.2974721  378.9785589  410.7145251
area2   F    15-19        353.4763328  324.3975914  312.5421936
area2   F    20-24        674.8157905  627.0151556  568.8309423
area2   M    0-4          570.1424505  579.4558621  572.8858648
area2   M    10-14        366.9484728  365.0947588  370.726409
area2   M    15-19        382.3444468  365.0342791  343.5104
area2   M    20-24        645.3627281  624.4575313  577.5540519
", header=T)


dt %>%
  group_by(area) %>%                                 # for each area
  mutate_if(is.numeric, ~./sum(.)) %>%               # share of the area total, as a proportion (multiply by 100 for percent)
  rename_if(is.numeric, ~gsub("X", "prc_", .)) %>%   # update the names of those columns
  ungroup()                                          # forget the grouping

# # A tibble: 16 x 6
#    area  sex   agegrouping prc_2011 prc_2012 prc_2013
#    <fct> <fct> <fct>          <dbl>    <dbl>    <dbl>
#  1 area1 F     0-4           0.144    0.145    0.143
#  2 area1 F     10-14         0.0942   0.0930   0.0983
#  3 area1 F     15-19         0.0813   0.0836   0.0871
#  4 area1 F     20-24         0.127    0.138    0.148
#  5 area1 M     0-4           0.131    0.134    0.133
#  6 area1 M     10-14         0.0920   0.0878   0.0903
#  7 area1 M     15-19         0.0858   0.0907   0.0920
#  8 area1 M     20-24         0.245    0.227    0.209
#  9 area2 F     0-4           0.121    0.128    0.134
# 10 area2 F     10-14         0.0938   0.101    0.113
# 11 area2 F     15-19         0.0928   0.0866   0.0857
# 12 area2 F     20-24         0.177    0.167    0.156
# 13 area2 M     0-4           0.150    0.155    0.157
# 14 area2 M     10-14         0.0963   0.0975   0.102
# 15 area2 M     15-19         0.100    0.0975   0.0942
# 16 area2 M     20-24         0.169    0.167    0.158
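
With dplyr 1.0.0 or later, the same grouping idea can be sketched with across(), which supersedes the scoped verbs such as mutate_if(). This is an illustrative variant, not the original answer's code: it multiplies by 100 so the values are percentages, as asked in the question, and uses `.names` to add `.percent` columns next to the originals. The toy data frame and the name `res` are invented for the example.

```r
library(dplyr)

# Toy data: two rows per area (values rounded from the question's sample)
dt <- data.frame(
  area = c("area1", "area1", "area2", "area2"),
  sex = c("F", "M", "F", "M"),
  agegrouping = "0-4",
  X2011 = c(637.5, 581.1, 460.3, 570.1),
  X2012 = c(626.6, 581.1, 479.8, 579.5)
)

res <- dt %>%
  group_by(area) %>%                           # totals are taken within each area
  mutate(across(where(is.numeric),             # every numeric (year) column
                ~ 100 * .x / sum(.x),          # percentage of the area total
                .names = "{.col}.percent")) %>%
  ungroup()

res
```

Note that read.table() prefixes column names that start with a digit with "X" by default, which is why the year columns appear as `X2011` and so on.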