How to calculate shares within a column

Time: 2017-07-13 15:35:47

Tags: r dataframe dplyr

My data frame has the following structure:

df <- structure(list(name1 =  c("A","A","B","B","A","A","B","B"), 
                 name2     =  c("B","B","C","C","ALL","ALL","ALL","ALL"),
                 pair_id   =  c(1,1,2,2,3,3,4,4),
                 year      =  c(2010, 2011, 2010, 2011, 2010, 2011,2010, 2011),
                 var1      =  c(1.5,2,4,5,12,15,20,18)), 
             .Names        =  c("name1","name2","pair_id","year", "var1"), 
            row.names      =  c("1", "2", "3", "4", "5", "6", "7", "8"), class =("data.frame"))

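I want to calculate the share of var1 for each year and pair_id, where the denominator is the value in the row with name2 = ALL. The output should look like this: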

df <- structure(list(name1   =  c("A","A","B","B","A","A","B","B"), 
                 name2       =  c("B","B","C","C","ALL","ALL","ALL","ALL"),
                 pair_id     =  c(1,1,2,2,3,3,4,4),
                 year        =  c(2010, 2011, 2010, 2011,2010,2011,2010,2011),
                 var1        =  c(1.5,2,4,5,12,15,20,18), 
                 var1_share  =  c(0.125,0.133333,0.2,0.2777,1,1,1,1)), 
            .Names           =  c("name1","name2","pair_id","year", "var1","var1_share"), 
            row.names        =  c("1", "2", "3", "4", "5", "6", "7", "8"), class =("data.frame"))

Thanks in advance!

1 answer:

Answer 0: (score: 1)

A dplyr solution:

library(dplyr)

df %>%
  group_by(name1, year) %>%
  # denominator: the var1 value of the "ALL" row in each (name1, year) group
  mutate(denom = var1[name2 == "ALL"]) %>%
  mutate(var1_share = var1/denom)
# # A tibble: 8 x 7
# # Groups:   name1, year [4]
#   name1 name2 pair_id  year  var1 denom var1_share
#   <chr> <chr>   <dbl> <dbl> <dbl> <dbl>      <dbl>
# 1     A     B       1  2010   1.5    12  0.1250000
# 2     A     B       1  2011   2.0    15  0.1333333
# 3     B     C       2  2010   4.0    20  0.2000000
# 4     B     C       2  2011   5.0    18  0.2777778
# 5     A   ALL       3  2010  12.0    12  1.0000000
# 6     A   ALL       3  2011  15.0    15  1.0000000
# 7     B   ALL       4  2010  20.0    20  1.0000000
# 8     B   ALL       4  2011  18.0    18  1.0000000
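
The grouped mutate above relies on every (name1, year) group containing exactly one row with name2 == "ALL". If a group has no such row, or more than one, `var1[name2 == "ALL"]` no longer has length 1 and, as far as I know, `mutate()` will stop with a size error. A possible variant, offered only as a sketch and not as part of the original answer, is to build the denominators as a separate lookup table and join them back. The names `totals` and `result` are made up for this illustration, and the data is the example from the question:

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  name1   = c("A", "A", "B", "B", "A", "A", "B", "B"),
  name2   = c("B", "B", "C", "C", "ALL", "ALL", "ALL", "ALL"),
  pair_id = c(1, 1, 2, 2, 3, 3, 4, 4),
  year    = c(2010, 2011, 2010, 2011, 2010, 2011, 2010, 2011),
  var1    = c(1.5, 2, 4, 5, 12, 15, 20, 18),
  stringsAsFactors = FALSE
)

# Lookup table of denominators: one row per (name1, year),
# taken from the rows where name2 == "ALL"
totals <- df %>%
  filter(name2 == "ALL") %>%
  select(name1, year, denom = var1)

# left_join keeps every row of df; a (name1, year) combination
# without an "ALL" row would get denom = NA instead of an error
result <- df %>%
  left_join(totals, by = c("name1", "year")) %>%
  mutate(var1_share = var1 / denom)

result
```

On this data the var1_share column should match the output above. The join version trades a little verbosity for more predictable behaviour on incomplete data: missing totals show up as NA shares that can be inspected afterwards.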