for loop and the dplyr package

Date: 2018-02-08 18:49:36

Tags: r for-loop dplyr

My data looks like this:

  died   pre_died zipid1 zipid2 zipid3 zipid4 zipid5 zipid6 zipid7 zipid8 zipid9 zipid10 zipid11 zipid12 zipid13
1    1 0.03070181      1      0      0      0      0      0      0      0      0       0       0       0       0
2    0 0.13301922      1      0      0      0      0      0      0      0      0       0       0       0       0
3    1 0.87192980      1      0      0      0      0      0      0      0      0       0       0       0       0
4    0 0.01805484      1      0      0      0      0      0      0      0      0       0       0       0       0
5    0 0.02586771      1      0      0      0      0      0      0      0      0       0       0       0       0
6    0 0.02476175      1      0      0      0      0      0      0      0      0       0       0       0       0

I want to sum the died variable for each zipid, from zipid1 to zipid30. My current code looks like this:

collapse <- data %>%
    summarize(
      outc_n1 = sum(died[zipid1=="1"], na.rm = T),
      outc_n2 = sum(died[zipid2=="1"], na.rm = T),
      outc_n3 = sum(died[zipid3=="1"], na.rm = T),
      ...
    )

The columns run from zipid1 to zipid30. How can I write a for loop instead of typing the same line 30 times?

Thanks!

2 Answers:

Answer 0 (score: 2)

Another dplyr option (note that gather() comes from the tidyr package):

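library(dplyr)
library(tidyr)  # provides gather()

# Reshape to long form, keep rows where the zip indicator is 1, then sum died per zip
data <- gather(data, zip, value, -died, -pre_died) %>%
    filter(value == 1) %>%
    group_by(zip) %>%
    summarize(sum_died = sum(died, na.rm = TRUE))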
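
In current tidyr, gather() is superseded by pivot_longer(). Here is a minimal sketch of the same reshape-then-summarise idea, assuming tidyr 1.0.0 or later. The toy data frame is made up for illustration and only mimics the shape of the data in the question; the result is stored in a new object so the original data is not overwritten.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the same layout as the question (values are invented)
data <- data.frame(
  died     = c(1, 0, 1, 0, 0, 0),
  pre_died = c(0.03, 0.13, 0.87, 0.02, 0.03, 0.02),
  zipid1   = c(1, 1, 1, 0, 0, 0),
  zipid2   = c(0, 0, 0, 1, 1, 1)
)

collapse <- data %>%
  # One row per (observation, zip column) pair
  pivot_longer(starts_with("zipid"), names_to = "zip", values_to = "value") %>%
  # Keep only the rows that belong to each zip
  filter(value == 1) %>%
  group_by(zip) %>%
  summarise(sum_died = sum(died, na.rm = TRUE))
```

Unlike the wide approaches, this returns one row per zip. A zip with no rows equal to 1 drops out of the result, because its rows are filtered away before grouping.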

Answer 1 (score: 1)

You can use summarise_at, with vars(matches(...)) to select the columns to summarize:

data %>% summarise_at(vars(matches('zipid')), funs(outc = sum(died[. == '1'], na.rm=T)))

#  zipid1_outc zipid2_outc zipid3_outc zipid4_outc zipid5_outc zipid6_outc zipid7_outc zipid8_outc
#1           2           0           0           0           0           0           0           0
#  zipid9_outc zipid10_outc zipid11_outc zipid12_outc zipid13_outc
#1           0            0            0            0            0
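
funs() has been deprecated since dplyr 0.8.0, so on a recent dplyr the call above emits a deprecation warning or fails. Below is a sketch of two alternatives, assuming dplyr 1.0.0 or later for across(). The first is the across() equivalent. The second is a plain for loop in base R, which is what the question literally asks for. The toy data frame and the zip_cols and outc names are hypothetical, made up for illustration.

```r
library(dplyr)

# Hypothetical toy data with the same layout as the question (values are invented)
data <- data.frame(
  died     = c(1, 0, 1, 0, 0, 0),
  pre_died = c(0.03, 0.13, 0.87, 0.02, 0.03, 0.02),
  zipid1   = c(1, 1, 1, 0, 0, 0),
  zipid2   = c(0, 0, 0, 1, 1, 1)
)

# Option 1: across() applies the same summary to every zipid column
collapse <- data %>%
  summarise(across(starts_with("zipid"),
                   ~ sum(died[.x == 1], na.rm = TRUE),
                   .names = "{.col}_outc"))

# Option 2: an explicit for loop over the column names
zip_cols <- grep("^zipid", names(data), value = TRUE)
outc <- numeric(length(zip_cols))
names(outc) <- zip_cols
for (col in zip_cols) {
  # Sum died over the rows where this zip indicator equals 1
  outc[col] <- sum(data$died[data[[col]] == 1], na.rm = TRUE)
}
```

Both options compute the same sums. across() returns a one-row data frame with columns named zipid1_outc, zipid2_outc and so on, while the loop fills a named numeric vector.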