My data looks like this:
died pre_died zipid1 zipid2 zipid3 zipid4 zipid5 zipid6 zipid7 zipid8 zipid9 zipid10 zipid11 zipid12 zipid13
1 1 0.03070181 1 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0.13301922 1 0 0 0 0 0 0 0 0 0 0 0 0
3 1 0.87192980 1 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0.01805484 1 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0.02586771 1 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0.02476175 1 0 0 0 0 0 0 0 0 0 0 0 0
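For each zipid column from zipid1 to zipid30, I want to sum the died variable over the rows where that zipid equals 1. My current code looks like this: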
collapse <- data %>%
  summarize(
    outc_n1 = sum(died[zipid1 == "1"], na.rm = TRUE),
    outc_n2 = sum(died[zipid2 == "1"], na.rm = TRUE),
    outc_n3 = sum(died[zipid3 == "1"], na.rm = TRUE),
    ...
  )
The zipid columns run from zipid1 to zipid30. How can I write a for loop instead of typing the same line 30 times?
Thanks!
Answer 0 (score: 2)
Another dplyr option:
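library(dplyr)
library(tidyr)  # gather() comes from tidyr
# Reshape to long format, keep rows where the zipid flag is 1, then sum died per zipid.
data <- gather(data, zip, value, -died, -pre_died) %>%
  filter(value == 1) %>%
  group_by(zip) %>%
  summarize(sum_died = sum(died, na.rm = TRUE))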
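In recent tidyr releases gather() is superseded by pivot_longer(), so the same reshape-then-summarize idea might be sketched as below. The data frame `toy` and the result name `died_by_zip` are hypothetical, made up for illustration with only three zipid columns, and the sketch assumes tidyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not the asker's data): 6 rows, 3 zipid indicator columns.
toy <- data.frame(
  died     = c(1, 0, 1, 0, 0, 0),
  pre_died = c(0.03, 0.13, 0.87, 0.02, 0.03, 0.02),
  zipid1   = c(1, 1, 1, 1, 0, 0),
  zipid2   = c(0, 0, 0, 0, 1, 0),
  zipid3   = c(0, 0, 0, 0, 0, 1)
)

died_by_zip <- toy %>%
  # One row per (original row, zipid column) pair.
  pivot_longer(starts_with("zipid"), names_to = "zip", values_to = "value") %>%
  # Keep only the rows that belong to each zipid.
  filter(value == 1) %>%
  group_by(zip) %>%
  summarise(sum_died = sum(died, na.rm = TRUE))

print(died_by_zip)
```

Note that a zipid with no rows equal to 1 drops out of the result entirely, and that the groups are sorted alphabetically, so with 30 columns zipid10 would come before zipid2.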
Answer 1 (score: 1)
You can use summarise_at together with vars(matches(...)) to select the columns to summarize:
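# Note: funs() is deprecated as of dplyr 0.8.0; on newer versions use list(outc = ~ sum(died[. == '1'], na.rm = TRUE)) instead.
data %>% summarise_at(vars(matches('zipid')), funs(outc = sum(died[. == '1'], na.rm = TRUE)))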
# zipid1_outc zipid2_outc zipid3_outc zipid4_outc zipid5_outc zipid6_outc zipid7_outc zipid8_outc
#1 2 0 0 0 0 0 0 0
# zipid9_outc zipid10_outc zipid11_outc zipid12_outc zipid13_outc
#1 0 0 0 0 0
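On dplyr 1.0.0 or later, where the scoped verbs such as summarise_at() are superseded, roughly the same result can probably be obtained with across(). This is a minimal sketch rather than the answerer's own code: the data frame `toy` and the result name `outc` are hypothetical, and the `.names` pattern is chosen only to reproduce the `zipidN_outc` column names shown above.

```r
library(dplyr)

# Hypothetical toy data (not the asker's data): 6 rows, 3 zipid indicator columns.
toy <- data.frame(
  died     = c(1, 0, 1, 0, 0, 0),
  pre_died = c(0.03, 0.13, 0.87, 0.02, 0.03, 0.02),
  zipid1   = c(1, 1, 1, 1, 0, 0),
  zipid2   = c(0, 0, 0, 0, 1, 0),
  zipid3   = c(0, 0, 0, 0, 0, 1)
)

outc <- toy %>%
  summarise(
    across(
      starts_with("zipid"),
      # .x is the current zipid column; died is looked up in the same data frame.
      ~ sum(died[.x == 1], na.rm = TRUE),
      .names = "{.col}_outc"
    )
  )

print(outc)
```

Unlike the long-format approach, this keeps one column per zipid, including those whose sum is zero.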