I have a dataset; a small sample of the relevant columns is shown below:
year ID type result
2003 1 new closed
2003 2 new transferred
2003 3 subsequent closed
2003 4 subsequent diverted
....
2015 1000 new closed
What I want to calculate is the fraction of subsequents, i.e. (no. of subsequents) / (no. of subsequents + no. of news), grouped by year and result, as shown below:
year result subsequent_frac
2003 closed 0.10
2003 transferred 0.05
2003 ....
....
2015 closed 0.05
2015 transferred 0.1
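I know I can do this in steps, using group_by and summarise to get the counts and handling each result separately, but I would like to know whether there is a more concise/faster way to do this.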
Answer 0 (score: 1)
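Is this what you are looking for? Applying summarise removes one level of grouping (the last one, here type), so the result is still grouped by year; that is why the second group_by is commented out.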
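library(dplyr)

# Count rows per (year, type); summarise() then drops the last grouping level (type).
dfSummarized <- group_by(df, year, type) %>%
  summarise(subsequent_frac = n()) %>%  # at this point the column holds a count, not a fraction
  # group_by(type) %>% # maybe you don't need this?
  mutate(freq = subsequent_frac / sum(subsequent_frac))  # share of each type within its year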
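The question asks for the fraction per year and result rather than per year and type. A minimal sketch of that variant follows. It assumes type only ever takes the values "new" and "subsequent", and that dplyr 1.0.0 or later is installed (for the .groups argument). The toy df and the name subsequent_by_result are hypothetical, made up only to mirror the question's columns.

```r
library(dplyr)

# Hypothetical toy data mirroring the question's columns (values are made up).
df <- tibble(
  year   = c(2003, 2003, 2003, 2003, 2003, 2015, 2015),
  ID     = 1:7,
  type   = c("new", "new", "subsequent", "subsequent", "new", "subsequent", "new"),
  result = c("closed", "transferred", "closed", "diverted", "closed", "closed", "closed")
)

# mean() of a logical vector is the share of TRUE values, i.e.
# subsequents / (subsequents + news) within each (year, result) group.
# Assumption: type has no values other than "new" and "subsequent".
subsequent_by_result <- df %>%
  group_by(year, result) %>%
  summarise(subsequent_frac = mean(type == "subsequent"), .groups = "drop")

print(subsequent_by_result)
```

Because the fraction is computed in a single summarise call, no separate counting step or second group_by is needed. If type can take other values, filter to the two of interest first.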