I have this df:
# boxChange sameCat
# C1 > C2 TRUE
# C1 > C2 TRUE
# A0 > A1 TRUE
# A1 > E4 FALSE
# C3 > E6 FALSE
# E0 > E3 TRUE
# ... ...
I want to group by the two columns, count the occurrences, and sort by that count. Using dplyr, I would do this:
df2 <- df %>%
group_by(boxChange, sameCat) %>%
summarise(occs = n()) %>%
arrange(desc(occs))
This gives:
# boxChange sameCat occs
# C1 > C2 TRUE 312
# A0 > A1 TRUE 189
# E0 > E3 TRUE 13
# C3 > E6 FALSE 123
# A1 > E4 FALSE 70
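Now I want to calculate what percentage of the total each occs value represents, plus the cumulative percentage, to get something like this: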
# boxChange sameCat occs perc cump
# C1 > C2 TRUE 312 44 44
# A0 > A1 TRUE 189 27 71
# E0 > E3 TRUE 13 2 73
# C3 > E6 FALSE 123 17 90
# A1 > E4 FALSE 70 10 100
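As a quick check of the arithmetic behind the target table, here is a minimal base-R sketch that uses only the five occs values shown above (total 707). It assumes perc is rounded to whole percent before the cumulative sum is taken, which is what the target numbers suggest:

```r
# occs values copied from the summarised table above
occs <- c(312, 189, 13, 123, 70)

perc <- round(occs / sum(occs) * 100)  # share of the total (707), rounded to whole percent
cump <- cumsum(perc)                   # running total of the rounded shares

perc  # 44 27 2 17 10
cump  # 44 71 73 90 100
```

Because the denominator is the grand total, the last cumulative value reaches 100.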
I tried the following:
df2 <- df %>%
group_by(boxChange, sameCat) %>%
summarise(occs = n()) %>%
arrange(desc(occs)) %>%
mutate(perc = occs/sum(occs)*100) %>%
mutate(cump = cumsum(perc))
but the output is:
# boxChange sameCat occs perc cump
# C1 > C2 TRUE 312 100 100
# A0 > A1 TRUE 189 100 100
# E0 > E3 TRUE 13 100 100
# C3 > E6 FALSE 123 100 100
# A1 > E4 FALSE 70 100 100
I can't understand why this happens, and I couldn't find any other threads reporting a similar problem. Any insight?
Answer 0 (score: 1)
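We probably need to ungroup. With two grouping variables, summarise() drops only the last one (sameCat), so its result is still grouped by boxChange. The following mutate() calls therefore compute sum(occs) and cumsum(perc) within each boxChange group. In this data each boxChange value always comes with the same sameCat value, so every group has a single row, each row is divided by itself, and every perc and cump is 100: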
df2 <- df %>%
group_by(boxChange, sameCat) %>%
summarise(occs = n()) %>%
arrange(desc(occs)) %>%
ungroup %>%
mutate(perc = occs/sum(occs)*100,
cump = cumsum(perc))
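A minimal base-R sketch of the difference, using the five occs values from the question. Here ave() stands in for the leftover grouping by boxChange; it is only an illustration, not what dplyr runs internally:

```r
# The five summarised rows from the question (occs values as shown there)
counts <- data.frame(
  boxChange = c("C1 > C2", "A0 > A1", "E0 > E3", "C3 > E6", "A1 > E4"),
  occs      = c(312, 189, 13, 123, 70)
)

# Grouped behaviour: sum(occs) is taken per boxChange.
# Each boxChange has a single row, so every row is divided by itself.
grouped_perc <- ave(counts$occs, counts$boxChange,
                    FUN = function(x) x / sum(x) * 100)

# After ungroup(): sum(occs) is taken over the whole column (707).
ungrouped_perc <- counts$occs / sum(counts$occs) * 100

grouped_perc           # 100 100 100 100 100
round(ungrouped_perc)  # 44 27 2 17 10
```

A per-group cumsum() of a one-row group simply returns that row's perc, which explains why cump is also 100 everywhere.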
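Alternatively, if we need to keep the grouping intact, use sum(.$occs) instead of sum(occs): inside a %>% pipe, . refers to the whole data frame, so the total is taken over every row (cumsum() would still restart within each group, though).
If we start from the OP's arranged 'occs' data (the summarised table, here called df):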
df %>%
ungroup %>%
mutate(perc = round(occs/sum(occs) * 100),
cump = cumsum(perc))
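For comparison, here is a sketch on a small hypothetical data set. The rows below are made up and are not the OP's data. It shows the leftover grouping and three ways around it, and it assumes dplyr >= 1.0.0, where summarise() accepts the .groups argument:

```r
library(dplyr)

# Hypothetical toy data: as in the question, each boxChange value
# always comes with the same sameCat value.
df <- data.frame(
  boxChange = c("C1 > C2", "C1 > C2", "C1 > C2", "A0 > A1", "A0 > A1", "A1 > E4"),
  sameCat   = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)
)

# summarise() peels off only the last grouping variable (sameCat),
# so the result is still grouped by boxChange.
counts <- df %>%
  group_by(boxChange, sameCat) %>%
  summarise(occs = n(), .groups = "drop_last")
group_vars(counts)  # "boxChange"

# Option 1: drop all grouping inside summarise()
res1 <- df %>%
  group_by(boxChange, sameCat) %>%
  summarise(occs = n(), .groups = "drop") %>%
  arrange(desc(occs)) %>%
  mutate(perc = occs / sum(occs) * 100,
         cump = cumsum(perc))

# Option 2: keep the grouping, but take the total from the whole data via .$occs
# (cumsum() would still restart in every group, so no cump column here)
res2 <- counts %>%
  arrange(desc(occs)) %>%
  mutate(perc = occs / sum(.$occs) * 100)

# Option 3: count() on ungrouped data returns an ungrouped, already sorted table
res3 <- df %>%
  count(boxChange, sameCat, sort = TRUE, name = "occs") %>%
  mutate(perc = occs / sum(occs) * 100,
         cump = cumsum(perc))
```

Option 1 matches the ungroup() fix from the answer. Option 3 avoids the leftover grouping entirely because count() never groups the output by itself.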
# boxChange sameCat occs perc cump
#1 C1 > C2 TRUE 312 44 44
#2 A0 > A1 TRUE 189 27 71
#3 E0 > E3 TRUE 13 2 73
#4 C3 > E6 FALSE 123 17 90
#5 A1 > E4 FALSE 70 10 100