I have a data frame as follows:
+--------+-----------+-----+
| make | model | cnt |
+--------+-----------+-----+
| toyota | camry | 10 |
| toyota | corolla | 4 |
| honda | city | 8 |
| honda | accord | 13 |
| jeep | compass | 3 |
| jeep | wrangler | 5 |
| jeep | renegade | 1 |
| accura | x1 | 2 |
| accura | x3 | 1 |
+--------+-----------+-----+
I need to aggregate this data frame by Make to get the total volume and the share of each make. I do it as follows:
library(dplyr)

df <- data.frame(Make=c('toyota','toyota','honda','honda','jeep','jeep','jeep','accura','accura'),
                 Model=c('camry','corolla','city','accord','compass', 'wrangler','renegade','x1', 'x3'),
                 Cnt=c(10, 4, 8, 13, 3, 5, 1, 2, 1))
dfc <- df %>%
  group_by(Make) %>%
  summarise(volume = sum(Cnt)) %>%
  mutate(share=volume/sum(volume)*100.0) %>%
  arrange(desc(volume))
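For comparison, the same per-Make volume and share table can be built in base R without dplyr. This is only a sketch assuming base R's aggregate() and order(); it is not taken from the question or the answer.

```r
# Same example data as in the question
df <- data.frame(Make = c('toyota','toyota','honda','honda','jeep','jeep','jeep','accura','accura'),
                 Model = c('camry','corolla','city','accord','compass','wrangler','renegade','x1','x3'),
                 Cnt = c(10, 4, 8, 13, 3, 5, 1, 2, 1),
                 stringsAsFactors = FALSE)

# Total Cnt per Make; aggregate() returns the columns Make and Cnt
dfc <- aggregate(Cnt ~ Make, data = df, FUN = sum)
names(dfc)[names(dfc) == "Cnt"] <- "volume"

# Share of each Make as a percentage of the overall volume
dfc$share <- dfc$volume / sum(dfc$volume) * 100

# Largest volume first, with clean row names
dfc <- dfc[order(-dfc$volume), ]
rownames(dfc) <- NULL
dfc
```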
This gives me the volume and share aggregated by Make, as shown below:
+--------+--------+-----------+
| make | volume | share |
+--------+--------+-----------+
| honda | 21 | 44.680851 |
| toyota | 14 | 29.787234 |
| jeep | 9 | 19.148936 |
| accura | 3 | 6.382979 |
+--------+--------+-----------+
I need to group all rows except the first two into a single group, others, and sum their volume and share, so that the data frame looks like this:
+--------+--------+-----------+
| make | volume | share |
+--------+--------+-----------+
| honda | 21 | 44.680851 |
| toyota | 14 | 29.787234 |
| others | 12 | 25.53191 |
+--------+--------+-----------+
Answer (score: 4)
library(dplyr)
# example data
df <- data.frame(Make=c('toyota','toyota','honda','honda','jeep','jeep','jeep','accura','accura'),
                 Model=c('camry','corolla','city','accord','compass', 'wrangler','renegade','x1', 'x3'),
                 Cnt=c(10, 4, 8, 13, 3, 5, 1, 2, 1), stringsAsFactors = FALSE)
# specify number of rows
row_threshold <- 2
df %>%
  group_by(Make) %>%
  summarise(volume = sum(Cnt)) %>%
  mutate(share=volume/sum(volume)*100.0) %>%
  arrange(desc(volume)) %>%
  group_by(Make_upd = ifelse(row_number() > row_threshold, "others", Make)) %>%
  summarise(volume = sum(volume),
            share = sum(share))
# # A tibble: 3 x 3
# Make_upd volume share
# <chr> <dbl> <dbl>
# 1 honda 21 44.68085
# 2 others 12 25.53191
# 3 toyota 14 29.78723
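In the output above, others sits between honda and toyota because summarise() returns the groups sorted by Make_upd. If the exact layout from the question is wanted (top rows by volume, others last), one option is a final arrange() on a logical key, since FALSE sorts before TRUE. This is a sketch that reuses the answer's pipeline but relabels Make in place instead of creating Make_upd; that renaming is my own choice, not part of the answer.

```r
library(dplyr)

df <- data.frame(Make = c('toyota','toyota','honda','honda','jeep','jeep','jeep','accura','accura'),
                 Model = c('camry','corolla','city','accord','compass','wrangler','renegade','x1','x3'),
                 Cnt = c(10, 4, 8, 13, 3, 5, 1, 2, 1),
                 stringsAsFactors = FALSE)

# number of top rows to keep as-is
row_threshold <- 2

res <- df %>%
  group_by(Make) %>%
  summarise(volume = sum(Cnt)) %>%
  mutate(share = volume / sum(volume) * 100) %>%
  arrange(desc(volume)) %>%
  # every row past the threshold is relabelled "others"
  mutate(Make = ifelse(row_number() > row_threshold, "others", Make)) %>%
  group_by(Make) %>%
  summarise(volume = sum(volume), share = sum(share)) %>%
  # FALSE sorts before TRUE, so "others" goes last; the rest are ordered by volume
  arrange(Make == "others", desc(volume))

res
```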