dplyr - group the values of the last n rows

Time: 2017-12-11 12:48:00

Tags: r dplyr

I have a data frame as follows.

+--------+-----------+-----+
|  make  |   model   | cnt |
+--------+-----------+-----+
| toyota |  camry    |  10 |
| toyota |  corolla  |   4 |
| honda  |  city     |   8 |
| honda  |  accord   |  13 |
| jeep   |  compass  |   3 |
| jeep   |  wrangler |   5 |
| jeep   |  renegade |   1 |
| accura |  x1       |   2 |
| accura |  x3       |   1 |
+--------+-----------+-----+

I need to summarise this data frame by Make to get the total volume and the share. I do this as follows.

library(dplyr)

df <- data.frame(Make=c('toyota','toyota','honda','honda','jeep','jeep','jeep','accura','accura'),
                 Model=c('camry','corolla','city','accord','compass', 'wrangler','renegade','x1', 'x3'),
                 Cnt=c(10, 4, 8, 13, 3, 5, 1, 2, 1))
dfc <- df %>%
  group_by(Make) %>%
  summarise(volume = sum(Cnt)) %>%
  mutate(share=volume/sum(volume)*100.0) %>%
  arrange(desc(volume))

This gives me volume and share summarised by Make, as shown below.

+--------+--------+-----------+
| make   | volume | share     |
+--------+--------+-----------+
| honda  | 21     | 44.680851 |
| toyota | 14     | 29.787234 |
| jeep   | 9      | 19.148936 |
| accura | 3      | 6.382979  |
+--------+--------+-----------+

I need to group all rows except the first two into a single group, others, and sum their volume and share, so that the data frame looks like this.

+--------+--------+-----------+
| make   | volume | share     |
+--------+--------+-----------+
| honda  | 21     | 44.680851 |
| toyota | 14     | 29.787234 |
| others | 12     | 25.531915 |
+--------+--------+-----------+

1 answer:

Answer 0 (score: 4)

library(dplyr)

# example data
df <- data.frame(Make=c('toyota','toyota','honda','honda','jeep','jeep','jeep','accura','accura'),
                 Model=c('camry','corolla','city','accord','compass', 'wrangler','renegade','x1', 'x3'),
                 Cnt=c(10, 4, 8, 13, 3, 5, 1, 2, 1), stringsAsFactors = FALSE)

# number of top rows (by volume) to keep as their own groups
row_threshold <- 2

df %>%
  group_by(Make) %>%
  summarise(volume = sum(Cnt)) %>%
  mutate(share=volume/sum(volume)*100.0) %>%
  arrange(desc(volume)) %>%
  group_by(Make_upd = ifelse(row_number() > row_threshold, "others", Make)) %>%
  summarise(volume = sum(volume),
            share = sum(share))

# # A tibble: 3 x 3
#   Make_upd volume    share
#      <chr>  <dbl>    <dbl>
# 1    honda     21 44.68085
# 2   others     12 25.53191
# 3   toyota     14 29.78723
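
Note that the final summarise() returns its rows sorted by the grouping key, so "others" lands between honda and toyota rather than at the bottom as in the desired output. A minimal sketch of one way to restore that order, assuming the same example data: it adds one arrange() that sorts on whether the row is "others" first and on volume second. The name `result` is only an illustrative choice, not part of the original answer.

```r
library(dplyr)

# same example data as above
df <- data.frame(
  Make  = c("toyota", "toyota", "honda", "honda", "jeep", "jeep", "jeep", "accura", "accura"),
  Model = c("camry", "corolla", "city", "accord", "compass", "wrangler", "renegade", "x1", "x3"),
  Cnt   = c(10, 4, 8, 13, 3, 5, 1, 2, 1),
  stringsAsFactors = FALSE
)

# number of top makes (by volume) to keep as their own rows
row_threshold <- 2

result <- df %>%
  group_by(Make) %>%
  summarise(volume = sum(Cnt)) %>%
  mutate(share = volume / sum(volume) * 100) %>%
  arrange(desc(volume)) %>%
  # the table is ungrouped here, so row_number() counts over the sorted rows
  group_by(Make_upd = ifelse(row_number() > row_threshold, "others", Make)) %>%
  summarise(volume = sum(volume), share = sum(share)) %>%
  # FALSE sorts before TRUE, so "others" goes last; remaining rows by volume
  arrange(Make_upd == "others", desc(volume))

print(result)
```

With the sample data this should list honda (21), toyota (14) and then others (12). Sorting on the logical `Make_upd == "others"` keeps the lumped row at the bottom even when its combined volume exceeds that of a kept make.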