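How to get the maximum value with group_by and summarise in R

Posted: 2018-06-15 15:08:33

Tags: r dplyr left-join

How do I get the maximum value in each group of a grouped data frame?

I have this data frame: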

library(dplyr)

df_example <- data.frame(gender = c(rep("female", 12), rep("male", 12)),
                         state = rep(c(rep("widow", 8), rep("orphan", 4)), 2),
                         relatioship_status = c("single", "married"),  # recycled to 24 rows
                         age_group = rep(c("(20, 30]", "(20, 30]", "(30, 40]", "(30, 40]", "(40, 50]", "(40, 50]", "(50, 60]", "(50, 60]", "(0, 10]", "(0, 10]", "(10, 20]", "(10, 20]"), 2),
                         amount = c(10, 15, 9, 8, 17, 92, 12, 41, 23, 75, 46, 64, 12, 9, 7, 22, 3, 14, 33, 14, 1, 87, 54, 21))

which produces:

>  df_example
   gender  state relatioship_status age_group amount
1  female  widow             single  (20, 30]     10
2  female  widow            married  (20, 30]     15
3  female  widow             single  (30, 40]      9  
4  female  widow            married  (30, 40]      8
5  female  widow             single  (40, 50]     17
6  female  widow            married  (40, 50]     92
7  female  widow             single  (50, 60]     12
8  female  widow            married  (50, 60]     41
9  female orphan             single   (0, 10]     23
10 female orphan            married   (0, 10]     75
11 female orphan             single  (10, 20]     46
12 female orphan            married  (10, 20]     64
13   male  widow             single  (20, 30]     12
14   male  widow            married  (20, 30]      9
15   male  widow             single  (30, 40]      7
16   male  widow            married  (30, 40]     22
17   male  widow             single  (40, 50]      3
18   male  widow            married  (40, 50]     14
19   male  widow             single  (50, 60]     33
20   male  widow            married  (50, 60]     14
21   male orphan             single   (0, 10]      1
22   male orphan            married   (0, 10]     87
23   male orphan             single  (10, 20]     54
24   male orphan            married  (10, 20]     21

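I want to get the "maximum amount" grouped by gender, state, relationship_status and age_group. The reason is that I want to take the relationship status corresponding to the maximum amount and assign it to some NA values in another data frame that shares the same gender, state and age_group variables.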

However, when I run this code:

df_example %>% 
  group_by(gender, state, age_group,relatioship_status) %>% 
  summarise(max(amount))

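I get back the same data frame as df_example. That makes sense, because I have grouped by every variable, so each group contains exactly one row.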
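A quick way to confirm this explanation is to count the rows in each combination of the four grouping variables. The sketch below rebuilds df_example so it runs on its own; the name group_sizes is only illustrative. It shows that all 24 combinations occur exactly once, so max(amount) in each group is simply that row's amount.

```r
library(dplyr)

# Rebuild the example data from the question (column name spelled as in the question)
df_example <- data.frame(
  gender = c(rep("female", 12), rep("male", 12)),
  state = rep(c(rep("widow", 8), rep("orphan", 4)), 2),
  relatioship_status = c("single", "married"),
  age_group = rep(c("(20, 30]", "(20, 30]", "(30, 40]", "(30, 40]",
                    "(40, 50]", "(40, 50]", "(50, 60]", "(50, 60]",
                    "(0, 10]", "(0, 10]", "(10, 20]", "(10, 20]"), 2),
  amount = c(10, 15, 9, 8, 17, 92, 12, 41, 23, 75, 46, 64,
             12, 9, 7, 22, 3, 14, 33, 14, 1, 87, 54, 21),
  stringsAsFactors = FALSE
)

# Count rows per combination of all four grouping variables (column n)
group_sizes <- df_example %>%
  count(gender, state, age_group, relatioship_status)

print(nrow(group_sizes))   # number of distinct groups
print(max(group_sizes$n))  # size of the largest group
```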

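So how can I get the relationship status corresponding to the maximum amount within each gender, state and age_group group, so that I can later use it to fill the NA values in another data frame?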
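One common dplyr approach, sketched here rather than taken from an answer on this page, is to group only by gender, state and age_group and keep the whole row with the largest amount instead of summarising it. Because slice() returns complete rows, relatioship_status is carried along. The names max_status and max_amount are illustrative. Note that which.max() keeps only the first row if two statuses tie for the maximum; filter(amount == max(amount)) would keep every tied row instead.

```r
library(dplyr)

# Rebuild the example data from the question (column name spelled as in the question)
df_example <- data.frame(
  gender = c(rep("female", 12), rep("male", 12)),
  state = rep(c(rep("widow", 8), rep("orphan", 4)), 2),
  relatioship_status = c("single", "married"),
  age_group = rep(c("(20, 30]", "(20, 30]", "(30, 40]", "(30, 40]",
                    "(40, 50]", "(40, 50]", "(50, 60]", "(50, 60]",
                    "(0, 10]", "(0, 10]", "(10, 20]", "(10, 20]"), 2),
  amount = c(10, 15, 9, 8, 17, 92, 12, 41, 23, 75, 46, 64,
             12, 9, 7, 22, 3, 14, 33, 14, 1, 87, 54, 21),
  stringsAsFactors = FALSE
)

# Group only by the keys shared with the other data frame, then keep,
# for each group, the single row with the largest amount. Unlike
# summarise(), slice() keeps every column, including relatioship_status.
max_status <- df_example %>%
  group_by(gender, state, age_group) %>%
  slice(which.max(amount)) %>%
  ungroup() %>%
  select(gender, state, age_group, relatioship_status, max_amount = amount)

print(max_status)
```

The result has one row per gender/state/age_group combination (12 here), and it can be joined onto another data frame by those three columns.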

PS: Running

df_example %>% 
  group_by(gender, state, age_group) %>% 
  summarise(max(amount))

and then calling left_join() afterwards does not help, because that code does not keep the relationship status in its output.
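If you would rather keep the summarise() step from the PS, a sketch of the left_join() route is to join the per-group maximum back onto df_example using the three keys plus amount, which recovers the matching relatioship_status, and then join that lookup onto the other data frame, using coalesce() to fill only the missing values. The question does not show the other data frame, so df_other and its rows are hypothetical, as are the names max_by_group, lookup and filled_status. The .groups argument assumes dplyr 1.0.0 or later. If two statuses tie for the maximum in a group, step 2 returns both rows and the final join would duplicate rows in df_other.

```r
library(dplyr)

# Rebuild the example data from the question (column name spelled as in the question)
df_example <- data.frame(
  gender = c(rep("female", 12), rep("male", 12)),
  state = rep(c(rep("widow", 8), rep("orphan", 4)), 2),
  relatioship_status = c("single", "married"),
  age_group = rep(c("(20, 30]", "(20, 30]", "(30, 40]", "(30, 40]",
                    "(40, 50]", "(40, 50]", "(50, 60]", "(50, 60]",
                    "(0, 10]", "(0, 10]", "(10, 20]", "(10, 20]"), 2),
  amount = c(10, 15, 9, 8, 17, 92, 12, 41, 23, 75, 46, 64,
             12, 9, 7, 22, 3, 14, 33, 14, 1, 87, 54, 21),
  stringsAsFactors = FALSE
)

# Step 1: the summary from the PS, with a named column (.groups needs dplyr >= 1.0.0)
max_by_group <- df_example %>%
  group_by(gender, state, age_group) %>%
  summarise(amount = max(amount), .groups = "drop")

# Step 2: join back on the keys *and* the amount to recover the status
# that produced each maximum
lookup <- max_by_group %>%
  left_join(df_example, by = c("gender", "state", "age_group", "amount")) %>%
  select(gender, state, age_group, filled_status = relatioship_status)

# Step 3: a hypothetical second data frame with some missing statuses
df_other <- data.frame(
  gender = c("female", "male", "male"),
  state = c("widow", "orphan", "widow"),
  age_group = c("(40, 50]", "(10, 20]", "(20, 30]"),
  relatioship_status = c(NA, NA, "married"),
  stringsAsFactors = FALSE
)

# Step 4: fill only the NA statuses; non-missing values are left unchanged
df_filled <- df_other %>%
  left_join(lookup, by = c("gender", "state", "age_group")) %>%
  mutate(relatioship_status = coalesce(relatioship_status, filled_status)) %>%
  select(-filled_status)

print(df_filled)
```

With these hypothetical rows, the two NA statuses are filled from the lookup ("married" and "single"), while the third row keeps its existing "married" even though the lookup would suggest "single".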

Thanks :)
