Group by and summarize while keeping other columns in R

Time: 2017-07-09 05:48:34

Tags: r grouping

I have a data frame that I group with dplyr's group_by function and then summarize with the summarize function in R.

MM_group<-group_by(SYC,Method,Maturity)

My data set looks like this (the console output wraps, so the remaining columns appear in a second block):

 Year           Group  County Seed.Brand Seed.Variety Seed.Maturity
1 2014 Group 0 No-till Yankton     Asgrow       AG0832           0.8
2 2014 Group 0 No-till   Brown     Asgrow       AG0934           0.9
3 2014 Group 0 No-till   Brown     Asgrow       AG0934           0.9
4 2014 Group 0 No-till   Brown     Asgrow       AG0934           0.9
5 2014 Group 0 No-till   Brown    Pioneer        90Y90           0.9
6 2014 Group 0 No-till   Brown     Asgrow       AG0934           0.9

Yield  Method Maturity digits
1 73.23 No-till        0      0
2 65.14 No-till        0      0
3 63.63 No-till        0      0
4 61.57 No-till        0      0
5 60.20 No-till        0      0

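I am grouping by Method and Maturity. For each Method & Maturity combination, I am trying to get the County and Year in which the maximum Yield occurred.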

I did the following:

summarize(MM_group,Max_Yield=max(Yield))

       Method Maturity Max_Yield
           <chr>    <chr>     <dbl>
1      Irrigated        0    69.600
2      Irrigated        1    86.013
3      Irrigated        2    88.750
4      Irrigated        3    79.650
5        No-till        0    79.470
6        No-till        1    79.856
7        No-till        2    85.860
8        No-till        3    68.530
9  Non-irrigated        0    83.210
10 Non-irrigated        1    81.916
11 Non-irrigated        2   103.740
12 Non-irrigated        3    94.410

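However, this does not give me the county name and year. I know I could use cbind or a join to bring those columns back, but I would like to know whether there is another, simpler way.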

Expected output:

          Method Maturity Max_Yield  Year                  Group
           <chr>    <chr>     <dbl> <int>                 <fctr>
1      Irrigated        0    69.600  2012 Group 0 or 1 Irrigated
2      Irrigated        1    86.013  2012 Group 0 or 1 Irrigated
3      Irrigated        2    88.750  2013 Group 2 or 3 Irrigated
4      Irrigated        3    79.650  2013 Group 2 or 3 Irrigated
5        No-till        0    79.470  2013        Group 0 No-till
6        No-till        1    79.856  2012        Group 1 No-till
7        No-till        2    85.860  2013        Group 2 No-till
8        No-till        3    68.530  2014        Group 3 No-till
9  Non-irrigated        0    83.210  2013  Group 0 Non-irrigated
10 Non-irrigated        1    81.916  2012  Group 1 Non-irrigated
11 Non-irrigated        2   103.740  2014  Group 2 Non-irrigated
12 Non-irrigated        3    94.410  2014  Group 3 Non-irrigated 

3 Answers:

Answer 0 (score: 5)

Try

summarize(MM_group, 
          rank = which.max(Yield),
          Year_rank = Year[rank],
          County_rank = County[rank])
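
As a minimal, self-contained sketch of the same idea: the `toy` data frame below is invented purely for illustration (it is not the asker's `SYC` data), and the `Max_Yield` column is an addition so that the result matches the layout of the expected output. It assumes dplyr is installed.

```r
library(dplyr)

# Toy data, invented for illustration only (not the real SYC data)
toy <- data.frame(
  Method   = c("No-till", "No-till", "Irrigated", "Irrigated"),
  Maturity = c("0", "0", "1", "1"),
  Year     = c(2013L, 2014L, 2012L, 2013L),
  County   = c("Brown", "Yankton", "Clay", "Brown"),
  Yield    = c(79.47, 73.23, 86.01, 80.10),
  stringsAsFactors = FALSE
)

res <- toy %>%
  group_by(Method, Maturity) %>%
  summarize(
    rank        = which.max(Yield),  # position of the max Yield within the group
    Max_Yield   = Yield[rank],       # the maximum itself
    Year_rank   = Year[rank],        # Year of the row holding the maximum
    County_rank = County[rank]       # County of the row holding the maximum
  ) %>%
  ungroup()

print(res)
```

This relies on summarize evaluating its arguments in order, so `rank` is already available when the later columns are computed.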

Answer 1 (score: 4)

We can use

SYC %>%
   group_by(Method, Maturity) %>%
   slice(which.max(Yield)) %>% 
   rename(Max_Yield = Yield) %>%
   select(Method, Maturity, Max_Yield, Year, Group)

Answer 2 (score: 3)

You can use arrange and slice as follows:

library(dplyr)
df %>% 
  arrange(Method, Maturity, desc(Yield)) %>%  # highest Yield first within each combination
  group_by(Method, Maturity) %>%
  slice(1) %>%                                # keep the top row of each group
  ungroup() %>%
  select(Method, Maturity, Yield, Year, Group) %>%
  rename(Max_Yield = Yield)
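
One caveat with `which.max()` and `slice(1)`: when several rows in a group share the maximum Yield, only the first of them is returned. The sketch below shows two possible alternatives on invented toy data (not the asker's data). It assumes dplyr 1.0.0 or later, where `slice_max()` is available; the names `all_max` and `one_per_group` are hypothetical.

```r
library(dplyr)

# Toy data, invented for illustration; two No-till rows tie for the maximum
toy <- data.frame(
  Method   = c("No-till", "No-till", "No-till", "Irrigated"),
  Maturity = c("0", "0", "0", "1"),
  Year     = c(2013L, 2014L, 2012L, 2012L),
  Yield    = c(79.47, 79.47, 60.20, 86.01),
  stringsAsFactors = FALSE
)

# Keep every row that ties for the group maximum
all_max <- toy %>%
  group_by(Method, Maturity) %>%
  filter(Yield == max(Yield)) %>%
  ungroup()

# Keep exactly one row per group, even when there are ties
one_per_group <- toy %>%
  group_by(Method, Maturity) %>%
  slice_max(Yield, n = 1, with_ties = FALSE) %>%
  ungroup()

print(all_max)
print(one_per_group)
```

Which variant fits depends on whether tied years and counties should all be reported or only one of them.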