How to group data by more than two factors in R

Time: 2015-11-30 04:04:30

Tags: r

I have a dataset like the one shown below. The actual dataset has 8619 rows.

Athlete      Competing Country  Year    Total Medals
Michael Phelps    United States 2012    6
Alicia Coutts     Australia     2012    5
Missy Franklin    United States 2012    5
Brian Leetch      United States 2002    1
Mario Lemieux     Canada        2002    1
Ylva Lindberg     Sweden        2002    1
Eric Lindros      Canada        2002    1
Ulrica Lindström  Sweden        2002    1
Shelley Looney    United States 2002    1

I would like to rearrange this data by country, year, and total number of medals.

I want a result like this:
Country        Year  SumOfMedals
United States  2012  11
United States  2002   2
...

by(newmd$Total.Medals, newmd$Year, FUN=sum)
by(md$Total.Medals, md$Competing.Country, FUN=sum)

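I tried using by(), as in the calls above, but I am still stuck. Can anyone help me?

2 answers:

Answer 0: (score: 3)

With data.table, we convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'Competing_Country' and 'Year', compute the sum of 'Total_Medals' for each group, and then order the result.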

library(data.table)
setDT(df1)[,list(SumOfMedals = sum(Total_Medals)), 
   by = .(Competing_Country, Year)
        ][order(-Competing_Country, -Year, -SumOfMedals)]

Or, using dplyr, we follow the same approach.

library(dplyr)
df1 %>%
    group_by(Competing_Country, Year) %>%
    summarise(SumOfMedals = sum(Total_Medals)) %>%
    arrange(desc(Competing_Country), desc(Year), desc(SumOfMedals))

Data

 df1 <- structure(list(Athlete = c("Michael Phelps", "Alicia Coutts", 
"Missy Franklin", "Brian Leetch", "Mario Lemieux", "Ylva Lindberg", 
"Eric Lindros", "Ulrica Lindström", "Shelley Looney"), Competing_Country = c("United States", 
"Australia", "United States", "United States", "Canada", "Sweden", 
"Canada", "Sweden", "United States"), Year = c(2012L, 2012L, 
2012L, 2002L, 2002L, 2002L, 2002L, 2002L, 2002L), Total_Medals = c(6L, 
5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("Athlete", "Competing_Country", 
"Year", "Total_Medals"), class = "data.frame", row.names = c(NA, 
-9L))
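
As a quick cross-check that needs no extra packages, the per-group totals can also be tabulated in base R with tapply. This is only a sketch on the nine sample rows: it rebuilds just the three columns it needs, and the name totals is introduced here for illustration. Country/year pairs that do not occur in the data come back as NA.

```r
# Rebuild only the columns used for grouping and summing (sample rows above)
df1 <- data.frame(
  Competing_Country = c("United States", "Australia", "United States",
                        "United States", "Canada", "Sweden",
                        "Canada", "Sweden", "United States"),
  Year = c(2012L, 2012L, 2012L, 2002L, 2002L, 2002L, 2002L, 2002L, 2002L),
  Total_Medals = c(6L, 5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Country-by-year matrix of medal sums; absent combinations are NA
totals <- with(df1, tapply(Total_Medals, list(Competing_Country, Year), sum))
print(totals)
```

The row for United States should agree with the SumOfMedals values produced by the data.table and dplyr versions.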

Answer 1: (score: 2)

You can do this easily with aggregate, which gives the sum of the medals for each group:

md2 <- aggregate(cbind(SumOfMedals = Total.Medals) ~ Competing.Country + Year,
                 data = md,
                 FUN = sum)

The next step is to sort md2 by Competing.Country and SumOfMedals, which is done with the order function:

md2 <- md2[order(md2$Competing.Country, -md2$SumOfMedals), ]

All done.
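
To try the two steps end to end, here is a minimal sketch on the nine sample rows from the question. It assumes the data frame is called md and uses the dotted column names from the question (Competing.Country, Total.Medals); the Athlete column is left out because it is not needed for the grouping.

```r
# Sample rows from the question, with the dotted column names used in md
md <- data.frame(
  Competing.Country = c("United States", "Australia", "United States",
                        "United States", "Canada", "Sweden",
                        "Canada", "Sweden", "United States"),
  Year = c(2012L, 2012L, 2012L, 2002L, 2002L, 2002L, 2002L, 2002L, 2002L),
  Total.Medals = c(6L, 5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Step 1: sum the medals for every country/year combination;
# cbind() on the left-hand side names the result column SumOfMedals
md2 <- aggregate(cbind(SumOfMedals = Total.Medals) ~ Competing.Country + Year,
                 data = md,
                 FUN = sum)

# Step 2: sort by country (ascending), then by medal total (descending)
md2 <- md2[order(md2$Competing.Country, -md2$SumOfMedals), ]
print(md2)
```

On this sample the result has one row per country/year pair, with the United States rows for 2012 and 2002 carrying 11 and 2 medals, as in the desired output.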