I have a dataset like the one shown below. The actual dataset has 8,619 rows.
Athlete           Competing Country  Year  Total Medals
Michael Phelps    United States      2012  6
Alicia Coutts     Australia          2012  5
Missy Franklin    United States      2012  5
Brian Leetch      United States      2002  1
Mario Lemieux     Canada             2002  1
Ylva Lindberg     Sweden             2002  1
Eric Lindros      Canada             2002  1
Ulrica Lindström  Sweden             2002  1
Shelley Looney    United States      2002  1
I want to rearrange this data by country, year, and total number of medals.
I would like a result like this:
Country        Year  SumOfMedals
United States  2012  11
United States  2002  2
...
by(newmd$Total.Medals, newmd$Year, FUN=sum)
by(md$Total.Medals, md$Competing.Country, FUN=sum)
I tried using by() as shown above, but I am still stuck. Can anyone help me?
Answer 0: (score: 3)
Alternatively, with data.table: we convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'Competing_Country' and 'Year', take the sum of the variable of interest, 'Total_Medals', and then order the result.
library(data.table)
setDT(df1)[,list(SumOfMedals = sum(Total_Medals)),
by = .(Competing_Country, Year)
][order(-Competing_Country, -Year, -SumOfMedals)]
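A side note on the descending sort: `-Competing_Country` works here because data.table handles `order` specially inside `[`; in base R, negating a character vector is an error. As a minimal base R sketch of the same descending ordering, assuming a small made-up data frame `agg` (hypothetical, not part of the question) that already holds the summed medals:

```r
# Hypothetical aggregated data, for illustration only
agg <- data.frame(
  Competing_Country = c("Australia", "United States", "United States", "Canada"),
  Year = c(2012L, 2012L, 2002L, 2002L),
  SumOfMedals = c(5L, 11L, 2L, 2L),
  stringsAsFactors = FALSE
)

# decreasing = TRUE applies to every sort key, so no unary minus is needed
sorted <- agg[order(agg$Competing_Country, agg$Year, agg$SumOfMedals,
                    decreasing = TRUE), ]
print(sorted)
```

If only some keys should be descending, `order()` also accepts a vector for `decreasing` when `method = "radix"` is set.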
Or, using dplyr, we follow the same approach.
library(dplyr)
df1 %>%
  group_by(Competing_Country, Year) %>%
  # summarise() collapses each group to one row; summary() would not
  summarise(SumOfMedals = sum(Total_Medals)) %>%
  arrange(desc(Competing_Country), desc(Year), desc(SumOfMedals))
df1 <- structure(list(Athlete = c("Michael Phelps", "Alicia Coutts",
"Missy Franklin", "Brian Leetch", "Mario Lemieux", "Ylva Lindberg",
"Eric Lindros", "Ulrica Lindström", "Shelley Looney"), Competing_Country = c("United States",
"Australia", "United States", "United States", "Canada", "Sweden",
"Canada", "Sweden", "United States"), Year = c(2012L, 2012L,
2012L, 2002L, 2002L, 2002L, 2002L, 2002L, 2002L), Total_Medals = c(6L,
5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("Athlete", "Competing_Country",
"Year", "Total_Medals"), class = "data.frame", row.names = c(NA,
-9L))
Answer 1: (score: 2)
You can do this easily with aggregate to get the sum of the medal counts:
md2 <- aggregate(cbind(SumOfMedals = Total.Medals) ~ Competing.Country + Year,
                 data = md,
                 FUN = sum)
The next step is to sort md2 by Competing.Country and SumOfMedals, which is done with the order function:
md2 <- md2[order(md2$Competing.Country, -md2$SumOfMedals), ]
All done.
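For reference, the two steps can be tried end to end on the sample rows from the question. This is only a sketch: it assumes the columns are named Competing.Country, Year and Total.Medals (as in this answer), and it leaves out the Athlete column because the aggregation does not use it.

```r
# Sample rows from the question (Athlete column omitted; not needed here)
md <- data.frame(
  Competing.Country = c("United States", "Australia", "United States",
                        "United States", "Canada", "Sweden",
                        "Canada", "Sweden", "United States"),
  Year = c(2012L, 2012L, 2012L, 2002L, 2002L, 2002L, 2002L, 2002L, 2002L),
  Total.Medals = c(6L, 5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Sum medals per country and year; cbind() names the result column
md2 <- aggregate(cbind(SumOfMedals = Total.Medals) ~ Competing.Country + Year,
                 data = md, FUN = sum)

# Sort by country (ascending), then by medal sum (descending)
md2 <- md2[order(md2$Competing.Country, -md2$SumOfMedals), ]
print(md2)
```

On these nine rows the result has one row per country and year combination, and the United States row for 2012 carries the sum 11 asked for in the question.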