Summarising a specific column with dplyr

Time: 2019-05-07 10:39:14

Tags: r dplyr

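For my assignment, I need to create an object that contains, for each combination of Sex and Season, the number of different sports in the olympics dataset. The columns of this object should be called Competitor_Sex, Olympic_Season, and Num_Sports, respectively.

This is what I have so far: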

object <- olympics %>%
  group_by(Sex, Season) %>%
  summarise(Num_Sports = ???)

I am having trouble defining the third column, which is the number of sports. My data looks like this:

structure(list(Name = c("A Lamusi", "Juhamatti Tapio Aaltonen", 
"Andreea Aanei", "Jamale (Djamel-) Aarrass (Ahrass-)", "Nstor Abad Sanjun"
), Sex = c("M", "M", "F", "M", "M"), Age = c(23L, 28L, 22L, 30L, 
23L), Height = c(170L, 184L, 170L, 187L, 167L), Weight = c(60, 
85, 125, 76, 64), Team = c("China", "Finland", "Romania", "France", 
"Spain"), NOC = c("CHN", "FIN", "ROU", "FRA", "ESP"), Games = c("2012 Summer", 
"2014 Winter", "2016 Summer", "2012 Summer", "2016 Summer"), 
    Year = c(2012L, 2014L, 2016L, 2012L, 2016L), Season = c("Summer", 
    "Winter", "Summer", "Summer", "Summer"), City = c("London", 
    "Sochi", "Rio de Janeiro", "London", "Rio de Janeiro"), Sport = c("Judo", 
    "Ice Hockey", "Weightlifting", "Athletics", "Gymnastics"), 
    Event = c("Judo Men's Extra-Lightweight", "Ice Hockey Men's Ice Hockey", 
    "Weightlifting Women's Super-Heavyweight", "Athletics Men's 1,500 metres", 
    "Gymnastics Men's Individual All-Around"), Medal = c(NA, 
    "Bronze", NA, NA, NA)), row.names = c("1", "2", "3", "4", 
"5"), class = "data.frame")

This is probably easy to solve. Could anyone help me? It would be much appreciated!

Best regards,

2 answers:

Answer 0 (score: 1)

Grouping twice should work:

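olympics %>% 
  group_by(Sex, Season, Sport) %>%   # one row per Sex/Season/Sport combination
  summarise(n_rows = n()) %>% 
  group_by(Sex, Season) %>%          # then count those rows per Sex/Season
  summarise(Num_Sports = n())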
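
To see why grouping twice counts distinct sports and not rows, here is a minimal sketch on made-up data. The `toy` data frame and the column name `n_rows` are hypothetical and only for illustration; Judo is deliberately repeated so that counting rows and counting sports give different results.

```r
library(dplyr)

# Hypothetical toy rows (not taken from the real dataset):
# Judo appears twice for M/Summer.
toy <- data.frame(
  Sex    = c("M", "M", "M", "F"),
  Season = c("Summer", "Summer", "Summer", "Summer"),
  Sport  = c("Judo", "Judo", "Athletics", "Weightlifting"),
  stringsAsFactors = FALSE
)

# Step 1: collapse to one row per Sex/Season/Sport combination.
per_sport <- toy %>%
  group_by(Sex, Season, Sport) %>%
  summarise(n_rows = n()) %>%
  ungroup()

# Step 2: count the remaining rows per Sex/Season,
# which is the number of distinct sports.
two_step <- per_sport %>%
  group_by(Sex, Season) %>%
  summarise(Num_Sports = n()) %>%
  ungroup()

print(two_step)
```

After the first step the repeated Judo row has been collapsed, so the second `n()` counts each sport only once.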

Answer 1 (score: 1)

You can use dplyr's equivalent of length(unique()): n_distinct

olympics %>% 
  group_by(Sex, Season) %>% 
  summarise(Num_Sports = n_distinct(Sport)) %>%
  rename(Competitor_Sex = Sex, Olympic_Season = Season) # rename the grouping columns as requested
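
As a self-contained check, the sketch below applies `n_distinct` to the five sample rows from the question. It keeps only the three columns the summary needs. The names `olympics_small` and `result` are hypothetical, and the grouping columns are renamed inside `group_by()` as one possible alternative to a separate `rename()` step.

```r
library(dplyr)

# Sex, Season and Sport copied from the sample rows in the question.
olympics_small <- data.frame(
  Sex    = c("M", "M", "F", "M", "M"),
  Season = c("Summer", "Winter", "Summer", "Summer", "Summer"),
  Sport  = c("Judo", "Ice Hockey", "Weightlifting", "Athletics", "Gymnastics"),
  stringsAsFactors = FALSE
)

result <- olympics_small %>%
  # Naming the grouping expressions creates the requested column names.
  group_by(Competitor_Sex = Sex, Olympic_Season = Season) %>%
  # n_distinct(x) gives the same count as length(unique(x)).
  summarise(Num_Sports = n_distinct(Sport)) %>%
  ungroup()

print(result)
```

On these five rows, the male/Summer group should report three sports (Judo, Athletics, Gymnastics) and the other two groups one each.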