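Aggregating data with base R

Time: 2019-05-08 10:21:33

Tags: r

I currently need to convert some dplyr code to base R. My dplyr code gives me three columns: the competitor's sex, the Olympic season, and the number of distinct sports. The code is: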

olympics %>% 
  group_by(Sex, Season, Sport) %>% 
  summarise(n()) %>% 
  group_by(Sex, Season) %>%
  summarise(n()) %>%
  setNames(c("Competitor_Sex", "Olympic_Season", "Num_Sports"))

My data is structured as follows (dput output of the first six rows).

 structure(list(Name = c("A Lamusi", "Juhamatti Tapio Aaltonen", 
"Andreea Aanei", "Jamale (Djamel-) Aarrass (Ahrass-)", "Nstor Abad Sanjun", 
"Nstor Abad Sanjun"), Sex = c("M", "M", "F", "M", "M", "M"), 
    Age = c(23L, 28L, 22L, 30L, 23L, 23L), Height = c(170L, 184L, 
    170L, 187L, 167L, 167L), Weight = c(60, 85, 125, 76, 64, 
    64), Team = c("China", "Finland", "Romania", "France", "Spain", 
    "Spain"), NOC = c("CHN", "FIN", "ROU", "FRA", "ESP", "ESP"
    ), Games = c("2012 Summer", "2014 Winter", "2016 Summer", 
    "2012 Summer", "2016 Summer", "2016 Summer"), Year = c(2012L, 
    2014L, 2016L, 2012L, 2016L, 2016L), Season = c("Summer", 
    "Winter", "Summer", "Summer", "Summer", "Summer"), City = c("London", 
    "Sochi", "Rio de Janeiro", "London", "Rio de Janeiro", "Rio de Janeiro"
    ), Sport = c("Judo", "Ice Hockey", "Weightlifting", "Athletics", 
    "Gymnastics", "Gymnastics"), Event = c("Judo Men's Extra-Lightweight", 
    "Ice Hockey Men's Ice Hockey", "Weightlifting Women's Super-Heavyweight", 
    "Athletics Men's 1,500 metres", "Gymnastics Men's Individual All-Around", 
    "Gymnastics Men's Floor Exercise"), Medal = c(NA, "Bronze", 
    NA, NA, NA, NA), BMI = c(20.7612456747405, 25.1063327032136, 
    43.2525951557093, 21.7335354170837, 22.9481157445588, 22.9481157445588
    )), .Names = c("Name", "Sex", "Age", "Height", "Weight", 
"Team", "NOC", "Games", "Year", "Season", "City", "Sport", "Event", 
"Medal", "BMI"), row.names = c(NA, 6L), class = "data.frame")

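Does anyone know how to convert this to base R?

2 answers:

Answer 0: (score: 5)

Since you group twice in dplyr, you can nest two aggregate calls in base R. The inner call counts rows per Sex, Season and Sport, which leaves one row per distinct sport; the outer call then counts those rows per Sex and Season: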

setNames(aggregate(Name~Sex + Season, 
      aggregate(Name~Sex + Season + Sport, olympics, length), length), 
       c("Competitor_Sex", "Olympic_Season", "Num_Sports"))

#   Competitor_Sex Olympic_Season Num_Sports
#1               F         Summer          1
#2               M         Summer          3
#3               M         Winter          1

This gives the same output as the dplyr version:

library(dplyr)
olympics %>% 
  group_by(Sex, Season, Sport) %>% 
  summarise(n()) %>% 
  group_by(Sex, Season) %>%
  summarise(n()) %>%
  setNames(c("Competitor_Sex", "Olympic_Season", "Num_Sports"))

#  Competitor_Sex Olympic_Season Num_Sports
#  <chr>          <chr>               <int>
#1 F              Summer                  1
#2 M              Summer                  3
#3 M              Winter                  1
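
The same counting can also be written so that the "distinct sports" step is explicit. The following is a minimal sketch, not part of the original answer: it swaps the inner aggregate for unique() and rebuilds only the columns of the question's sample data that are needed, so the reduced `olympics` data frame and the `combos` name are assumptions made for illustration.

```r
# Reduced copy of the question's sample data (only the columns used here).
olympics <- data.frame(
  Sex    = c("M", "M", "F", "M", "M", "M"),
  Season = c("Summer", "Winter", "Summer", "Summer", "Summer", "Summer"),
  Sport  = c("Judo", "Ice Hockey", "Weightlifting", "Athletics",
             "Gymnastics", "Gymnastics"),
  stringsAsFactors = FALSE
)

# Step 1: keep one row per distinct (Sex, Season, Sport) combination.
combos <- unique(olympics[c("Sex", "Season", "Sport")])

# Step 2: count the remaining rows, i.e. distinct sports, per Sex and Season.
out <- aggregate(Sport ~ Sex + Season, data = combos, FUN = length)
names(out) <- c("Competitor_Sex", "Olympic_Season", "Num_Sports")
out
```

On the sample data this should return the same three rows as above. One thing to keep in mind with either variant: the formula interface of aggregate drops rows that have NA in any variable of the formula (its default is na.action = na.omit), so the column being counted should not contain missing values.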

Answer 1: (score: 2)

One base R option is to call aggregate twice:

out <- aggregate(BMI ~ Sex + Season, 
     aggregate(BMI ~ Sex + Season + Sport, olympics, length), length)
names(out) <- c("Competitor_Sex", "Olympic_Season", "Num_Sports")
out
#   Competitor_Sex Olympic_Season Num_Sports
#1              F         Summer          1
#2              M         Summer          3
#3              M         Winter          1

This matches the output of the OP's code:

olympics %>% 
   group_by(Sex, Season, Sport) %>% 
   summarise(n()) %>% 
   group_by(Sex, Season) %>%
   summarise(n()) %>%
   setNames(c("Competitor_Sex", "Olympic_Season", "Num_Sports"))
# A tibble: 3 x 3
# Groups:   Sex [2]
#  Competitor_Sex Olympic_Season Num_Sports
#  <chr>          <chr>               <int>
#1 F              Summer                  1
#2 M              Summer                  3
#3 M              Winter                  1

Alternatively, it can be done compactly with table from base R:

table(sub(",[^,]+$", "", names(table(do.call(paste, 
        c(olympics[c("Sex", "Season", "Sport")], sep=","))))))

# F,Summer M,Summer M,Winter 
#        1        3        1 
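
The one-liner is easier to follow when unpacked into steps. The sketch below is an illustrative rewrite rather than the answerer's code: it uses unique() where the original uses names(table(...)), the intermediate names (`keys`, `distinct_keys`, `groups`, `counts`) are made up for this example, and the data frame is a reduced copy of the question's sample. It also assumes no sport name contains a comma, because the regular expression strips everything after the last comma.

```r
# Reduced copy of the question's sample data (only the columns used here).
olympics <- data.frame(
  Sex    = c("M", "M", "F", "M", "M", "M"),
  Season = c("Summer", "Winter", "Summer", "Summer", "Summer", "Summer"),
  Sport  = c("Judo", "Ice Hockey", "Weightlifting", "Athletics",
             "Gymnastics", "Gymnastics"),
  stringsAsFactors = FALSE
)

# Step 1: build one "Sex,Season,Sport" key per row.
keys <- do.call(paste, c(olympics[c("Sex", "Season", "Sport")], sep = ","))

# Step 2: keep each key once, so every sport counts once per Sex/Season.
distinct_keys <- unique(keys)

# Step 3: strip the trailing ",Sport" part, leaving "Sex,Season".
groups <- sub(",[^,]+$", "", distinct_keys)

# Step 4: tabulate how many distinct sports fall into each Sex/Season pair.
counts <- table(groups)
counts
```

Unlike the aggregate versions, the result is a named table rather than a data frame; as.data.frame(counts) would turn it into two columns if a data frame is needed.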