Question

Using dplyr

This is my dataset:

Class   Time    Honors  Grade    Total Students
Math    AM      Yes     PassFail    23
English AM      No      Letter      31
Science AM      Yes     Letter      22
Gym     AM      No      PassFail    26
Math    PM      Yes     PassFail    19
English PM      No      Letter      23
Science PM      Yes     Letter      24
Gym     PM      No      PassFail    13
Math    AM      Yes     PassFail    24
English AM      Yes     Letter      27
Science AM      No      Letter      28
Math    PM      No      Letter      21
English PM      Yes     PassFail    23
Science PM      No      PassFail    22
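
To reproduce the queries, the table above can be rebuilt as a data frame. This is only a sketch: the name `df` is taken from the queries in this post, and `check.names = FALSE` is an assumption made so that the column keeps its space and can be referred to as `Total Students` in backticks.

```r
# Rebuild the example table; check.names = FALSE preserves "Total Students".
df <- data.frame(
  Class  = c("Math", "English", "Science", "Gym", "Math", "English", "Science",
             "Gym", "Math", "English", "Science", "Math", "English", "Science"),
  Time   = c("AM", "AM", "AM", "AM", "PM", "PM", "PM",
             "PM", "AM", "AM", "AM", "PM", "PM", "PM"),
  Honors = c("Yes", "No", "Yes", "No", "Yes", "No", "Yes",
             "No", "Yes", "Yes", "No", "No", "Yes", "No"),
  Grade  = c("PassFail", "Letter", "Letter", "PassFail", "PassFail", "Letter",
             "Letter", "PassFail", "PassFail", "Letter", "Letter", "Letter",
             "PassFail", "PassFail"),
  `Total Students` = c(23, 31, 22, 26, 19, 23, 24, 13, 24, 27, 28, 21, 23, 22),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
```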

I want to run four queries that give four increasingly specific answers. The first query has one group_by argument, the second has two, the third has three, and so on.

#query 1 
df %>%
  group_by(Class) %>%
  summarise(NewValue = mean(`Total Students`))

#results
    Class NewValue
    <chr>    <dbl>
1 English    26.00
2     Gym    19.50
3    Math    21.75
4 Science    24.00

The second query is the same basic calculation with one more group_by argument.

#query2
df %>%
  group_by(Class, Time) %>%
  summarise(NewValue = mean(`Total Students`))

#results
    Class  Time NewValue
    <chr> <chr>    <dbl>
1 English    AM     29.0
2 English    PM     23.0
3     Gym    AM     26.0
4     Gym    PM     13.0
5    Math    AM     23.5
6    Math    PM     20.0
7 Science    AM     25.0
8 Science    PM     23.0

The pattern continues: #query3 would be

df %>%
  group_by(Class, Time, Honors) %>%
  summarise(NewValue = mean(`Total Students`))

and #query4 would be

df %>%
  group_by(Class, Time, Honors, Grade) %>%
  summarise(NewValue = mean(`Total Students`))

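Question:

Is there a way to write a single query and use a for loop to add increasing levels of detail to the group_by arguments?

For example, the pseudocode below does not work; I am hoping for a solution along these lines: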

resultsarray <- data.frame()
Groupbysteps <- c( "Class", 
                   "Class, Time", 
                   "Class, Time, Honors", 
                   "Class, Time, Honors, Grade")

for (i in Groupbysteps) {
      resultsarray <- df%>%
                       group_by( Groupbysteps) %>%
                       summarise(NewValue = mean(`Total Students`))

 all <- rbind.fill(all, resultsarray)
}

Answer 1

Try syms from rlang, as follows:

library(dplyr)
library(rlang)

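# Group by the first i columns of df (i = 1..4) and summarise each time.
# Assumes the first four columns are Class, Time, Honors, Grade, in that order.
L <- lapply(1:4, function(i) df %>% 
                               group_by(!!!syms(names(df)[1:i])) %>% 
                               summarize(newValue = mean(`Total Students`))
)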

This gives a list L of 4 data frames, whose column names are:

> lapply(L, names)
[[1]]
[1] "Class"    "newValue"

[[2]]
[1] "Class"    "Time"     "newValue"

[[3]]
[1] "Class"    "Time"     "Honors"   "newValue"

[[4]]
[1] "Class"    "Time"     "Honors"   "Grade"    "newValue"
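
To make the `!!!syms(...)` step more concrete, here is a small sketch under a few assumptions: the toy `df` holds only a handful of rows in the question's shape, and the names `cols`, `by_syms`, `by_hand`, `combined` and `n_groups` are made up for illustration. It compares the spliced call with the hand-written one, then stacks the per-level results with `bind_rows()`.

```r
library(dplyr)
library(rlang)

# Toy data: a few rows in the question's shape (illustration only).
df <- data.frame(
  Class = c("Math", "Math", "English", "English"),
  Time  = c("AM", "PM", "AM", "AM"),
  `Total Students` = c(23, 19, 31, 27),
  check.names = FALSE
)

cols <- c("Class", "Time")

# syms() converts the strings to symbols; !!! splices them into group_by()
# as separate arguments, so this call means group_by(Class, Time).
by_syms <- df %>%
  group_by(!!!syms(cols)) %>%
  summarise(newValue = mean(`Total Students`), .groups = "drop")

by_hand <- df %>%
  group_by(Class, Time) %>%
  summarise(newValue = mean(`Total Students`), .groups = "drop")

# One summary per prefix of cols, stacked into a single data frame.
# n_groups records how many grouping columns were used; grouping columns
# that a coarser level did not use are filled with NA.
L <- lapply(seq_along(cols), function(i) {
  df %>%
    group_by(!!!syms(cols[1:i])) %>%
    summarise(newValue = mean(`Total Students`), .groups = "drop")
})
combined <- bind_rows(L, .id = "n_groups")
```

Whether a list of data frames or one stacked data frame is more convenient depends on what happens next; the stacked form is closer to what the question's loop was trying to build.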

Answer 2

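The following works:

Sample dataset:

df <- iris[1:20, ]
colnames(df) <- c( "Class", "Time", "Honors", "Grade", "Total Students")
# Draw 20 random labels per grouping column, one per row
df[, 1] <- as.factor(sample(c("a", "b"), 20, replace = TRUE)) 
df[, 2] <- as.factor(sample(c("a", "b"), 20, replace = TRUE)) 
df[, 3] <- as.factor(sample(c("a", "b"), 20, replace = TRUE)) 
df[, 4] <- as.factor(sample(c("a", "b"), 20, replace = TRUE)) 
df[, 5] <- rnorm(20)

Code:

library(dplyr)

Groupbysteps <- c( "Class", "Time", "Honors", "Grade")
all <- data.frame()  # accumulator; must exist before the loop

for (i in 1 : length(Groupbysteps)) {
      resultsarray <- df%>%
                       # .dots accepts column names as strings (deprecated since dplyr 1.0.0)
                       group_by(.dots = Groupbysteps[1 : i]) %>%
                       summarise(NewValue = mean(`Total Students`))

 # rbind.fill() is from plyr; calling it as plyr:: avoids masking dplyr's summarise()
 all <- plyr::rbind.fill(all, resultsarray)
}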
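
Since `group_by(.dots = ...)` is deprecated in dplyr 1.0.0 and later, the same loop can also be written with `across(all_of(...))` and wrapped in a function. The following is only a sketch under stated assumptions: `mean_by_prefix` and its arguments are hypothetical names, `toy` is made-up data, and `bind_rows()` is swapped in for `rbind.fill()` so that plyr is not needed.

```r
library(dplyr)

# Hypothetical helper: mean of column `value` for each growing prefix of `cols`.
mean_by_prefix <- function(data, cols, value) {
  pieces <- vector("list", length(cols))
  for (i in seq_along(cols)) {
    pieces[[i]] <- data %>%
      # all_of() selects columns by name from a character vector
      group_by(across(all_of(cols[seq_len(i)]))) %>%
      # .data[[value]] looks the column up by its string name
      summarise(NewValue = mean(.data[[value]]), .groups = "drop")
  }
  # Stack all levels; grouping columns unused at a level become NA
  bind_rows(pieces)
}

# Toy data in the same spirit as the sample dataset (illustration only)
set.seed(1)
toy <- data.frame(
  Class = sample(c("a", "b"), 20, replace = TRUE),
  Time  = sample(c("a", "b"), 20, replace = TRUE),
  `Total Students` = rnorm(20),
  check.names = FALSE
)

res <- mean_by_prefix(toy, c("Class", "Time"), "Total Students")
```

Collecting the pieces in a pre-allocated list and binding once at the end avoids growing a data frame inside the loop, which matters little at this size but scales better.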
