Using dplyr
Here is my dataset:
Class Time Honors Grade Total Students
Math AM Yes PassFail 23
English AM No Letter 31
Science AM Yes Letter 22
Gym AM No PassFail 26
Math PM Yes PassFail 19
English PM No Letter 23
Science PM Yes Letter 24
Gym PM No PassFail 13
Math AM Yes PassFail 24
English AM Yes Letter 27
Science AM No Letter 28
Math PM No Letter 21
English PM Yes PassFail 23
Science PM No PassFail 22
I want to run four queries that give four increasingly specific answers. The first query has one group_by argument, the second has two, the third has three, and so on.
#query 1
df %>%
group_by(Class) %>%
summarise(NewValue = mean(`Total Students`))
#results
Class NewValue
<chr> <dbl>
1 English 26.00
2 Gym 19.50
3 Math 21.75
4 Science 24.0
The second query is the same basic calculation with one more group_by argument.
#query2
df %>%
group_by(Class, Time) %>%
summarise(NewValue = mean(`Total Students`))
#results
Class Time NewValue
<chr> <chr> <dbl>
1 English AM 29.0
2 English PM 23.0
3 Gym AM 26.0
4 Gym PM 13.0
5 Math AM 23.5
6 Math PM 20.0
7 Science AM 25.0
8 Science PM 23.0
The pattern continues. #query3 would be:
df %>%
group_by(Class, Time, Honors) %>%
summarise(NewValue = mean(`Total Students`))
And #query4 would be:
df %>%
group_by(Class, Time, Honors, Grade) %>%
summarise(NewValue = mean(`Total Students`))
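Question:
Is there a way to write a single query and use a for loop to include an increasing level of detail in the group_by arguments?
For example, the pseudocode below does not work; I am hoping for a solution along these lines: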
resultsarray <- data.frame()
Groupbysteps <- c( "Class",
"Class, Time",
"Class, Time, Honors",
"Class, Time, Honors, Grade")
for (i in Groupbysteps) {
resultsarray <- df%>%
group_by( Groupbysteps) %>%
summarise(NewValue = mean(`Total Students`))
all <- rbind.fill(all, resultsarray)
}
Answer 0 (score: 1)
Try syms from rlang, as follows:
library(dplyr)
library(rlang)
# Group by the first i columns of df (assumes the four grouping columns come first)
L <- lapply(1:4, function(i) df %>%
  group_by(!!!syms(names(df)[1:i])) %>%
  summarize(newValue = mean(`Total Students`))
)
This gives a list L of 4 data frames, whose column names are:
> lapply(L, names)
[[1]]
[1] "Class" "newValue"
[[2]]
[1] "Class" "Time" "newValue"
[[3]]
[1] "Class" "Time" "Honors" "newValue"
[[4]]
[1] "Class" "Time" "Honors" "Grade" "newValue"
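The syms approach above can be tried end to end on the question's data. The following is only a minimal sketch: it assumes the data is typed in by hand with tibble::tribble, that the value column is named `Total Students` as in the question, and it adds a bind_rows step (not part of the answer) to stack the four results into one table. The names group_cols, combined and level are made up for illustration.

```r
library(dplyr)
library(rlang)

# The question's data, entered by hand (assumption: column names as shown there)
df <- tibble::tribble(
  ~Class,    ~Time, ~Honors, ~Grade,     ~`Total Students`,
  "Math",    "AM",  "Yes",   "PassFail", 23,
  "English", "AM",  "No",    "Letter",   31,
  "Science", "AM",  "Yes",   "Letter",   22,
  "Gym",     "AM",  "No",    "PassFail", 26,
  "Math",    "PM",  "Yes",   "PassFail", 19,
  "English", "PM",  "No",    "Letter",   23,
  "Science", "PM",  "Yes",   "Letter",   24,
  "Gym",     "PM",  "No",    "PassFail", 13,
  "Math",    "AM",  "Yes",   "PassFail", 24,
  "English", "AM",  "Yes",   "Letter",   27,
  "Science", "AM",  "No",    "Letter",   28,
  "Math",    "PM",  "No",    "Letter",   21,
  "English", "PM",  "Yes",   "PassFail", 23,
  "Science", "PM",  "No",    "PassFail", 22
)

# Grouping columns, from coarsest to most detailed (hypothetical helper name)
group_cols <- c("Class", "Time", "Honors", "Grade")

# One summary per level: level i groups by the first i names in group_cols
L <- lapply(seq_along(group_cols), function(i) {
  df %>%
    group_by(!!!syms(group_cols[1:i])) %>%
    summarise(NewValue = mean(`Total Students`), .groups = "drop")
})

# Stack the levels; grouping columns absent at a coarser level are filled with NA
combined <- bind_rows(L, .id = "level")
```

L[[1]] and L[[2]] should reproduce the #query 1 and #query2 tables from the question, and the level column in combined records how many grouping columns each row used.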
Answer 1 (score: 1)
This works.
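Example dataset:
set.seed(1)  # make the random example reproducible
df <- iris[1:20, ]
colnames(df) <- c("Class", "Time", "Honors", "Grade", "Total Students")
# Draw 20 random labels per column; without a size, sample() returns only 2 values that get recycled
df[, 1] <- as.factor(sample(c("a", "b"), 20, replace = TRUE))
df[, 2] <- as.factor(sample(c("a", "b"), 20, replace = TRUE))
df[, 3] <- as.factor(sample(c("a", "b"), 20, replace = TRUE))
df[, 4] <- as.factor(sample(c("a", "b"), 20, replace = TRUE))
df[, 5] <- rnorm(20)
Code:
library(dplyr)
Groupbysteps <- c("Class", "Time", "Honors", "Grade")
all_results <- data.frame()  # accumulator; the name `all` would shadow base::all
for (i in seq_along(Groupbysteps)) {
  resultsarray <- df %>%
    # group by the first i column names; older dplyr code wrote group_by(.dots = Groupbysteps[1:i])
    group_by(across(all_of(Groupbysteps[1:i]))) %>%
    summarise(NewValue = mean(`Total Students`), .groups = "drop")
  # bind_rows() fills grouping columns missing at coarser levels with NA, as plyr::rbind.fill() does
  all_results <- bind_rows(all_results, resultsarray)
}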
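The cumulative grouping sets that the loop walks through can also be built up front, which is close to what the question's Groupbysteps vector was trying to express. This is a rough sketch, not part of either answer: it assumes dplyr 1.0 or later (for across), uses a small made-up data frame with the question's column names, and the names grouping_sets and results are hypothetical.

```r
library(dplyr)

# Small made-up data frame (hypothetical values, same column names as the question)
df <- data.frame(
  Class  = c("Math", "Math", "Gym", "Gym"),
  Time   = c("AM", "PM", "AM", "AM"),
  Honors = c("Yes", "Yes", "No", "No"),
  Grade  = c("PassFail", "Letter", "PassFail", "PassFail"),
  `Total Students` = c(23, 19, 26, 13),
  check.names = FALSE  # keep the space in "Total Students"
)

Groupbysteps <- c("Class", "Time", "Honors", "Grade")

# Reduce(..., accumulate = TRUE) yields "Class", then c("Class", "Time"), and so on
grouping_sets <- Reduce(c, Groupbysteps, accumulate = TRUE)

# One summary per grouping set
results <- lapply(grouping_sets, function(cols) {
  df %>%
    group_by(across(all_of(cols))) %>%
    summarise(NewValue = mean(`Total Students`), .groups = "drop")
})

# Label each result with the columns it was grouped by, e.g. "Class, Time"
names(results) <- vapply(grouping_sets, paste, character(1), collapse = ", ")
```

Keeping the results in a named list leaves each level's table intact; bind_rows(results, .id = "grouping") would stack them into a single data frame if one table is preferred.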