The following code is from R for Data Science, section 5.6.5, "Grouping by multiple variables".
My question is: what process does R go through, or what is R doing at each step, as it rolls the data up into the final 1x2 tibble?
I think I understand the first few assignments:
`daily` groups the `flights` data frame by year, then month, then day.
`per_day` creates three columns, `year`, `month` and `day`, plus a fourth column, `flights`, which counts the number of flights on each day of each month of the year.
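But starting with `per_month`, I begin to lose track of how the function is evaluated.
For example, in the `per_month` step, how does R know to sum only the flights within each month (for example, January 2013), rather than adding up the entire `flights` column and simply dropping the `day` column?
Thanks!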
daily <- group_by(flights, year, month, day)
(per_day <- summarise(daily, flights = n()))
#> # A tibble: 365 x 4
#> # Groups: year, month [?]
#> year month day flights
#> <int> <int> <int> <int>
#> 1 2013 1 1 842
#> 2 2013 1 2 943
#> 3 2013 1 3 914
#> 4 2013 1 4 915
#> 5 2013 1 5 720
#> 6 2013 1 6 832
#> # … with 359 more rows
(per_month <- summarise(per_day, flights = sum(flights)))
#> # A tibble: 12 x 3
#> # Groups: year [?]
#> year month flights
#> <int> <int> <int>
#> 1 2013 1 27004
#> 2 2013 2 24951
#> 3 2013 3 28834
#> 4 2013 4 28330
#> 5 2013 5 28796
#> 6 2013 6 28243
#> # … with 6 more rows
(per_year <- summarise(per_month, flights = sum(flights)))
#> # A tibble: 1 x 2
#> year flights
#> <int> <int>
#> 1 2013 336776
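As a minimal base R sketch of what the three `summarise()` calls compute, the roll-up can be written out with `aggregate()`, naming the grouping columns explicitly at every step. The data frame `flights_toy` is hypothetical toy data (five made-up flights), not the real `nycflights13::flights`, and the helper column of 1s is an assumption used only to count rows.

```r
# Hypothetical toy data: one row per flight (not the real nycflights13 data)
flights_toy <- data.frame(
  year  = c(2013, 2013, 2013, 2013, 2013),
  month = c(1, 1, 1, 2, 2),
  day   = c(1, 1, 2, 1, 1)
)
flights_toy$flights <- 1  # helper column: each row counts as one flight

# Step 1: count flights per (year, month, day), like n() in summarise()
per_day <- aggregate(flights ~ year + month + day, data = flights_toy, FUN = sum)

# Step 2: sum the daily counts only within each (year, month) combination
per_month <- aggregate(flights ~ year + month, data = per_day, FUN = sum)

# Step 3: sum the monthly totals only within each year
per_year <- aggregate(flights ~ year, data = per_month, FUN = sum)

print(per_day)
print(per_month)
print(per_year)
```

Here the grouping is spelled out in each formula; dplyr instead remembers it in the grouped tibble, which is why `per_month` in the question never mentions `year` or `month`.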
Answer 0 (score: 1)
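`group_by` adds a class (`grouped_df`) and a `groups` attribute. Each call to `summarise` then aggregates the data once for every distinct combination of the grouping variables and peels off one level of grouping (the last grouping variable). That is why `per_month` sums within each month: `per_day` is still grouped by `year` and `month`, so `sum(flights)` is evaluated once per year-month combination. Once all the groups have been peeled off, the tibble loses its `grouped_df` class.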
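The peeling can be watched directly with `group_vars()`, which returns the current grouping variables as a character vector. This is a minimal sketch on hypothetical toy data (`flights_toy` is made up); it assumes dplyr 1.0.0 or later, where the `.groups` argument of `summarise()` exists and `"drop_last"` spells out the default behaviour shown above.

```r
library(dplyr)

# Hypothetical toy data standing in for nycflights13::flights
flights_toy <- tibble(
  year  = c(2013L, 2013L, 2013L, 2013L, 2013L),
  month = c(1L, 1L, 1L, 2L, 2L),
  day   = c(1L, 1L, 2L, 1L, 1L)
)

daily <- group_by(flights_toy, year, month, day)
group_vars(daily)      # expected: c("year", "month", "day")

# .groups = "drop_last" removes the last grouping variable after summarising
per_day <- summarise(daily, flights = n(), .groups = "drop_last")
group_vars(per_day)    # expected: c("year", "month")

per_month <- summarise(per_day, flights = sum(flights), .groups = "drop_last")
group_vars(per_month)  # expected: "year"

per_year <- summarise(per_month, flights = sum(flights), .groups = "drop_last")
group_vars(per_year)   # expected: character(0), no groups left
inherits(per_year, "grouped_df")  # expected: FALSE
```

Passing `.groups` explicitly also silences the message newer dplyr versions print about which grouping the result keeps.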
library(dplyr)
library(nycflights13)
names(attributes(flights))
#> [1] "names" "row.names" "class"
class(flights)
#> [1] "tbl_df" "tbl" "data.frame"
groups(flights)
#> NULL
daily <- group_by(flights, year, month, day)
names(attributes(daily))
#> [1] "names" "row.names" "class" "groups"
class(daily)
#> [1] "grouped_df" "tbl_df" "tbl" "data.frame"
groups(daily)
#> [[1]]
#> year
#>
#> [[2]]
#> month
#>
#> [[3]]
#> day
per_day <- summarise(daily, flights = n())
names(attributes(per_day))
#> [1] "names" "row.names" "class" "groups"
class(per_day)
#> [1] "grouped_df" "tbl_df" "tbl" "data.frame"
groups(per_day)
#> [[1]]
#> year
#>
#> [[2]]
#> month
per_month <- summarise(per_day, flights = sum(flights))
class(per_month)
#> [1] "grouped_df" "tbl_df" "tbl" "data.frame"
groups(per_month)
#> [[1]]
#> year
per_year <- summarise(per_month, flights = sum(flights))
class(per_year)
#> [1] "tbl_df" "tbl" "data.frame"
groups(per_year)
#> NULL
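To contrast with the behaviour the question expected, the sketch below removes the grouping with `ungroup()` before summarising: `sum(flights)` then runs once over the whole column instead of once per month. The daily counts in `per_day` are hypothetical toy values, not the real nycflights13 counts.

```r
library(dplyr)

# Hypothetical daily counts, grouped by year and month like per_day above
per_day <- group_by(
  tibble(
    year    = c(2013L, 2013L, 2013L),
    month   = c(1L, 1L, 2L),
    day     = c(1L, 2L, 1L),
    flights = c(2L, 1L, 2L)
  ),
  year, month
)

# Grouped: sum() is evaluated once per (year, month) group
by_month <- summarise(per_day, flights = sum(flights), .groups = "drop")

# Ungrouped: sum() is evaluated once over the entire flights column
overall <- summarise(ungroup(per_day), flights = sum(flights))

print(by_month)
print(overall)
```

`by_month` keeps one row per month, while `overall` collapses to a single row, which is the "add up the whole column" result the question describes.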