Understanding tally(sort = TRUE)

Asked: 2014-07-20 18:29:54

Tags: r dplyr

Earlier there was this code:

flights %>%
  group_by(dest) %>%
  summarise(arr_delay = mean(arr_delay, na.rm = TRUE),
            n = n()) %>%
  arrange(desc(arr_delay))

I understand this code. But right below it, the following code is shown:

flights %>%
  group_by(carrier, flight, dest) %>%
  tally(sort = TRUE) %>% # Save some typing
  filter( n == 365)

So this is the part I don't get:

tally(sort = TRUE)

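When it says "save some typing", what exactly is being saved? I understand that tally(sort = TRUE) replaces summarise(n = n()), but how does it "save typing", and how do the two relate to each other? If someone could give me a breakdown of tally(sort = TRUE), I would really appreciate it!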

1 Answer:

Answer 0: (score: 18)

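I'm far from a dplyr expert, but since nobody else wants to answer, I'll give it a try. According to the tally documentation, it simply gives you the frequency of each group. If you nest two tally calls, they just sum the frequencies. For example: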

library(dplyr)
tally(group_by(CO2, Plant)) 

#    Plant n
# 1    Qn1 7
# 2    Qn2 7
# 3    Qn3 7
# 4    Qc1 7
# 5    Qc3 7
# 6    Qc2 7
# 7    Mn3 7
# 8    Mn2 7
# 9    Mn1 7
# 10   Mc2 7
# 11   Mc3 7
# 12   Mc1 7

is just base R's table:

table(CO2$Plant)
# Qn1 Qn2 Qn3 Qc1 Qc3 Qc2 Mn3 Mn2 Mn1 Mc2 Mc3 Mc1 
#   7   7   7   7   7   7   7   7   7   7   7   7 

tally(tally(group_by(CO2, Plant)))
#    n
# 1 84

is just

sum(table(CO2$Plant))
# [1] 84

tally(CO2)
#   n
# 1 84

nrow(CO2)
# [1] 84
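
The nested tally(tally(...)) output above shows the outer call summing the inner counts. A minimal sketch that spells the same sum out explicitly through the wt argument, rather than relying on what tally() does by default with an existing n column. The names co2_df, per_plant and total are only illustrative, and the as.data.frame() step is my own precaution, not part of the original example:

```r
library(dplyr)

# Drop CO2's extra groupedData classes so we work with a plain data frame
co2_df <- as.data.frame(CO2)

# Frequencies per Plant: one row per group, counts stored in column n
per_plant <- tally(group_by(co2_df, Plant))

# Sum those counts explicitly by passing n as the weight
total <- tally(per_plant, wt = n)

# Both checks compare against the number of rows in the original data
sum(per_plant$n) == nrow(co2_df)
total$n == nrow(co2_df)
```

Passing wt = n states the intent directly, which may be safer if you are unsure how your installed dplyr version handles a column that is already called n.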

So, to answer your question,

flights %>%
  group_by(carrier, flight, dest) %>%
  tally(sort = TRUE) %>% # Save some typing
  filter( n == 365)

means

Take the data set "flights",
 group it by the "carrier", "flight" and "dest" columns,
 give me the frequencies of these combinations and sort them by frequency in descending order,
 return only the combinations whose frequency equals 365
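
As for what typing is saved: as far as I can tell, tally(sort = TRUE) stands in for a summarise(n = n()) followed by an arrange(desc(n)). Here is a small sketch of that equivalence on the built-in mtcars data (used here only so it runs without the flights data; long_form and short_form are illustrative names):

```r
library(dplyr)

# Long form: count rows per group, then sort by the count, largest first
long_form <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(n = n()) %>%
  arrange(desc(n))

# Short form: tally() does the counting, sort = TRUE does the sorting
short_form <- mtcars %>%
  group_by(cyl, gear) %>%
  tally(sort = TRUE)

# The counts should come out the same and in the same order
identical(long_form$n, short_form$n)
```

So the saving is two verbs collapsed into one call. The result still has the count in a column named n, which is why the filter(n == 365) step works unchanged.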