How to count by group in R

Date: 2015-10-29 09:54:26

Tags: r

I want the count of booking IDs at the Month and Source level.

Month   Source  Booking_id
Oct        A    100
Nov        B    101
Oct        A    106
Jan        B    109
Nov        A    110
Nov        B    111


data <- structure(list(Month = c("October", "November", "October", "January", 
"November", "November"), Source = c("A", "B", "A", "B", "A", 
"B"), Booking_ID = c(100L, 101L, 106L, 109L, 110L, 111L)), .Names = c("Month", 
"Source", "Booking_ID"), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L))

3 answers:

Answer 0 (score: 2)

Perhaps this helps:

table(data$Month, data$Booking_ID)

#          100 101 106 109 110 111
# January    0   0   0   1   0   0
# November   0   1   0   0   1   1
# October    1   0   1   0   0   0


table(data$Month, data$Source)

#          A B
# January  0 1
# November 1 2
# October  2 0
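If you want the `table()` counts in long format (one row per Month/Source pair, like the outputs of the other answers), a minimal sketch is to convert the table with `as.data.frame()` and drop the zero cells. Naming the arguments to `table()` sets the dimension names, so the resulting columns are `Month`, `Source` and `Freq` rather than `Var1`/`Var2`; the object names `tab` and `counts` are only illustrative, and the data are rebuilt from the question.

```r
# Rebuild the example data from the question as a plain data.frame
data <- data.frame(
  Month = c("October", "November", "October", "January", "November", "November"),
  Source = c("A", "B", "A", "B", "A", "B"),
  Booking_ID = c(100L, 101L, 106L, 109L, 110L, 111L),
  stringsAsFactors = FALSE
)

# Named arguments become the dimension names of the two-way table
tab <- table(Month = data$Month, Source = data$Source)

# One row per Month/Source combination; the count goes into the Freq column
counts <- as.data.frame(tab, stringsAsFactors = FALSE)

# table() also lists combinations that never occur, so drop the zero rows
counts <- counts[counts$Freq > 0, ]
counts
```

Keeping the zero rows (skipping the last filter) can be useful if you need every Month/Source combination to appear, for example before plotting.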

Answer 1 (score: 1)

Two alternatives:

1. aggregate

aggregate(Booking_ID ~ Month + Source, data, FUN = "length")

Output:

     Month Source Booking_ID
1 November      A          1
2  October      A          2
3  January      B          1
4 November      B          2

2. sqldf

library(sqldf)
sqldf("SELECT  Month, Source, COUNT(*) AS Count FROM data GROUP BY Month, Source")

Output:

     Month Source Count
1  January      B     1
2 November      A     1
3 November      B     2
4  October      A     2

Answer 2 (score: 0)

We can use dplyr. We group by 'Month' and 'Source' and get the `n_distinct` of 'Booking_ID', i.e. the number of unique elements of 'Booking_ID'. If we need the total count instead, use `n()`.

library(dplyr)
data %>%
  group_by(Month, Source) %>%
  summarise(n= n_distinct(Booking_ID))
  #if we wanted the total count instead of unique
  #summarise(n=n()) 

#    Month Source     n
#     (chr)  (chr) (int)
#1  January      B     1
#2 November      A     1
#3 November      B     2
#4  October      A     2
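Without dplyr, the same distinction between counting unique booking IDs (`n_distinct()`) and counting all rows (`n()`) can be sketched in base R with `aggregate()`, using `length(unique(x))` for the distinct count and `length` for the total. In the question's data every ID is unique, so this sketch appends a hypothetical repeated booking (`data_dup`) to make the two counts differ; all object names here are illustrative.

```r
# Rebuild the example data from the question as a plain data.frame
data <- data.frame(
  Month = c("October", "November", "October", "January", "November", "November"),
  Source = c("A", "B", "A", "B", "A", "B"),
  Booking_ID = c(100L, 101L, 106L, 109L, 110L, 111L),
  stringsAsFactors = FALSE
)

# Hypothetical: repeat one booking (November / B / 101) so the counts differ
data_dup <- rbind(data, data[2, ])

# Unique booking IDs per Month/Source (base-R analogue of n_distinct())
distinct_counts <- aggregate(Booking_ID ~ Month + Source, data = data_dup,
                             FUN = function(x) length(unique(x)))

# All rows per Month/Source (base-R analogue of n())
total_counts <- aggregate(Booking_ID ~ Month + Source, data = data_dup,
                          FUN = length)

# Join side by side on Month and Source; suffixes label the two count columns
both <- merge(distinct_counts, total_counts, by = c("Month", "Source"),
              suffixes = c("_distinct", "_total"))
both
```

Only the November/B group differs here, because it is the only group containing the repeated ID.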