I want to count booking IDs at the Month and Source level.
Month  Source  Booking_id
Oct    A       100
Nov    B       101
Oct    A       106
Jan    B       109
Nov    A       110
Nov    B       111
data <- structure(list(Month = c("October", "November", "October", "January",
"November", "November"), Source = c("A", "B", "A", "B", "A",
"B"), Booking_ID = c(100L, 101L, 106L, 109L, 110L, 111L)), .Names = c("Month",
"Source", "Booking_ID"), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L))
Answer 0 (score: 2)
Maybe this will help:
table(data$Month, data$Booking_ID)
# 100 101 106 109 110 111
# Jan 0 0 0 1 0 0
# Nov 0 1 0 0 1 1
# Oct 1 0 1 0 0 0
table(data$Month, data$Source)
# A B
# Jan 0 1
# Nov 1 2
# Oct 2 0
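If a long table with one row per Month/Source pair is preferred over the wide cross-tabulation, the result of table() can be converted with as.data.frame(). The following is a minimal sketch, assuming the data from the question is rebuilt as a plain data.frame; the names tab, long, and the column name Count are chosen here for illustration only.

```r
# Rebuild the question's data as a plain data.frame (assumption: same values)
data <- data.frame(
  Month = c("October", "November", "October", "January", "November", "November"),
  Source = c("A", "B", "A", "B", "A", "B"),
  Booking_ID = c(100L, 101L, 106L, 109L, 110L, 111L)
)

# Cross-tabulate, naming the dimensions so the columns get readable names
tab <- table(Month = data$Month, Source = data$Source)

# Convert the contingency table to long format: one row per Month/Source pair
long <- as.data.frame(tab, responseName = "Count")

# Drop combinations that never occur in the data
long <- long[long$Count > 0, ]
print(long)
```

Filtering on Count > 0 is optional; without it, combinations with no bookings (for example October/B) stay in the result with a count of zero.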
Answer 1 (score: 1)
Two alternatives:
1. aggregate
aggregate(Booking_ID ~ Month + Source, data, FUN = length)
Output:
Month Source Booking_ID
1 November A 1
2 October A 2
3 January B 1
4 November B 2
2. sqldf
library(sqldf)
sqldf("SELECT Month, Source, COUNT(*) AS Count FROM data GROUP BY Month, Source")
Output:
Month Source Count
1 January B 1
2 November A 1
3 November B 2
4 October A 2
Answer 2 (score: 0)
We can use dplyr. We group by 'Month' and 'Source' and get the n_distinct of 'Booking_ID', i.e. the number of unique elements of 'Booking_ID'. Or, if we need the total count, use n().
library(dplyr)
data %>%
group_by(Month, Source) %>%
summarise(n= n_distinct(Booking_ID))
#if we wanted the total count instead of unique
#summarise(n=n())
# Month Source n
# (chr) (chr) (int)
#1 January B 1
#2 November A 1
#3 November B 2
#4 October A 2
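With the data in the question every Booking_ID is unique, so n() and n_distinct() give the same result. The difference only shows up when an ID repeats within a group. Here is a small sketch using a hypothetical data frame dup (not part of the question) in which booking 100 appears twice; the column names n_rows and n_unique are also made up for the illustration, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical data: booking 100 is recorded twice for October / A
dup <- data.frame(
  Month = c("October", "October", "October", "November"),
  Source = c("A", "A", "A", "B"),
  Booking_ID = c(100L, 100L, 106L, 101L)
)

counts <- dup %>%
  group_by(Month, Source) %>%
  summarise(
    n_rows = n(),                      # total rows in the group
    n_unique = n_distinct(Booking_ID), # distinct booking IDs in the group
    .groups = "drop"
  )
print(counts)
```

For the October / A group, n_rows counts all three rows while n_unique counts only the two distinct IDs, so which one to use depends on whether repeated bookings should be counted once or every time.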