I have a data frame:
station date classification
1 June - 01/16 A
2 June - 03/16 B
1 June - 01/16 A
7 June - 01/16 C
1 June - 03/16 A
2 June - 03/16 B
2 June - 03/16 B
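I want the total number of occurrences of A, B and C, aggregated by station number and date.
For example, station 1 on June - 01/16 has 2 As, while station 2 on June - 03/16 has 3 Bs.
I tried:
aggregate(x = list(data_frame$classification), by = list(station = data_frame$station, Date = data_frame$date), function(x) length(unique(x)))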
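The attempt above counts the distinct classifications per station and date (length(unique(x))), not how often each one occurs. A minimal base-R sketch, assuming the column names from the question, adds classification to the grouping and counts rows with length:

```r
# Example data from the question
data_frame <- data.frame(
  station = c(1L, 2L, 1L, 7L, 1L, 2L, 2L),
  date = c("June - 01/16", "June - 03/16", "June - 01/16", "June - 01/16",
           "June - 03/16", "June - 03/16", "June - 03/16"),
  classification = c("A", "B", "A", "C", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Group by station, date AND classification, then count rows in each group;
# only combinations that actually occur appear in the result
counts <- aggregate(
  x = list(n = data_frame$classification),
  by = list(station = data_frame$station,
            date = data_frame$date,
            classification = data_frame$classification),
  FUN = length
)
counts
```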
Answer 0 (score: 3)
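If we need to count the 'A', 'B' and 'C' values, reshaping may be better. We convert the 'data.frame' to a 'data.table' (setDT(data_frame)) and use dcast from data.table to reshape from 'long' to 'wide' format, specifying fun.aggregate as length.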
library(data.table)
dcast(setDT(data_frame), station+date~classification, length)
# station date A B C
#1: 1 June - 01/16 2 0 0
#2: 1 June - 03/16 1 0 0
#3: 2 June - 03/16 0 3 0
#4: 7 June - 01/16 0 0 1
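For comparison, a base-R sketch of the same wide count table without data.table. It assumes it is acceptable to fold station and date into a single key; the station_date name is hypothetical. table() cross-tabulates that key against classification and fills absent combinations with 0:

```r
# Example data from the question
data_frame <- data.frame(
  station = c(1L, 2L, 1L, 7L, 1L, 2L, 2L),
  date = c("June - 01/16", "June - 03/16", "June - 01/16", "June - 01/16",
           "June - 03/16", "June - 03/16", "June - 03/16"),
  classification = c("A", "B", "A", "C", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical combined key, e.g. "1_June - 01/16"
station_date <- paste(data_frame$station, data_frame$date, sep = "_")

# Rows: station/date key, columns: classification, cells: counts (0 if absent)
counts <- table(station_date = station_date,
                classification = data_frame$classification)
counts
```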
A dplyr option:
library(dplyr)
data_frame %>%
group_by(station, date, classification) %>%
tally()
# station date classification n
# (int) (chr) (chr) (int)
#1 1 June - 01/16 A 2
#2 1 June - 03/16 A 1
#3 2 June - 03/16 B 3
#4 7 June - 01/16 C 1
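A shorter variant, assuming reasonably recent dplyr and tidyr (a scalar values_fill needs tidyr 1.1.0 or later, as far as I know): count() is shorthand for group_by() + tally(), and tidyr::pivot_wider() turns the long counts into the same wide layout as the dcast answer:

```r
library(dplyr)
library(tidyr)

# Example data from the question
data_frame <- data.frame(
  station = c(1L, 2L, 1L, 7L, 1L, 2L, 2L),
  date = c("June - 01/16", "June - 03/16", "June - 01/16", "June - 01/16",
           "June - 03/16", "June - 03/16", "June - 03/16"),
  classification = c("A", "B", "A", "C", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Long format: one row per station/date/classification with its count n
long_counts <- data_frame %>%
  count(station, date, classification)

# Wide format: one column per classification, missing combinations set to 0
wide_counts <- long_counts %>%
  pivot_wider(names_from = classification, values_from = n, values_fill = 0)
wide_counts
```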
# data used above
data_frame <- structure(list(station = c(1L, 2L, 1L, 7L, 1L, 2L, 2L),
date = c("June - 01/16",
"June - 03/16", "June - 01/16", "June - 01/16", "June - 03/16",
"June - 03/16", "June - 03/16"), classification = c("A", "B",
"A", "C", "A", "B", "B")), .Names = c("station", "date", "classification"
), class = "data.frame", row.names = c(NA, -7L))
Answer 1 (score: 0)
The plyr package is well suited to this.
library(plyr)
count(data_frame, c("classification", "station", "date"))
Answer 2 (score: 0)
The SQL way:
library(sqldf)
sqldf("select station, date, classification, count(classification) as n from data_frame group by station, date, classification")