I have a data frame:
station person_id date
1 0037 103103 2015-02-02
2 0037 306558 2015-02-02
3 0037 306558 2015-02-04
4 0037 306558 2015-02-05
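I need to aggregate the data frame by station and date, so that each unique station/date combination (one row each) in the result shows how many people fall on that row.
For example, the first two rows would collapse into a single row showing 2 people for station 0037 on 2015-02-02.
I tried: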
result <- data_frame %>% group_by(station, week = week(date)) %>% summarise_each(funs(length), -date)
Answer 0 (score: 2)
You can try:
group_by(df, station, date) %>% summarise(num_people = length(person_id))
Source: local data frame [3 x 3]
Groups: station [?]
station date num_people
(int) (fctr) (int)
1 37 2015-02-02 2
2 37 2015-02-04 1
3 37 2015-02-05 1
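As a hedged variation on the same idea, the sketch below rebuilds the question's four rows and counts with n() instead of length(). It also adds n_distinct(person_id), on the assumption that "how many people" might mean unique people rather than rows. The column names num_people and num_unique are illustrative, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Sample data copied from the question; station is kept as character
# so the leading zeros in "0037" are preserved.
df <- data.frame(
  station   = c("0037", "0037", "0037", "0037"),
  person_id = c(103103, 306558, 306558, 306558),
  date      = as.Date(c("2015-02-02", "2015-02-02", "2015-02-04", "2015-02-05"))
)

# n() counts rows per group; n_distinct() counts unique person_id values.
result <- df %>%
  group_by(station, date) %>%
  summarise(
    num_people = n(),
    num_unique = n_distinct(person_id),
    .groups = "drop"  # assumes dplyr >= 1.0.0
  )

# count() is shorthand for group_by() followed by summarise(n = n()).
counts <- count(df, station, date, name = "num_people")

print(result)
print(counts)
```

If each person can appear only once per station and date, the two columns agree and count() alone is probably enough.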
Answer 1 (score: 0)
In base R, you can use aggregate:
# sample dataset
set.seed(1234)
df <- data.frame(station=sample(1:3, 50, replace=T),
person_id=sample(30000:35000, 50, replace=T),
date=sample(seq(as.Date("2015-02-05"), as.Date("2015-02-12"),
by="day"), 50, replace=T))
# calculate number of people per station on a particular date
aggregate(cbind("passengerCount"=person_id) ~ station + date, data=df, FUN=length)
The cbind function is not required, but it lets you supply a name for the resulting count column.
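For comparison, here is a minimal sketch of the same aggregate call applied to the four rows from the question, without cbind, renaming the count column afterwards. The name num_people and the unique-count variant are illustrative assumptions, not part of the original answer.

```r
# Sample data copied from the question; station is kept as character
# so the leading zeros in "0037" are preserved.
df <- data.frame(
  station   = c("0037", "0037", "0037", "0037"),
  person_id = c(103103, 306558, 306558, 306558),
  date      = as.Date(c("2015-02-02", "2015-02-02", "2015-02-04", "2015-02-05"))
)

# Count rows per station/date; the count lands in a column named person_id.
res <- aggregate(person_id ~ station + date, data = df, FUN = length)
names(res)[3] <- "num_people"

# Variant: count unique people per station/date instead of rows.
res_unique <- aggregate(person_id ~ station + date, data = df,
                        FUN = function(x) length(unique(x)))
names(res_unique)[3] <- "num_unique"

print(res)
print(res_unique)
```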
Answer 2 (score: 0)
Using data.table, we convert the 'data.frame' to a 'data.table' with setDT, group by 'station' and 'date', and get the number of rows in each group (.N).
library(data.table)
setDT(df1)[, .(num_people = .N), .(station, date)]
# station date num_people
#1: 37 2015-02-02 2
#2: 37 2015-02-04 1
#3: 37 2015-02-05 1
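A self-contained sketch of the same approach, assuming df1 holds the four rows from the question. It uses as.data.table, which makes a copy, rather than setDT, which converts in place, and it adds uniqueN(person_id) in case unique people per group are wanted instead of rows. The column name num_unique is illustrative.

```r
library(data.table)

# Sample data copied from the question; station is kept as character
# so the leading zeros in "0037" are preserved.
df1 <- data.frame(
  station   = c("0037", "0037", "0037", "0037"),
  person_id = c(103103, 306558, 306558, 306558),
  date      = as.Date(c("2015-02-02", "2015-02-02", "2015-02-04", "2015-02-05"))
)

# as.data.table() copies df1, leaving the original data.frame untouched.
dt <- as.data.table(df1)

# .N is the row count per group; uniqueN() counts distinct values.
res <- dt[, .(num_people = .N, num_unique = uniqueN(person_id)),
          by = .(station, date)]

print(res)
```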