I have the following data frame:
I UserID | Day_of_week | hour | min | sms
#1 1 1 0 0 12
#2 1 1 0 30 20
#3 1 1 1 0 19
#4 1 1 1 30 11
#5 1 1 2 0 12
#6 1 1 2 30 7
... ... ... ... ... ....
#10 1 2 0 0 142
#11 1 2 0 30 201
#12 1 2 1 0 129
#13 1 2 1 30 111
... ... ... ... ... ....
The Day_of_week column ranges from 0 to 6, and the hour column ranges from 0 to 23.
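I want to create a data frame with the same layout, in which the days are grouped into weekdays (Monday to Friday) and the weekend (Saturday and Sunday), with the average sms value for each hour and minute within each group.
I want something like this: the average over all weekdays (Monday through Friday) at hour 0, minute 0 is 12.1.
x represents weekdays
y represents the weekend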
I UserID | Day_of_week | hour | min | sms
#1 1 x 0 0 12.1
#2 1 x 0 30 19.1
#3 1 x 1 0 14
#4 1 x 1 30 11
... ... ... ... ... ....
#10 1 y 0 0 1123
#11 1 y 0 30 23
#12 1 y 1 0 45
#13 1 y 1 30 121
... ... ... ... ... ....
Answer 0 (score: 1)
Sample data:
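# Read the sample data, then expand it with a copy relabeled as a weekend day.
# Note: the value column is named velocity here; it plays the role of sms in the question.
library(dplyr)  # needed for %>% and mutate() below
df = read.table(text='I UserID Day_of_week hour min velocity
1 1 1 0 0 12
2 1 1 0 30 20
3 1 1 1 0 19
4 1 1 1 30 11
5 1 1 2 0 12
6 1 1 2 30 7
10 1 2 0 0 142
11 1 2 0 30 201
12 1 2 1 0 129
13 1 2 1 30 111',header=TRUE)
# Day_of_week runs from 0 to 6 in the question, so use 6 (outside 1-5) for the weekend copy
df2 = df %>% mutate(Day_of_week=6)
df = rbind(df,df2)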
You can use dplyr and do the following:
library(dplyr)
df %>%
  mutate(daytype = ifelse(Day_of_week %in% seq(1,5),'weekday','weekend')) %>%
  group_by(UserID,daytype,hour,min) %>%
  summarize(velocity=mean(velocity))
Alternatively, with data.table:
library(data.table)
setDT(df)[,daytype := ifelse(Day_of_week %in% seq(1,5),'weekday','weekend')][,.(velocity=mean(velocity)),.(UserID,daytype,hour,min)]
Or with base R:
df$daytype = ifelse(df$Day_of_week %in% seq(1,5),'weekday','weekend')
aggregate(velocity~UserID+daytype+hour+min,df,FUN='mean')
Output (as printed by the data.table version):
UserID daytype hour min velocity
1: 1 weekday 0 0 77.0
2: 1 weekday 0 30 110.5
3: 1 weekday 1 0 74.0
4: 1 weekday 1 30 61.0
5: 1 weekday 2 0 12.0
6: 1 weekday 2 30 7.0
7: 1 weekend 0 0 77.0
8: 1 weekend 0 30 110.5
9: 1 weekend 1 0 74.0
10: 1 weekend 1 30 61.0
11: 1 weekend 2 0 12.0
12: 1 weekend 2 30 7.0
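
The result above uses a separate daytype column and the name velocity, whereas the question asks for the labels x and y in Day_of_week and a column called sms. A minimal base R sketch of that variant follows. The sample values are hypothetical, and it assumes, as the answer's code does, that 1 to 5 are Monday to Friday, so that 0 and 6 are the weekend days.

```r
# Hypothetical sample in the question's layout (sms column, Day_of_week 0-6).
# Assumption: 1-5 are Monday-Friday, so 0 and 6 are the weekend days.
df <- data.frame(
  UserID      = 1,
  Day_of_week = c(1, 1, 2, 2, 6, 6, 0, 0),
  hour        = 0,
  min         = c(0, 30, 0, 30, 0, 30, 0, 30),
  sms         = c(12, 20, 142, 201, 10, 30, 20, 50)
)

# Replace the day number with the question's labels: x = weekday, y = weekend
df$Day_of_week <- ifelse(df$Day_of_week %in% 1:5, "x", "y")

# Average sms per user, day type, hour and minute
res <- aggregate(sms ~ UserID + Day_of_week + hour + min, data = df, FUN = mean)

# Sort so that each day type's time slots appear together, as in the desired output
res <- res[order(res$UserID, res$Day_of_week, res$hour, res$min), ]
rownames(res) <- NULL
print(res)
```

With this sample, the x rows average the two weekday observations for each time slot and the y rows average the two weekend observations. If the real data encodes the days differently, only the 1:5 test needs to change.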