Conditional average over the rows of my data frame

Asked: 2018-01-28 10:46:46

Tags: r dataframe average

I have the following data frame:

    I  UserID | Day_of_week | hour | min | sms
   #1    1           1          0     0      12
   #2    1           1          0     30     20
   #3    1           1          1     0      19
   #4    1           1          1     30     11
   #5    1           1          2     0      12
   #6    1           1          2     30     7
   ...   ...         ...       ...    ...   ....
   #10    1          2          0     0      142
   #11    1          2          0     30     201
   #12    1          2          1     0      129
   #13    1          2          1     30     111
   ...   ...         ...       ...    ...   ....

Day_of_week ranges from 0 to 6, where

  • 1 = Monday, 2 = Tuesday, ..., 5 = Friday
  • 0 and 6 are Saturday and Sunday, respectively.

The hour column ranges from 0 to 23.

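I want to create a data frame with the same layout in which the days are collapsed into two groups, weekdays (Monday to Friday) and weekend (Saturday and Sunday), with the average sms value for each hour and minute within each group.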

I want something like this: the average over all weekdays (Monday, Tuesday, ..., Friday) at hour 0, minute 0 is 12.1.

x represents weekdays

y represents the weekend

    I  UserID | Day_of_week | hour | min | sms
   #1    1           x          0     0      12.1
   #2    1           x          0     30     19.1
   #3    1           x          1     0      14
   #4    1           x          1     30     11
   ...   ...         ...       ...    ...   ....
   #10    1          y          0     0      1123
   #11    1          y          0     30     23
   #12    1          y          1     0      45
   #13    1          y          1     30     121
   ...   ...         ...       ...    ...   ....

1 answer:

Answer 0: (score: 1)

Sample data:

library(dplyr)  # needed for %>% and mutate() below

# Read the sample rows; "velocity" stands in for the question's sms column
df = read.table(text='I  UserID  Day_of_week  hour  min  velocity
1    1           1          0     0      12
2    1           1          0     30     20
3    1           1          1     0      19
4    1           1          1     30     11
5    1           1          2     0      12
6    1           1          2     30     7
10    1          2          0     0      142
11    1          2          0     30     201
12    1          2          1     0      129
13    1          2          1     30     111',header=TRUE)
# Expand the dataset with weekend rows: copy every row with Day_of_week set to 6
df2 = df %>% mutate(Day_of_week=6)
df = rbind(df,df2)

You can use dplyr and do the following:

library(dplyr)
df %>% mutate(daytype = ifelse(Day_of_week %in% seq(1,5),'weekday','weekend')) %>%
  group_by(UserID,daytype,hour,min) %>% 
  summarize(velocity=mean(velocity))
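If you want the result to look like the table in the question (labels x and y in Day_of_week, and a column called sms), the same pipeline can be adapted roughly as below. This is only a sketch: the data frame sms_df and its values are made up for illustration, and it assumes dplyr 1.0 or later for the .groups argument.

```r
library(dplyr)

# Hypothetical sample using the question's encoding:
# 1-5 = Monday-Friday, 0 and 6 = weekend
sms_df <- data.frame(
  UserID      = 1,
  Day_of_week = c(1, 2, 5, 0, 6),
  hour        = 0,
  min         = 0,
  sms         = c(12, 142, 20, 100, 200)
)

result <- sms_df %>%
  # Relabel the day codes: "x" for weekdays, "y" for weekend days
  mutate(Day_of_week = if_else(Day_of_week %in% 1:5, "x", "y")) %>%
  group_by(UserID, Day_of_week, hour, min) %>%
  # Average sms within each group and drop the grouping afterwards
  summarise(sms = mean(sms), .groups = "drop")

print(result)
```

Because the test is "in 1:5 or not", both 0 and 6 end up in the y group without any special handling.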

Alternatively, with data.table:

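library(data.table)
# setDT() converts df to a data.table by reference; := adds the daytype column in place
setDT(df)[, daytype := ifelse(Day_of_week %in% seq(1,5), 'weekday', 'weekend')
  ][, .(velocity = mean(velocity)), by = .(UserID, daytype, hour, min)]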

Or with base R:

df$daytype = ifelse(df$Day_of_week %in% seq(1,5),'weekday','weekend')
aggregate(velocity~UserID+daytype+hour+min,df,FUN='mean')

Output:

    UserID daytype hour min velocity
 1:      1 weekday    0   0     77.0
 2:      1 weekday    0  30    110.5
 3:      1 weekday    1   0     74.0
 4:      1 weekday    1  30     61.0
 5:      1 weekday    2   0     12.0
 6:      1 weekday    2  30      7.0
 7:      1 weekend    0   0     77.0
 8:      1 weekend    0  30    110.5
 9:      1 weekend    1   0     74.0
10:      1 weekend    1  30     61.0
11:      1 weekend    2   0     12.0
12:      1 weekend    2  30      7.0
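
As a quick sanity check, the first two weekday rows can be reproduced by hand from the sample data: each one is the mean of the Day_of_week 1 and Day_of_week 2 values for that hour and minute. The variable names below are just for illustration.

```r
# velocity at hour 0, min 0 for Day_of_week 1 and 2 in the sample data
weekday_h0_m0 <- mean(c(12, 142))
# velocity at hour 0, min 30 for Day_of_week 1 and 2 in the sample data
weekday_h0_m30 <- mean(c(20, 201))

print(weekday_h0_m0)   # 77
print(weekday_h0_m30)  # 110.5
```

The weekend rows show the same numbers only because the weekend rows in the sample are copies of the weekday rows; with real data the two groups would differ.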