Filter a data frame by day of the week

Date: 2015-01-14 17:40:04

Tags: r dataframe lapply sapply

I have a data frame containing daily statistics for a website:

> head(df,7)
        date users sessions goalCompletionsAll       dow        gos        gou
1 2014-08-01  3514     5239                 90    Friday 0.01717885 0.02561184
2 2014-08-02  3382     4874                 99  Saturday 0.02031186 0.02927262
3 2014-08-03  3981     5499                 81    Sunday 0.01472995 0.02034665
4 2014-08-04  4493     6434                 99    Monday 0.01538701 0.02203428
5 2014-08-05  4344     6505                111   Tuesday 0.01706380 0.02555249
6 2014-08-06  4091     6117                115 Wednesday 0.01880007 0.02811049
7 2014-08-07  3617     5519                 90  Thursday 0.01630730 0.02488250

I need to find the daily averages for each day of the week. This is what I tried:

> daysOfWeek
[1] "Monday"    "Tuesday"   "Wednesday" "Thursday"  "Friday"    "Saturday"  "Sunday"
dailyAverages <- sapply(daysOfWeek, function (x) {
  qq <- filter(df, dow==x)
  convRate <- qq$goalCompletionsAll/qq$users
  run <- data.frame(mean(convRate),sd(convRate), 
  max(convRate), min(convRate), median(convRate))
  names(run) <- c("Mean", "SD", "Max", "Min", "Median")
  run
})

> dailyAverages
       Monday      Tuesday     Wednesday   Thursday    Friday     Saturday   
Mean   0.02496614  0.0262649   0.02576256  0.02602963  0.026684   0.02440045 
SD     0.003603139 0.004615455 0.003891674 0.004525479 0.00445875 0.004779429
Max    0.03266055  0.03274712  0.03141136  0.03543914  0.03673769 0.033213   
Min    0.01853659  0.01748487  0.01904376  0.02026432  0.01734417 0.01593625 
Median 0.02488883  0.02651838  0.02629004  0.02543797  0.02599134 0.02502503 
       Sunday     
Mean   0.02426048 
SD     0.004086276
Max    0.03112314 
Min    0.01581155 
Median 0.02456262 

This result is almost what I want, but it needs to be transposed:

> dx <- t(dailyAverages)
> dx
          Mean       SD          Max        Min        Median    
Monday    0.02496614 0.003603139 0.03266055 0.01853659 0.02488883
Tuesday   0.0262649  0.004615455 0.03274712 0.01748487 0.02651838
Wednesday 0.02576256 0.003891674 0.03141136 0.01904376 0.02629004
Thursday  0.02602963 0.004525479 0.03543914 0.02026432 0.02543797
Friday    0.026684   0.00445875  0.03673769 0.01734417 0.02599134
Saturday  0.02440045 0.004779429 0.033213   0.01593625 0.02502503
Sunday    0.02426048 0.004086276 0.03112314 0.01581155 0.02456262

Is there a more efficient, less ugly way to do the same thing?

1 answer:

Answer 0 (score: 4)

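You could try dplyr. The chain/pipe operator (%>%) connects the "lhs" and the "rhs". The variable "dow" is used as the grouping variable (group_by(..)), "convRate" is computed with transmute, which drops the existing variables, and the mean, sd, etc. of "convRate" are obtained with summarise_each. The advantage of summarise_each is that it can be applied to multiple columns at once.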

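library(dplyr)
# Abbreviate the weekday names to three letters ("Mon", "Tue", ...)
df$dow <- substr(df$dow, 1,3)
res <- df %>%
          # one group per weekday
          group_by(dow) %>% 
          # conversion rate for each row
          transmute(convRate=goalCompletionsAll/users) %>% 
          # summary statistics of convRate within each group
          summarise_each(funs(mean, sd, max, min, median), convRate)
# The groups come back sorted by name, so reorder the rows Monday through Sunday
indx <- match(c('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'), res$dow)
res1 <- res[indx,]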
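
Note that summarise_each() and funs() have since been deprecated in dplyr. As a rough sketch of the same idea with a plain summarise() call, something like the following should work in current versions. The data frame here is a made-up two-week sample standing in for the question's df (the numbers are illustrative, not the real data), and turning "dow" into a factor with explicit levels is one possible way to keep the rows in calendar order without the match() step.

```r
library(dplyr)

daysOfWeek <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")

# Toy data: invented numbers for two weeks, not the data from the question
df <- data.frame(
  dow = rep(daysOfWeek, times = 2),
  users = c(4000, 4100, 4200, 4300, 3500, 3400, 3900,
            4400, 4500, 4000, 3600, 3700, 3300, 4000),
  goalCompletionsAll = c(100, 110, 105, 90, 95, 99, 81,
                         99, 111, 115, 90, 85, 70, 100)
)

res <- df %>%
  # A factor with explicit levels makes the groups sort Monday through Sunday
  mutate(dow = factor(dow, levels = daysOfWeek),
         convRate = goalCompletionsAll / users) %>%
  group_by(dow) %>%
  # One row per weekday, one column per statistic
  summarise(Mean = mean(convRate),
            SD = sd(convRate),
            Max = max(convRate),
            Min = min(convRate),
            Median = median(convRate))

print(res)
```

The result has one row per weekday and one column per statistic, so no transpose is needed.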