I have a data frame containing daily statistics for a website:
> head(df,7)
date users sessions goalCompletionsAll dow gos gou
1 2014-08-01 3514 5239 90 Friday 0.01717885 0.02561184
2 2014-08-02 3382 4874 99 Saturday 0.02031186 0.02927262
3 2014-08-03 3981 5499 81 Sunday 0.01472995 0.02034665
4 2014-08-04 4493 6434 99 Monday 0.01538701 0.02203428
5 2014-08-05 4344 6505 111 Tuesday 0.01706380 0.02555249
6 2014-08-06 4091 6117 115 Wednesday 0.01880007 0.02811049
7 2014-08-07 3617 5519 90 Thursday 0.01630730 0.02488250
I need to find the daily averages for each day of the week. Here is how I tried to do it:
> daysOfWeek
[1] "Monday" "Tuesday" "Wednesday" "Thursday" "Friday" "Saturday" "Sunday"
dailyAverages <- sapply(daysOfWeek, function (x) {
    qq <- filter(df, dow==x)
    convRate <- qq$goalCompletionsAll/qq$users
    run <- data.frame(mean(convRate), sd(convRate),
                      max(convRate), min(convRate), median(convRate))
    names(run) <- c("Mean", "SD", "Max", "Min", "Median")
    run
})
> dailyAverages
Monday Tuesday Wednesday Thursday Friday Saturday
Mean 0.02496614 0.0262649 0.02576256 0.02602963 0.026684 0.02440045
SD 0.003603139 0.004615455 0.003891674 0.004525479 0.00445875 0.004779429
Max 0.03266055 0.03274712 0.03141136 0.03543914 0.03673769 0.033213
Min 0.01853659 0.01748487 0.01904376 0.02026432 0.01734417 0.01593625
Median 0.02488883 0.02651838 0.02629004 0.02543797 0.02599134 0.02502503
Sunday
Mean 0.02426048
SD 0.004086276
Max 0.03112314
Min 0.01581155
Median 0.02456262
This result is almost what I want, but it needs to be transposed:
> dx <- t(dailyAverages)
> dx
Mean SD Max Min Median
Monday 0.02496614 0.003603139 0.03266055 0.01853659 0.02488883
Tuesday 0.0262649 0.004615455 0.03274712 0.01748487 0.02651838
Wednesday 0.02576256 0.003891674 0.03141136 0.01904376 0.02629004
Thursday 0.02602963 0.004525479 0.03543914 0.02026432 0.02543797
Friday 0.026684 0.00445875 0.03673769 0.01734417 0.02599134
Saturday 0.02440045 0.004779429 0.033213 0.01593625 0.02502503
Sunday 0.02426048 0.004086276 0.03112314 0.01581155 0.02456262
Is there a more efficient, less ugly way to do the same thing?
Answer 0 (score: 4)
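You could try dplyr. The chain/pipe operator (%>%) connects the "lhs" and the "rhs". The variable "dow" is used as the grouping variable (group_by(..)), "convRate" is computed with transmute, which drops the other existing variables, and the mean, sd, etc. of "convRate" are then obtained with summarise_each. The advantage of summarise_each is that it can be applied to several columns at once.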
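library(dplyr)
# Shorten the weekday names to three letters ("Monday" -> "Mon")
df$dow <- substr(df$dow, 1, 3)
res <- df %>%
    group_by(dow) %>%
    # Keep only the grouping variable and the conversion rate
    transmute(convRate = goalCompletionsAll/users) %>%
    # Apply each summary function to convRate within each weekday
    summarise_each(funs(mean, sd, max, min, median), convRate)
# Put the rows in Monday-to-Sunday order
indx <- match(c('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'), res$dow)
res1 <- res[indx,]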
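In recent dplyr releases, summarise_each() and funs() are deprecated in favour of across(). The same idea could be sketched as follows; this is an illustration rather than the code from the answer. It assumes dplyr 1.0.0 or later and rebuilds only the seven rows shown in the question, so each weekday has a single observation and SD comes out as NA. Turning "dow" into a factor with the desired level order is a swapped-in alternative to the match() step.

```r
library(dplyr)

# The seven rows printed in the question (one observation per weekday)
df <- data.frame(
  date = as.Date("2014-08-01") + 0:6,
  users = c(3514, 3382, 3981, 4493, 4344, 4091, 3617),
  goalCompletionsAll = c(90, 99, 81, 99, 111, 115, 90),
  dow = c("Friday", "Saturday", "Sunday", "Monday",
          "Tuesday", "Wednesday", "Thursday"),
  stringsAsFactors = FALSE
)

daysOfWeek <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")

res <- df %>%
  # A factor with explicit levels makes the grouped output follow weekday order
  mutate(dow = factor(dow, levels = daysOfWeek),
         convRate = goalCompletionsAll / users) %>%
  group_by(dow) %>%
  # Apply every summary function to convRate; name each column after its function
  summarise(across(convRate,
                   list(Mean = mean, SD = sd, Max = max,
                        Min = min, Median = median),
                   .names = "{.fn}"))

res
```

With the full data set, the SD column would hold real values, and further columns could be summarised in the same call by listing them in across().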