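I have a list of each kid's personal best times per stroke for each week. For kid A, how do I calculate the kid's average score over the previous weeks, excluding the current week's results? For example, one kid's results might look like this: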
df.trial <- data.frame(Week= c("w10", "w9", "w9", "w5", "w5", "w5", "w6", "w6", "w3"), Stroke= c("Fly","Free","Breast","Back","Free","Breast","Fly","Back","Free"), Score = c(5.5,4.5,4.6,5.2, 4.3, 5.7, 4.7,5.5,4.8))
I am trying to add a new column that holds this average, like so:
df.desired <- data.frame(Week= c("w10", "w9", "w9", "w5", "w5", "w5", "w6", "w6", "w3"), Stroke= c("Fly","Free","Breast","Back","Free","Breast","Fly","Back","Free"), Score = c(5.5, 4.5, 4.6, 5.2, 4.3, 5.7, 4.7, 5.5, 4.8), prev.mean = c(4.91, 5.03, 5.03, 4.80, 4.80, 4.80, 5.00, 5.00, NA))
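My data has many kids, so I need to group by each student's name. Also, if there are no previous weeks, I would like to get an NA.
I have tried a few different approaches, including writing a function similar to rollsum, as described here: R previous week total
No luck so far. Any suggestions?
Answer 0 (score: 0)
Edit: new answer
I misunderstood your question; I think this is what you want. Note that I added a kid_id column to your data.frame to show you a solution that works for multiple IDs.
Also note that I had to convert Week to numeric so that the weeks can be sorted. Otherwise the order would be wrong, because as text "w10" sorts before "w3".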
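To illustrate why the conversion matters, here is a minimal sketch using the week labels from the question. The variable names are only for illustration, and `method = "radix"` is used so that the text sort does not depend on the locale.

```r
weeks <- c("w10", "w9", "w5", "w6", "w3")

# Sorted as text, the labels are compared character by character,
# so "w10" ends up before "w3"
sorted_as_text <- sort(weeks, method = "radix")

# Stripping the "w" prefix and converting to numeric gives the real week order
week_num <- as.numeric(sub("w", "", weeks))
sorted_as_number <- paste0("w", sort(week_num))

print(sorted_as_text)
print(sorted_as_number)
```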
library(tidyverse)

df.trial <- data.frame(kid_id = 1, Week= c("w10", "w9", "w9", "w5", "w5", "w5", "w6", "w6", "w3"), Stroke= c("Fly","Free","Breast","Back","Free","Breast","Fly","Back","Free"), Score = c(5.5,4.5,4.6,5.2, 4.3, 5.7, 4.7,5.5,4.8))

df.trial %>%
  as_tibble() %>%
  # strip the "w" prefix so that weeks sort numerically
  mutate(Week = as.numeric(gsub("w", "", as.character(Week)))) %>%
  group_by(kid_id, Week) %>%
  # per kid and week: total score and number of results
  summarise(sum_score = sum(Score),
            n_score = n()) %>%
  arrange(kid_id, Week) %>%
  # still grouped by kid_id here: average over all earlier weeks
  # (seq_len(0) is empty, so the first week yields 0 / 0 = NaN)
  mutate(prev.mean = map_dbl(seq_along(sum_score),
                             ~ sum(sum_score[seq_len(.x - 1)]) / sum(n_score[seq_len(.x - 1)])),
         Week = paste0("w", Week)) %>%
  ungroup() %>%
  select(kid_id, Week, prev.mean) %>%
  # join on both keys so that each kid gets its own averages
  left_join(df.trial, ., by = c("kid_id", "Week"))

  kid_id Week Stroke Score prev.mean
1      1  w10    Fly   5.5  4.912500
2      1   w9   Free   4.5  5.033333
3      1   w9 Breast   4.6  5.033333
4      1   w5   Back   5.2  4.800000
5      1   w5   Free   4.3  4.800000
6      1   w5 Breast   5.7  4.800000
7      1   w6    Fly   4.7  5.000000
8      1   w6   Back   5.5  5.000000
9      1   w3   Free   4.8       NaN
Here you get NaN instead of NA for the first week, but you can easily replace it if needed.
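One way to do that replacement is sketched below in base R. The data frame `result` is a hypothetical stand-in for the joined output, reduced to two rows; only `is.nan()` and ordinary subset assignment are used.

```r
# Hypothetical stand-in for the joined output (two rows only)
result <- data.frame(Week = c("w5", "w3"),
                     prev.mean = c(4.8, NaN))

# is.nan() is TRUE only for NaN, so existing values are left untouched
result$prev.mean[is.nan(result$prev.mean)] <- NA

print(result)
```

Inside a dplyr pipeline the same idea could be written as a final `mutate()` step, for example with `if_else(is.nan(prev.mean), NA_real_, prev.mean)`.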