I have a large data frame like this:
df <- data.frame(id = c('1', '2', '3', '4', '5', '6'), Date = c("01-Feb-17", "05-Feb-17", "03-May-17","24-May-17","20-Oct-17", "25-Oct-17"), Name=c("John", "Jack", "Jack", "John", "John", "Jack"), Workout=c('150', '130', '140', '160', '150', '130'))
How can I create a new column (Average_Workout) that contains the average of "Workout" for each period since the beginning of the year?
For example,
Answer 0 (score: 3)
We can use cummean after grouping by 'Name':
library(dplyr)
res <- df %>%
  # uncomment the next line if the rows are not already ordered by 'Date'
  # arrange(Name, as.Date(Date, "%d-%b-%y")) %>%
  group_by(Name) %>%
  mutate(Avg = cummean(Workout))  # running mean of Workout within each Name
as.data.frame(res)
#  id      Date Name Workout      Avg
#1  1 01-Feb-17 John     150 150.0000
#2  2 05-Feb-17 Jack     130 130.0000
#3  3 03-May-17 Jack     140 135.0000
#4  4 24-May-17 John     160 155.0000
#5  5 20-Oct-17 John     150 153.3333
#6  6 25-Oct-17 Jack     130 133.3333
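For comparison, the same running mean can probably be reproduced without dplyr. The following is only a sketch in base R, assuming the rows are already in date order and that Workout is numeric; it divides the cumulative sum within each Name by the running count of rows.

```r
# Sample data from the question, with Workout stored as numbers
df <- data.frame(id = c('1', '2', '3', '4', '5', '6'),
                 Date = c("01-Feb-17", "05-Feb-17", "03-May-17",
                          "24-May-17", "20-Oct-17", "25-Oct-17"),
                 Name = c("John", "Jack", "Jack", "John", "John", "Jack"),
                 Workout = c(150, 130, 140, 160, 150, 130),
                 stringsAsFactors = FALSE)

# ave() applies FUN to Workout separately for each Name and returns
# the results in the original row order
df$Avg <- ave(df$Workout, df$Name,
              FUN = function(x) cumsum(x) / seq_along(x))
df
```

This should give the same Avg column as the cummean version, since a cumulative mean is just the cumulative sum divided by the number of values seen so far.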
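Note: when numeric elements are quoted, the column will be of class character or factor, depending on whether stringsAsFactors = FALSE or TRUE, and cummean needs numeric input. The data here is therefore defined with unquoted Workout values: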
df <- data.frame(id = c('1', '2', '3', '4', '5', '6'),
Date = c("01-Feb-17", "05-Feb-17", "03-May-17","24-May-17","20-Oct-17", "25-Oct-17"),
Name=c("John", "Jack", "Jack", "John", "John", "Jack"),
Workout=c(150, 130, 140, 160, 150, 130), stringsAsFactors = FALSE)
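If the data frame already exists with a quoted Workout column and cannot easily be rebuilt, the column can be converted in place. A minimal sketch, using a hypothetical data frame df_chr for illustration; going through as.character first is meant to handle both the character and the factor case, since calling as.numeric directly on a factor returns the internal level codes rather than the values.

```r
# Hypothetical example: Workout was quoted and stored as a factor
df_chr <- data.frame(Workout = c('150', '130', '140'),
                     stringsAsFactors = TRUE)
class(df_chr$Workout)  # "factor"

# Convert to character first, then to numeric, so factor level codes
# are not mistaken for the actual values
df_chr$Workout <- as.numeric(as.character(df_chr$Workout))
class(df_chr$Workout)  # "numeric"
```

Since R 4.0.0 the default is stringsAsFactors = FALSE, so on recent versions a quoted column will normally be character rather than factor; the conversion above works in either case.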