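Suppose I have the following data frame. How can I create a new "avg" column that holds, for each group, the average of the last 2 dates ("date")? The idea is to apply this to a dataset with hundreds of thousands of rows, so performance matters. The function should accept a variable number of months (for example 2 or 3) and be able to switch between a simple mean and a median.
Thanks.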
table1<-data.frame(group=c(1,1,1,1,2,2,2,2),date=c(201903,201902,201901,201812,201903,201902,201901,201812),price=c(10,30,50,20,2,10,9,20))
group date price
1 1 201903 10
2 1 201902 30
3 1 201901 50
4 1 201812 20
5 2 201903 2
6 2 201902 10
7 2 201901 9
8 2 201812 20
result<-data.frame(group=c(1,1,1,1,2,2,2,2),date=c(201903,201902,201901,201812,201903,201902,201901,201812),price=c(10,30,50,20,2,10,9,20), avg = c(20, 40, 35, NA, 6, 9.5, 14.5, NA))
group date price avg
1 1 201903 10 20.0
2 1 201902 30 40.0
3 1 201901 50 35.0
4 1 201812 20 NA
5 2 201903 2 6.0
6 2 201902 10 9.5
7 2 201901 9 14.5
8 2 201812 20 NA
Answer 0 (score: 1)
If your date column is already sorted, here is one approach using data.table:
library(data.table)
setDT(table1)[, next_price := dplyr::lead(price), by = group][, total_price := price + next_price][, avg := total_price / 2][, c("total_price", "next_price") := NULL]
table1
group date price avg
1: 1 201903 10 20.0
2: 1 201902 30 40.0
3: 1 201901 50 35.0
4: 1 201812 20 NA
5: 2 201903 2 6.0
6: 2 201902 10 9.5
7: 2 201901 9 14.5
8: 2 201812 20 NA
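The chain above is fixed at a two-month window. As a minimal sketch of how it might be generalised, the following assumes a data.table version that provides frollmean() and frollapply(), and that rows are sorted newest-first within each group, as in the question's data. The names n_months, avg and med are illustrative and not part of the original answer.

```r
library(data.table)

table1 <- data.table(
  group = c(1, 1, 1, 1, 2, 2, 2, 2),
  date  = c(201903, 201902, 201901, 201812, 201903, 201902, 201901, 201812),
  price = c(10, 30, 50, 20, 2, 10, 9, 20)
)

# Sort newest-first within each group so each window starts at the current row
setorder(table1, group, -date)

n_months <- 2  # hypothetical parameter: window length in months

# Rolling mean: align = "left" makes the window cover rows i .. i + n_months - 1
table1[, avg := frollmean(price, n_months, align = "left"), by = group]

# Rolling median over 3 months, using the generic (slower) frollapply()
table1[, med := frollapply(price, 3, median, align = "left"), by = group]

table1
```

If the dates are sorted in increasing order instead, the default align = "right" gives the trailing window. frollmean() is implemented in compiled code, so it is likely the better choice for large tables; frollapply() calls an R function per window and will be slower.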
Answer 1 (score: 1)
First sort the data.frame so that dates are increasing within each group:
table1 <- table1[order(table1$group, table1$date), ]
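Create a moving-average function that takes the number of months as a parameter. Other implementation options are described in: Calculating moving average
# stats::filter is called explicitly because dplyr::filter masks it when dplyr is attached
mov_avg <- function(y, months = 2){as.numeric(stats::filter(y, rep(1 / months, months), sides = 1))}
Apply this mov_avg function with the classic do.call-lapply-split combination: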
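To see what the filter call does, here is a small sketch run on a single group's prices in increasing date order. It assumes the stats version of filter (written out explicitly here); with sides = 1 each output value is the weighted sum of the current and the preceding months - 1 values, and equal weights of 1 / months turn that into a trailing mean.

```r
# Trailing moving average via a one-sided convolution filter
mov_avg <- function(y, months = 2) {
  as.numeric(stats::filter(y, rep(1 / months, months), sides = 1))
}

# Group 1 prices from the question, oldest first (201812 .. 201903)
prices <- c(20, 50, 30, 10)

avg2 <- mov_avg(prices, months = 2)  # first value has no full window
avg3 <- mov_avg(prices, months = 3)  # first two values have no full window

print(avg2)
print(avg3)
```

The leading positions are NA because the window is not yet complete there.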
table1$avg_2months <- do.call(c, lapply(split(x=table1$price, f=table1$group), mov_avg, months=2))
table1$avg_3months <- do.call(c, lapply(split(x=table1$price, f=table1$group), mov_avg, months=3))
table1
group date price avg_2months avg_3months
4 1 201812 20 NA NA
3 1 201901 50 35.0 NA
2 1 201902 30 40.0 33.33333
1 1 201903 10 20.0 30.00000
8 2 201812 20 NA NA
7 2 201901 9 14.5 NA
6 2 201902 10 9.5 13.00000
5 2 201903 2 6.0 7.00000
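The question also asks for a way to switch between a mean and a median, which a linear filter cannot provide. One possible sketch in base R is shown below; roll_stat and its stat argument are hypothetical names introduced for illustration, and the approach (building windows with embed() and assigning per group with ave()) is a substitute technique, not the answer's method.

```r
# Trailing rolling statistic over the last `months` values of x.
# `stat` is a hypothetical switch between "mean" and "median".
roll_stat <- function(x, months = 2, stat = c("mean", "median")) {
  stat <- match.arg(stat)
  n <- length(x)
  # Not enough observations for even one full window
  if (n < months) return(rep(NA_real_, n))
  # embed() returns one row per complete window of length `months`
  windows <- embed(x, months)
  vals <- if (stat == "mean") rowMeans(windows) else apply(windows, 1, median)
  # Pad the incomplete leading windows with NA
  c(rep(NA_real_, months - 1), vals)
}

table1 <- data.frame(
  group = c(1, 1, 1, 1, 2, 2, 2, 2),
  date  = c(201903, 201902, 201901, 201812, 201903, 201902, 201901, 201812),
  price = c(10, 30, 50, 20, 2, 10, 9, 20)
)

# Dates must be increasing within each group for a trailing window
table1 <- table1[order(table1$group, table1$date), ]

# ave() applies the function per group and keeps the original row positions
table1$avg_2months <- ave(table1$price, table1$group,
                          FUN = function(p) roll_stat(p, 2, "mean"))
table1$med_3months <- ave(table1$price, table1$group,
                          FUN = function(p) roll_stat(p, 3, "median"))
table1
```

Because ave() writes results back by position, this version does not depend on the groups being contiguous, unlike concatenating the split results with do.call(c, ...). The median branch uses apply() over the window rows, so it will be slower than the mean branch on large data.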