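Good afternoon. I'm trying to create an accumulated mean with dplyr: for each row I want to average only the values from rows whose date is strictly before the current row's date (several rows may share the same date).
I managed to do this the "dirty way" with a few custom functions, but it takes too long and is very inefficient. I'm pretty sure there is a better way.
I'm thinking of something along these lines: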
averages <- DB %>% group_by(field1, field2) %>% mutate(Avg = cummean(??? * value1))
How can I access the current observation inside the cummean function?
The route I took was to build, for each subset, a logical vector with
for (i in seq_len(length(datevector) - 1))  # 1:length(datevector)-1 would start at index 0
  logicalvector[i] <- datevector[length(datevector)] > datevector[i]
logicalvector[length(datevector)] <- FALSE
and to use it in another function to compute the mean.
A simple example is:
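As a side note, the loop above can probably be replaced by a single vectorised comparison. This is only a sketch; the sample dates below are made up for illustration and assumed to be sorted ascending, as the loop implies.

```r
# Hypothetical example dates, sorted ascending as the loop above assumes.
datevector <- as.Date(c("2013-08-02", "2013-08-02", "2013-08-03",
                        "2013-08-03", "2013-08-04"))

# TRUE for every date strictly earlier than the last one. The last element
# is not earlier than itself, so it comes out FALSE without a special case.
logicalvector <- datevector < datevector[length(datevector)]
```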
df <- data.frame(id=1:5,Date=as.Date(c("2013-08-02","2013-08-02","2013-08-03","2013-08-03","2013-08-04")),Value=c(1,4,5,2,4))
id  Date        Value  accum mean
1   02/08/2013  1      0
2   02/08/2013  4      0
3   03/08/2013  5      2.5
4   03/08/2013  2      2.5
5   04/08/2013  4      3
Explanation:
The first two observations have no observations with an earlier date, so their mean is 0.
The 3rd and 4th observations each average the 1st and 2nd: (1 + 4) / 2 = 2.5.
The 5th observation averages all four earlier ones: (1 + 4 + 5 + 2) / 4 = 3.
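To make the expected column concrete, here is a brute-force base R check. It is a reference sketch rather than an efficient solution, and the helper name prior_mean and the column name accum_mean are invented for this illustration.

```r
df <- data.frame(
  id = 1:5,
  Date = as.Date(c("2013-08-02", "2013-08-02", "2013-08-03",
                   "2013-08-03", "2013-08-04")),
  Value = c(1, 4, 5, 2, 4)
)

# Hypothetical helper: for each row, average the values of all rows with a
# strictly earlier date; rows with no earlier date get 0.
prior_mean <- function(dates, values) {
  vapply(seq_along(dates), function(i) {
    earlier <- dates < dates[i]
    if (any(earlier)) mean(values[earlier]) else 0
  }, numeric(1))
}

df$accum_mean <- prior_mean(df$Date, df$Value)
```

This scans the whole vector once per row, so it is mainly useful for checking faster approaches against small data.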
Answer 0 (score: 2)
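This can be implemented as a complex self-join in SQL. Each row is joined to all rows whose Date is less than its own, and the Value of those rows is averaged. Where the average is NULL, coalesce is used to assign 0.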
library(sqldf)
sqldf("select a.*, coalesce(avg(b.Value), 0) as mean
from df as a
left join df as b on b.Date < a.Date
group by a.rowid")
giving:
id Date Value mean
1 1 2013-08-02 1 0.0
2 2 2013-08-02 4 0.0
3 3 2013-08-03 5 2.5
4 4 2013-08-03 2 2.5
5 5 2013-08-04 4 3.0
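Since the question asks about dplyr, the same result can probably also be reached without comparing rows pairwise: aggregate per date, take cumulative sums, and shift them by one date so that each date only sees strictly earlier rows. The sketch below assumes the df from the question; the column names s, cnt, prior_sum, prior_n and accum_mean are invented for this illustration.

```r
library(dplyr)

df <- data.frame(
  id = 1:5,
  Date = as.Date(c("2013-08-02", "2013-08-02", "2013-08-03",
                   "2013-08-03", "2013-08-04")),
  Value = c(1, 4, 5, 2, 4)
)

per_date <- df %>%
  group_by(Date) %>%
  # One row per date: total and number of values on that date.
  summarise(s = sum(Value), cnt = n(), .groups = "drop") %>%
  arrange(Date) %>%
  mutate(
    # Running totals over strictly earlier dates (lag shifts by one date).
    prior_sum = lag(cumsum(s), default = 0),
    prior_n = lag(cumsum(cnt), default = 0L),
    # No earlier rows means a mean of 0, matching the coalesce above.
    accum_mean = if_else(prior_n > 0, prior_sum / prior_n, 0)
  )

# Attach the per-date mean back to the original rows.
result <- df %>%
  left_join(select(per_date, Date, accum_mean), by = "Date")
```

For the grouped case in the question, one option would be to group by field1, field2 and Date in the first step, so that the cumulative sums restart within each field1/field2 group.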
Answer 1 (score: 1)
Using data.table and lubridate, one option is the following:
library(data.table)
library(lubridate)
dt <- data.table(id=c(1:5))
dt$Date <- c("02/08/2013", "02/08/2013", "03/08/2013", "03/08/2013", "04/08/2013")
dt$Value <- c(1,4,5,2,4)
dt$Date <- dmy(dt$Date)
cummean <- function(d){
if(nrow(dt[Date<d])>0)
dt[Date<d, sum(Value)/.N]
else 0
}
dt[, accuMean:=mapply(cummean,Date)]
# id Date Value accuMean
#1: 1 2013-08-02 1 0.0
#2: 2 2013-08-02 4 0.0
#3: 3 2013-08-03 5 2.5
#4: 4 2013-08-03 2 2.5
#5: 5 2013-08-04 4 3.0
Solution for multiple value columns:
library(data.table)
library(lubridate)
dt <- data.table(id=c(1:5))
dt$Date <- c("02/08/2013", "02/08/2013", "03/08/2013", "03/08/2013", "04/08/2013")
dt$Value_1 <- c(1,4,5,2,4)
dt$Value_2 <- c(3,2,0,1,2)
dt$Value_3 <- c(4,9,3,3,3)
dt$Date <- dmy(dt$Date)
cummean <- function(d,Value){
if(nrow(dt[Date<d])>0)
sum(dt[Date<d, Value, with=F])/dt[Date<d, .N]
else 0
}
n <- 3
accuMean <- paste0("accuMean_", (1:n))
for(i in 1:n){
print(i)
dt[, (accuMean[i]):=mapply(cummean,Date,MoreArgs = list(paste0("Value_",i)))]
}
This assumes you have n value columns named Value_i. In your case there are ten, so just set n = 10.
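Calling cummean once per row and per column rescans the table each time, which may become slow on larger data. A possible vectorised sketch for the multi-column case follows. It assumes columns named Value_1, Value_2, ... as above; by_date, value_cols, mean_cols, cnt and prior_n are names invented for this illustration.

```r
library(data.table)

dt <- data.table(
  id = 1:5,
  Date = as.Date(c("2013-08-02", "2013-08-02", "2013-08-03",
                   "2013-08-03", "2013-08-04")),
  Value_1 = c(1, 4, 5, 2, 4),
  Value_2 = c(3, 2, 0, 1, 2),
  Value_3 = c(4, 9, 3, 3, 3)
)

value_cols <- grep("^Value_", names(dt), value = TRUE)
mean_cols <- sub("^Value_", "accuMean_", value_cols)

# One row per date, sorted by Date: per-date sum of every value column
# plus the number of rows on that date.
by_date <- dt[, c(lapply(.SD, sum), list(cnt = .N)),
              keyby = Date, .SDcols = value_cols]

# Number of rows on strictly earlier dates.
prior_n <- shift(cumsum(by_date$cnt), fill = 0L)

for (j in seq_along(value_cols)) {
  # Sum of this column over strictly earlier dates.
  prior_sum <- shift(cumsum(by_date[[value_cols[j]]]), fill = 0)
  # Mean over earlier dates, or 0 when there are none.
  set(by_date, j = mean_cols[j],
      value = ifelse(prior_n > 0, prior_sum / prior_n, 0))
}

# Attach the per-date means back to the original rows.
result <- merge(dt, by_date[, c("Date", mean_cols), with = FALSE],
                by = "Date", sort = FALSE)
```

With ten value columns nothing needs to change, because value_cols is derived from the column names.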