Calculate an average in R based on dates in a column

Time: 2018-04-20 20:38:55

Tags: r

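I am trying to calculate an average based on the dates in a column. The number of previous days should be selectable, e.g. 4 days. I want the average of the 4 records before the StartDate (not counting the StartDate row itself), with that average carried down each row until the EndDate is reached.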

I have tried

tapply(df$Boe, df$ShutinDate, function(x) mean(tail(sort(x), 5)))

but I don't get the correct average.

Desired output:

Name    DATE    Values  StartDate   EndDate    Average
TestA   3/3/2017    50          
TestA   3/4/2017    75          
TestA   3/5/2017    25          
TestA   3/6/2017    100         
TestA   3/7/2017    100         
TestA   3/8/2017    50          
TestA   3/9/2017    80          
TestA   3/10/2017   90          
TestA   3/11/2017   25             3/11/2017        
TestA   3/12/2017   0                           80
TestA   3/13/2017   0                           80
TestA   3/14/2017   0                           80
TestA   3/15/2017   0                           80
TestA   3/16/2017   50      3/16/2017   
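A minimal base-R sketch of the computation described above, assuming the window starts on 3/11/2017 and ends on 3/16/2017 (the two dates in the table) and that the rows are sorted by date. `carry_pre_start_mean` is a hypothetical helper: it averages the 4 records before the start date and fills that value into the rows strictly between the two dates, which reproduces the 80 shown for 3/12 to 3/15.

```r
# Hypothetical helper (assumption, not from the question or answers):
# average the n records before `start` and carry that value down over
# the rows strictly between `start` and `end`; all other rows get NA.
carry_pre_start_mean <- function(values, dates, start, end, n = 4) {
  i_start <- match(start, dates)                 # row index of the start date
  stopifnot(!is.na(i_start), i_start > n)        # need n earlier rows
  pre_mean <- mean(values[(i_start - n):(i_start - 1)])
  out <- rep(NA_real_, length(values))
  out[dates > start & dates < end] <- pre_mean   # fill only the gap rows
  out
}

dates  <- seq(as.Date("2017-03-03"), as.Date("2017-03-16"), by = "day")
values <- c(50, 75, 25, 100, 100, 50, 80, 90, 25, 0, 0, 0, 0, 50)

avg <- carry_pre_start_mean(values, dates,
                            start = as.Date("2017-03-11"),
                            end   = as.Date("2017-03-16"))
avg
```

Here the 4 records before 3/11 are 100, 50, 80 and 90, so the carried value is 80.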

2 answers:

Answer 0 (score: 1)

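1) We group by Name (assuming the rollapply should be done separately for each Name), then use rollapply with width = list(-seq(4)) so that each application of mean uses the offsets -1, -2, -3, -4. (Offset 0 would be the current point, but we want the 4 points before it.)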
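The offset idea can be sketched in base R without zoo; this is a stand-in that mimics what width = list(-seq(4)) with fill = NA computes, not how rollapply is implemented, and `prev_mean` is a hypothetical helper. Row i averages x[i - 1] through x[i - 4], and rows with fewer than four earlier values get NA:

```r
# Hypothetical helper: mean of the values at offsets -1, -2, ..., -k
# relative to each row; NA where some offset falls before row 1.
prev_mean <- function(x, k = 4) {
  offsets <- -seq(k)                       # -1, -2, ..., -k
  vapply(seq_along(x), function(i) {
    idx <- i + offsets
    if (min(idx) < 1) NA_real_ else mean(x[idx])
  }, numeric(1))
}

values <- c(50, 75, 25, 100, 100, 50, 80, 90, 25, 0, 0, 0, 0, 50)
res <- prev_mean(values)
res
```

On the question's Values this yields NA for the first four rows, then 62.50, 75.00, 68.75, 82.50, 80.00, 61.25, 48.75, 28.75, 6.25 and 0.00.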

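It is not clear what you are referring to regarding the start date, so that part has been left out. I have also assumed that the data is sorted (as it is in the question). You may also want to convert the dates to class "Date", but if the rows are already sorted, that is not needed to answer the question.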
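If the rows are not already sorted, the dates need converting before ordering, because character strings such as "3/10/2017" sort before "3/3/2017". A small base-R sketch (the data frame `dat` is hypothetical):

```r
# Hypothetical unsorted data with m/d/Y date strings
dat <- data.frame(
  Name   = c("TestA", "TestA", "TestA"),
  DATE   = c("3/10/2017", "3/3/2017", "3/9/2017"),
  Values = c(90, 50, 80),
  stringsAsFactors = FALSE
)

# Parse to class "Date" so that ordering is chronological, not alphabetical
dat$DATE <- as.Date(dat$DATE, format = "%m/%d/%Y")

# Sort within each Name by date before applying any rolling mean
dat <- dat[order(dat$Name, dat$DATE), ]
dat
```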

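library(zoo)

# Mean of the 4 preceding values (offsets -1..-4); NA where fewer than 4 exist
roll <- function(x) rollapply(x, list(-seq(4)), mean, fill = NA)
# ave() applies roll separately within each Name
transform(DF, Average = ave(Values, Name, FUN = roll))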
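To see why the grouping with ave matters, here is a base-R sketch with two hypothetical groups (and k = 2 to keep it short). `prev_mean` is a hypothetical helper, not part of zoo; the point is that the window restarts for each Name instead of spilling over from the previous group:

```r
# Hypothetical helper: mean of the k preceding values, NA if fewer exist
prev_mean <- function(x, k) {
  vapply(seq_along(x), function(i) {
    if (i <= k) NA_real_ else mean(x[(i - k):(i - 1)])
  }, numeric(1))
}

dat <- data.frame(Name   = c("A", "A", "A", "B", "B", "B"),
                  Values = c(1, 2, 3, 10, 20, 30))

# ave() splits Values by Name, applies FUN to each piece, and reassembles
dat$Average <- ave(dat$Values, dat$Name, FUN = function(x) prev_mean(x, k = 2))
dat
```

Without grouping, row 4 would instead get mean(2, 3) = 2.5, borrowed from group A.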

2) Or, if you prefer dplyr, use roll from above:

library(dplyr)
library(zoo)

DF %>% 
   group_by(Name) %>% 
   mutate(Average = roll(Values)) %>% 
   ungroup()

Answer 1 (score: 0)

One option is to use zoo::rollapply together with dplyr::lag:

library(dplyr)
library(lubridate)
library(zoo)

df %>% mutate(DATE = mdy(DATE)) %>%   #Convert to Date
  arrange(Name, DATE) %>%             #Order on Name and DATE
  group_by(Name) %>%                  #Roll and lag within each Name
  mutate(Avg = rollapply(Values, 4, mean, fill = NA, align = "right")) %>%
  mutate(Average = lag(Avg)) %>%      # This shows mean for previous 4 rows
  ungroup() %>%
  select(-Avg) %>%
  as.data.frame()                     #Print as a plain data frame
#     Name       DATE Values Average
# 1  TestA 2017-03-03     50      NA
# 2  TestA 2017-03-04     75      NA
# 3  TestA 2017-03-05     25      NA
# 4  TestA 2017-03-06    100      NA
# 5  TestA 2017-03-07    100   62.50
# 6  TestA 2017-03-08     50   75.00
# 7  TestA 2017-03-09     80   68.75
# 8  TestA 2017-03-10     90   82.50
# 9  TestA 2017-03-11     25   80.00
# 10 TestA 2017-03-12      0   61.25
# 11 TestA 2017-03-13      0   48.75
# 12 TestA 2017-03-14      0   28.75
# 13 TestA 2017-03-15      0    6.25
# 14 TestA 2017-03-16     50    0.00
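The rollapply + lag combination can be checked in base R: a right-aligned 4-row mean followed by a one-row shift gives the mean of the previous 4 rows. This sketch swaps in `stats::filter` for `rollapply` (written with the `stats::` prefix because dplyr masks `filter`) and a manual shift for `lag`:

```r
values <- c(50, 75, 25, 100, 100, 50, 80, 90, 25, 0, 0, 0, 0, 50)

# Trailing (right-aligned) 4-row mean: row i averages rows i-3..i,
# comparable to rollapply(Values, 4, mean, fill = NA, align = "right")
trailing <- as.numeric(stats::filter(values, rep(1 / 4, 4), sides = 1))

# Shift down one row, like lag(): row i now holds the mean of rows i-4..i-1
previous4 <- c(NA, head(trailing, -1))
previous4
```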

Data:

df <- read.table(text = 
"Name    DATE    Values  
TestA   '3/3/2017'    50          
TestA   '3/4/2017'    75          
TestA   '3/5/2017'    25          
TestA   '3/6/2017'    100         
TestA   '3/7/2017'    100         
TestA   '3/8/2017'    50          
TestA   '3/9/2017'    80          
TestA   '3/10/2017'   90          
TestA   '3/11/2017'   25  
TestA   '3/12/2017'   0   
TestA   '3/13/2017'   0   
TestA   '3/14/2017'   0   
TestA   '3/15/2017'   0   
TestA   '3/16/2017'   50",
header = TRUE, stringsAsFactors = FALSE)