Calculating a rolling average with missing time values

Time: 2018-03-01 18:36:16

Tags: r for-loop dplyr calculated-columns zoo

I have a dataset described by the following:

> dput(droplevels(head(sample,10)))
structure(list(Team = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = "Air-Force", class = "factor"), Year = c(2003L, 
2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2011L, 2012L, 2013L
), Grouped_Position_3 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = "Skill", class = "factor"), Avg_Rating = c(0.7667, 
0, 0.7444, 0.7222, 0, 0.7556, 0.76224, 0.596322222222222, 0.706584615384615, 
0.767509090909091), n = c(1L, 1L, 3L, 6L, 1L, 1L, 5L, 9L, 13L, 
11L)), .Names = c("Team", "Year", "Grouped_Position_3", "Avg_Rating", 
"n"), row.names = c(NA, 10L), class = "data.frame")

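The full dataset contains multiple schools, grouped positions, and years. I want to produce a rolling average for each unique school, year, and position group, using the current year and the previous four years. For example, for Air-Force and the Skill position in 2013, I would like the following calculation (note that 2010 is missing from the data):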

(0.767 + 0.70 + 0.59 + 0 + 0.762) / 5

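The 0 comes from the missing year. I have looked at the zoo library and dplyr, but I have not been able to handle missing values like this. Do I need to write a loop, or is there a package in R that provides this functionality?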

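1 answer:

Answer 0: (score: 3)

Create a function Avg that, given a vector of row numbers ix, computes the required average over the most recent 5 years, then apply it with rollapplyr within each Team and Grouped_Position_3 group: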

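library(zoo)

# Sum the ratings from the latest year in the window and the four years before it,
# then divide by 5 so that any year absent from the data counts as 0.
Avg <- function(ix) with(sample[ix, ], sum(Avg_Rating[Year >= max(Year) - 4]) / 5)
# Roll a right-aligned window of up to 5 row numbers within each group.
transform(sample, Avg = ave(1:nrow(sample), Team, Grouped_Position_3, FUN = 
   function(ix) rollapplyr(ix, 5, Avg, partial = TRUE)))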

giving:

        Team Year Grouped_Position_3 Avg_Rating  n       Avg
1  Air-Force 2003              Skill  0.7667000  1 0.1533400
2  Air-Force 2004              Skill  0.0000000  1 0.1533400
3  Air-Force 2005              Skill  0.7444000  3 0.3022200
4  Air-Force 2006              Skill  0.7222000  6 0.4466600
5  Air-Force 2007              Skill  0.0000000  1 0.4466600
6  Air-Force 2008              Skill  0.7556000  1 0.4444400
7  Air-Force 2009              Skill  0.7622400  5 0.5968880
8  Air-Force 2011              Skill  0.5963222  9 0.4228324
9  Air-Force 2012              Skill  0.7065846 13 0.5641494
10 Air-Force 2013              Skill  0.7675091 11 0.5665312
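
To see where the 2013 value comes from, the window that ends at 2013 can be worked through by hand. This is only an illustrative sketch of the filter inside Avg, with the five rows typed in directly; the names keep, kept_years and avg_2013 are not part of the answer.

```r
# The last five rows of the group, i.e. the row window that ends at 2013.
Year <- c(2008L, 2009L, 2011L, 2012L, 2013L)
Avg_Rating <- c(0.7556, 0.76224, 0.596322222222222,
                0.706584615384615, 0.767509090909091)

# Keep only ratings from the latest year and the four years before it.
keep <- Year >= max(Year) - 4
kept_years <- Year[keep]  # 2008 falls outside 2009..2013 and is dropped

# Divide by 5 rather than by the number of rows kept,
# so the absent year 2010 contributes a rating of 0.
avg_2013 <- sum(Avg_Rating[keep]) / 5
print(avg_2013)
```

Because 2010 is missing, the five-row window reaches back to 2008; the Year filter is what removes that extra row before the sum is taken.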

Note

The input used is:

sample <- structure(list(Team = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = "Air-Force", class = "factor"), Year = c(2003L, 
2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2011L, 2012L, 2013L
), Grouped_Position_3 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = "Skill", class = "factor"), Avg_Rating = c(0.7667, 
0, 0.7444, 0.7222, 0, 0.7556, 0.76224, 0.596322222222222, 0.706584615384615, 
0.767509090909091), n = c(1L, 1L, 3L, 6L, 1L, 1L, 5L, 9L, 13L, 
11L)), .Names = c("Team", "Year", "Grouped_Position_3", "Avg_Rating", 
"n"), row.names = c(NA, 10L), class = "data.frame")
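
A possible alternative, sketched here for a single Team and position group only, is to fill in the missing years explicitly with a rating of 0 and then use a plain five-row window. The data frame names d, full and res are hypothetical and the ratings are copied from the input above; for several groups the same steps would need to be repeated per group.

```r
library(zoo)

# One group's data; 2010 is absent, as in the question.
d <- data.frame(
  Year = c(2003:2009, 2011:2013),
  Avg_Rating = c(0.7667, 0, 0.7444, 0.7222, 0, 0.7556, 0.76224,
                 0.596322222222222, 0.706584615384615, 0.767509090909091)
)

# Build an unbroken run of years and left-join the ratings onto it.
full <- merge(data.frame(Year = min(d$Year):max(d$Year)), d, all.x = TRUE)

# Treat a year with no data as a rating of 0.
full$Avg_Rating[is.na(full$Avg_Rating)] <- 0

# With no gaps left, a right-aligned 5-row window spans exactly 5 years.
full$Avg <- rollapplyr(full$Avg_Rating, 5, sum, partial = TRUE) / 5

# Keep only the years that were present in the original data.
res <- full[full$Year %in% d$Year, ]
print(res)
```

For this group the Avg column of res should agree with the Avg column shown in the answer's output.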