I have a dataset described by the following:
> dput(droplevels(head(sample,10)))
structure(list(Team = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = "Air-Force", class = "factor"), Year = c(2003L,
2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2011L, 2012L, 2013L
), Grouped_Position_3 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = "Skill", class = "factor"), Avg_Rating = c(0.7667,
0, 0.7444, 0.7222, 0, 0.7556, 0.76224, 0.596322222222222, 0.706584615384615,
0.767509090909091), n = c(1L, 1L, 3L, 6L, 1L, 1L, 5L, 9L, 13L,
11L)), .Names = c("Team", "Year", "Grouped_Position_3", "Avg_Rating",
"n"), row.names = c(NA, 10L), class = "data.frame")
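The full dataset contains multiple schools, position groups, and years. I want to work out how to produce a rolling average for each unique school, year, and position group, using the current year and the previous four years. For example, for Air-Force and the Skill position in 2013 I would like the following calculation (note that 2010 is missing from the data):
(0.767 + 0.70 + 0.59 + 0 + 0.762) / 5
The 0 comes from the missing year. I have looked at the zoo library and dplyr, but I have not been able to handle missing values like this. Do I need to write a loop, or is there an R package that has this functionality?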
Answer 0 (score: 3)
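Create a function Avg which, given a vector of row numbers ix, takes the required average over the last 5 years: it sums the ratings whose Year is within 4 of the latest year in the window and divides by 5, so any missing year counts as 0. Then apply it with rollapplyr within each group of Team and Grouped_Position_3: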
library(zoo)
Avg <- function(ix) with(sample[ix, ], sum(Avg_Rating[Year >= max(Year) - 4]) / 5)
transform(sample, Avg = ave(1:nrow(sample), Team, Grouped_Position_3, FUN =
function(ix) rollapplyr(ix, 5, Avg, partial = TRUE)))
giving:
Team Year Grouped_Position_3 Avg_Rating n Avg
1 Air-Force 2003 Skill 0.7667000 1 0.1533400
2 Air-Force 2004 Skill 0.0000000 1 0.1533400
3 Air-Force 2005 Skill 0.7444000 3 0.3022200
4 Air-Force 2006 Skill 0.7222000 6 0.4466600
5 Air-Force 2007 Skill 0.0000000 1 0.4466600
6 Air-Force 2008 Skill 0.7556000 1 0.4444400
7 Air-Force 2009 Skill 0.7622400 5 0.5968880
8 Air-Force 2011 Skill 0.5963222 9 0.4228324
9 Air-Force 2012 Skill 0.7065846 13 0.5641494
10 Air-Force 2013 Skill 0.7675091 11 0.5665312
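As a cross-check on the output above, here is a minimal base-R sketch of the same calculation that does not use zoo. It is not part of the original answer: the data frame name df and the helper year_window_avg are hypothetical, and it assumes at most one row per year within each Team and Grouped_Position_3 group. For each row it sums the ratings whose Year lies within the current year and the four before it, then divides by 5 so that missing years count as 0.

```r
# Illustrative data copied from the question; `df` is a hypothetical name.
df <- data.frame(
  Team = "Air-Force",
  Year = c(2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2011L, 2012L, 2013L),
  Grouped_Position_3 = "Skill",
  Avg_Rating = c(0.7667, 0, 0.7444, 0.7222, 0, 0.7556, 0.76224,
                 0.596322222222222, 0.706584615384615, 0.767509090909091)
)

# Given the row numbers of one group, return one 5-year average per row.
# Years with no row contribute nothing to the sum but still count in the
# divisor of 5, which is how a missing year is treated as 0.
year_window_avg <- function(ix) {
  yr <- df$Year[ix]
  rating <- df$Avg_Rating[ix]
  sapply(seq_along(ix), function(i) {
    in_window <- yr >= yr[i] - 4 & yr <= yr[i]
    sum(rating[in_window]) / 5
  })
}

# ave() applies the helper separately within each Team / position group.
df$Avg <- ave(seq_len(nrow(df)), df$Team, df$Grouped_Position_3,
              FUN = year_window_avg)
print(df)
```

On this sample the Avg column should agree with the rollapplyr result, for example about 0.5665 for 2013. Selecting rows by year range rather than by a fixed number of rows makes the handling of the 2010 gap explicit, at the cost of a quadratic scan within each group.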
The input used is:
sample <- structure(list(Team = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = "Air-Force", class = "factor"), Year = c(2003L,
2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2011L, 2012L, 2013L
), Grouped_Position_3 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = "Skill", class = "factor"), Avg_Rating = c(0.7667,
0, 0.7444, 0.7222, 0, 0.7556, 0.76224, 0.596322222222222, 0.706584615384615,
0.767509090909091), n = c(1L, 1L, 3L, 6L, 1L, 1L, 5L, 9L, 13L,
11L)), .Names = c("Team", "Year", "Grouped_Position_3", "Avg_Rating",
"n"), row.names = c(NA, 10L), class = "data.frame")