I have football results data in the following format (thousands of observations):
Div date value pts
1 E0 2011-08-13 Blackburn 0.0
2 E0 2011-08-13 Fulham 0.5
3 E0 2011-08-13 Liverpool 0.5
4 E0 2011-08-13 Newcastle 0.5
5 E0 2011-08-13 QPR 0.0
6 E0 2011-08-13 Wigan 0.5
7 E0 2011-08-14 Stoke 0.5
8 E0 2011-08-14 West Brom 0.0
9 E0 2011-08-15 Man City 1.0
10 E0 2011-08-20 Arsenal 0.0
11 E0 2011-08-20 Aston Villa 1.0
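plus other variables. "value" is the team, and pts is the final result (win/loss/draw) as a numeric value. I'm trying to add a new variable that holds the average pts of that row's team over its last X games. How can I do this without some horrible loop?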
Answer 0 (score: 3)
Take a look at this: use rollmean from the zoo package together with ddply from the plyr package:
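library(zoo)   # rollmean()
library(plyr)  # ddply()
# Example data: 5 teams ("a" to "e"), 10 results each
dat <- data.frame(value=letters[1:5], pts=sample(c(0, 0.5, 1), 50, replace=TRUE))
# Rolling mean of the last 5 results within each team
ddply(dat, .(value), summarise, roll=rollmean(pts, k=5, align='right'))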
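However, as far as I know, a rolling mean by definition shortens the data set (each team loses k - 1 values). You can supply a fill argument to pad it: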
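# fill=NA pads each team's first k - 1 positions, so there is one value per row;
# transform (instead of summarise) keeps the original columns and adds the new one
ddply(dat, .(value), transform, roll=rollmean(pts, k=5, fill=NA, align='right'))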
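If you would rather not depend on zoo and plyr, the same right-aligned, NA-padded rolling mean can be sketched in base R. This swaps in stats::filter with sides = 1 (a trailing moving average) in place of rollmean, and uses ave() so the result lines up with the original rows and can be stored as a new column. The toy data, the helper trailing_mean and the column last_k are hypothetical; rows are sorted by date first so that "last k games" means the most recent ones.

```r
# Toy data in the question's layout (hypothetical values)
games <- data.frame(
  date  = as.Date("2011-08-13") + c(0, 7, 14, 21, 0, 7, 14, 21),
  value = rep(c("Arsenal", "Fulham"), each = 4),
  pts   = c(0, 1, 0.5, 1, 0.5, 0.5, 0, 1)
)

# Hypothetical helper: mean of the current value and the k - 1 before it,
# NA until k values exist (intended to match rollmean(x, k, fill = NA, align = "right"))
trailing_mean <- function(x, k) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

# Sort by date so each team's values are in chronological order
games <- games[order(games$date), ]

# ave() applies the helper within each team and returns a vector aligned with the rows
games$last_k <- ave(games$pts, games$value, FUN = function(x) trailing_mean(x, 3))
games
```

stats::filter is written with its namespace because dplyr masks filter() when it is loaded.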
Answer 1 (score: 1)
Try the ave function from the stats package.
# Toy data: factor Trt with two groups ("A", "B") of three rows each
Trt <- gl(n=2, k=3, length=2*3, labels=c("A", "B"))
Y <- 1:6
Data <- data.frame(Trt, Y)
Data
Trt Y
1 A 1
2 A 2
3 A 3
4 B 4
5 B 5
6 B 6
# ave() returns a vector as long as Y, with each element replaced by its group's mean
Data$TrtMean <- ave(Data$Y, Data$Trt, FUN=mean)
Data
Trt Y TrtMean
1 A 1 2
2 A 2 2
3 A 3 2
4 B 4 5
5 B 5 5
6 B 6 5
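ave() is not limited to one summary per group: any function that returns a vector as long as its input works, so it can also give a rolling mean aligned with the rows. A minimal sketch, assuming rows are already in chronological order within each group; last_k_mean is a hypothetical helper based on cumulative sums and, unlike the fill=NA approach, averages over however many values are available for the first k - 1 positions.

```r
# Hypothetical helper: mean of the current value and up to k - 1 preceding ones
last_k_mean <- function(x, k) {
  cs <- cumsum(c(0, x))      # cs[i + 1] is the sum of x[1..i]
  i <- seq_along(x)
  start <- pmax(i - k, 0)    # number of leading values outside the window
  (cs[i + 1] - cs[start + 1]) / (i - start)
}

Trt <- gl(n = 2, k = 3, labels = c("A", "B"))
Y <- 1:6
Data <- data.frame(Trt, Y)

# Rolling mean of the last 2 values within each Trt group
Data$Last2 <- ave(Data$Y, Data$Trt, FUN = function(x) last_k_mean(x, 2))
Data
```

For Y = 1:6 this gives 1, 1.5, 2.5 for group A and 4, 4.5, 5.5 for group B.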
Answer 2 (score: 1)
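This can be done quite efficiently with tapply. I altered your data by replicating the team games and randomizing the scores and dates. It takes the mean of the last 2 games, as specified in the tail function.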
# create some data
d <- structure(list(Div = structure(rep(1L, 33), .Label = " E0",
class = "factor"), date = structure(c(15013, 14990, 14996, 15001, 14995, 15006,
15020, 15032, 15023, 15022, 15015, 15016, 15034, 14994, 14986, 14998, 14982,
14979, 14980, 15016, 15031, 15013, 15031, 14999, 15025, 14978, 15007, 15026,
14992, 14997, 15023, 14986, 15028), class = "Date"),
value = structure(c(3L, 4L, 5L, 7L, 8L, 11L, 9L, 10L, 6L, 1L, 2L, 3L, 4L, 5L,
7L, 8L, 11L, 9L, 10L, 6L, 1L, 2L, 3L, 4L, 5L, 7L, 8L, 11L, 9L, 10L, 6L, 1L,
2L), .Label = c("Arsenal", "Aston Villa", "Blackburn", "Fulham", "Liverpool",
"Man City", "Newcastle", "QPR", "Stoke", "West Brom", "Wigan"),
class = "factor"), pts = c(0.5, 0.5, 0.5, 1, 1, 1, 1, 0, 1, 0.5, 0, 1, 1, 1, 1,
0.5, 0.5, 0, 0.5, 0.5, 0, 0, 0, 1, 0, 0, 0.5, 0, 1, 0, 0.5, 0.5, 0.5)),
.Names = c("Div", "date", "value", "pts"), row.names = c(NA, 33L),
class = "data.frame")
# sort rows by date
d2 <- d[order(d$date),]
# mean of all games
tapply(d2$pts, d2$value, mean)
# mean of last 2 games
tapply(d2$pts, d2$value, function(x) mean(tail(x, 2)))
# To tidy up the output, you could use simplify=FALSE and do.call(rbind, x):
# e.g., mean of last 2 games:
do.call(rbind, tapply(d2$pts, d2$value, function(x) mean(tail(x, 2)),
  simplify=FALSE))
[,1]
Arsenal 0.25
Aston Villa 0.25
Blackburn 0.50
Fulham 1.00
Liverpool 0.25
Man City 0.75
Newcastle 1.00
QPR 0.50
Stoke 1.00
West Brom 0.00
Wigan 0.50
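tapply() returns one value per team rather than one per row. To get the new column the question asks for, a minimal sketch is to look up each row's team in the named tapply result. Note that every row of a team then gets the same number (its last 2 games in the whole data set), not a value that changes from match to match; a per-match rolling value needs a row-aligned approach such as ave(). The toy data and the column name last2 are hypothetical.

```r
# Toy data (hypothetical): three games each for two teams
games <- data.frame(
  date  = as.Date("2011-08-13") + c(0, 0, 7, 7, 14, 14),
  value = factor(c("Arsenal", "Fulham", "Arsenal", "Fulham", "Arsenal", "Fulham")),
  pts   = c(1, 0, 0.5, 1, 0, 1)
)
games <- games[order(games$date), ]

# One number per team: mean of that team's last 2 games
last2 <- tapply(games$pts, games$value, function(x) mean(tail(x, 2)))

# Index the named result by each row's team to get a row-aligned column
games$last2 <- as.numeric(last2[as.character(games$value)])
games
```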