I want to compute a 3-game rolling average across the columns of some NFL data. Here are the data and the resulting data frame:
Data:
Player <- c("Player1", "Player2", "Player3", "Player4", "Player5")
Week1 <- c(10, 5, 6, 8, 7)
Week2 <- c(12, 9, 4, 2, 8)
Week3 <- c(4, 5, 4, 3, 12)
Week4 <- c(15, 7, 12, NA, 5)
Week5 <- c(NA, 5, 8, 11, 6)
q <- data.frame(Player, Week1, Week2, Week3, Week4, Week5)
Data frame:
Player Week1 Week2 Week3 Week4 Week5
1 Player1 10 12 4 15 NA
2 Player2 5 9 5 7 5
3 Player3 6 4 4 12 8
4 Player4 8 2 3 NA 11
5 Player5 7 8 12 5 6
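What I want is a 3-game rolling average across the weeks, starting from Week 1. For each player it would average Week1, Week2 and Week3 and put that value in a new column, then average Week2, Week3 and Week4 and put that value in another new column, and so on.
In this case the new data frame should look like this: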
Player Week1 Week2 Week3 Week4 Week5 Avg1 Avg2 Avg3
1 Player1 10 12 4 15 NA 8.7 10.3 NA
2 Player2 5 9 5 7 5 6.3 7.0 5.7
3 Player3 6 4 4 12 8 4.7 6.7 8.0
4 Player4 8 2 3 NA 11 4.3 4.3 5.3
5 Player5 7 8 12 5 6 9.0 8.3 7.7
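Note that Player4 has an NA in Week4 that is ignored. That would be a week the player did not play for some reason, so Avg3 uses the previous two games and the game after.
I need these new columns because I will run a regression to see whether the 3-game average predicts the next value. Everything I could find about rolling averages works down a single column, and I am quite inexperienced, so any help with formatting data for a problem like this is appreciated. Thanks in advance for your help!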
Answer 0 (score: 2)
We can use apply row-wise together with rollmean from the zoo package.
library(zoo)
t(apply(q[-1], 1, function(x) rollmean(x, 3)))
# Week2 Week3 Week4
#[1,] 8.666667 10.333333 NA
#[2,] 6.333333 7.000000 5.666667
#[3,] 4.666667 6.666667 8.000000
#[4,] 4.333333 NA NA
#[5,] 9.000000 8.333333 7.666667
Finally, to get the merged data frame, cbind the result to q:
cbind(q, t(apply(q[-1], 1, function(x) rollmean(x, 3))))
# Player Week1 Week2 Week3 Week4 Week5 Week2 Week3 Week4
#1 Player1 10 12 4 15 NA 8.666667 10.333333 NA
#2 Player2 5 9 5 7 5 6.333333 7.000000 5.666667
#3 Player3 6 4 4 12 8 4.666667 6.666667 8.000000
#4 Player4 8 2 3 NA 11 4.333333 NA NA
#5 Player5 7 8 12 5 6 9.000000 8.333333 7.666667
If you care about the column names, you can change them before binding:
temp <- t(apply(q[-1], 1, function(x) rollmean(x, 3)))
colnames(temp) <- c("avg1", "avg2", "avg3")
and then cbind temp to q.
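To make it clearer what the 3-wide rolling mean computes for a single row, here is a minimal base R sketch that does not use zoo. It is only an illustration, not the method used above: embed() from the stats package is swapped in to lay out the windows, and the variable names windows and roll3 are made up for this example. The values are Player2's weeks from the question.

```r
# One player's weekly values (Player2 from the example data)
x <- c(5, 9, 5, 7, 5)

# embed() puts each run of 3 consecutive values in one row
# (the order inside a row is reversed, which does not affect the mean)
windows <- embed(x, 3)

# Mean of each window: Week1-3, Week2-4, Week3-5
roll3 <- rowMeans(windows)
roll3
```

The three values correspond to the second row of the rollmean output above. Wrapping the same two steps in function(x) inside apply would give the row-wise version.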
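Edit
To answer some of the OP's questions:
If there are several columns to drop at the start, you just need to select or deselect those columns by index number.
For example, to deselect the first two columns you can use q[-c(1:2)], which for the six-column example q is the same as q[3:6]; either way you give it a sequence of column indices to deselect or select.
function(x) is called an anonymous function; you can use it to apply your own function to every row of the data frame.
rollmean cannot handle NA values. From the rollmean documentation:
The default method of rollmean does not handle inputs that contain NAs. In such cases, use rollapply instead.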
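Following that note, one possible way to use rollapply here is sketched below. It assumes that a window containing an NA should be averaged over its remaining values, which is done by passing na.rm = TRUE through to mean. That is an assumption about the desired behaviour, not something the documentation prescribes, and it will not reproduce the question's expected table exactly. The names avg and result are made up for this example.

```r
library(zoo)

# Example data from the question
Player <- c("Player1", "Player2", "Player3", "Player4", "Player5")
Week1 <- c(10, 5, 6, 8, 7)
Week2 <- c(12, 9, 4, 2, 8)
Week3 <- c(4, 5, 4, 3, 12)
Week4 <- c(15, 7, 12, NA, 5)
Week5 <- c(NA, 5, 8, 11, 6)
q <- data.frame(Player, Week1, Week2, Week3, Week4, Week5)

# Rolling 3-week mean per row; na.rm = TRUE is forwarded to mean(),
# so a window with an NA is averaged over its non-missing values
avg <- t(apply(q[-1], 1, function(x) rollapply(x, 3, mean, na.rm = TRUE)))
colnames(avg) <- c("Avg1", "Avg2", "Avg3")

# Bind the new columns onto the original data frame
result <- cbind(q, avg)
result
```

With this choice, Player4's second window (2, 3, NA) is averaged over two games only, and Player1's last window gets a value instead of NA, so check whether that is how missed weeks should be treated before using the columns in a regression.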