DT:
Hteam Ateam Season HT_Points AT_Points
Grodig Salzburg 2015/2016 23 29
Rapid Vienna Altach 2015/2016 38 15
Ried Austria Vienna 2015/2016 32 30
Sturm Graz Mattersburg 2015/2016 30 17
Admira Rapid Vienna 2015/2016 24 27
Altach Ried 2015/2016 25 10
Austria Vienna Sturm Graz 2015/2016 29 18
Mattersburg Grodig 2015/2016 22 12
Salzburg AC Wolfsberger 2015/2016 45 11
Rapid Vienna Ried 2016/2017 3 0
Altach AC Wolfsberger 2016/2017 3 0
Sturm Graz Salzburg 2016/2017 3 0
St. Polten Austria Vienna 2016/2017 0 3
Mattersburg Admira 2016/2017 0 3
Salzburg AC Wolfsberger 2016/2017 1 1
Ried Sturm Graz 2016/2017 3 0
Altach Rapid Vienna 2016/2017 6 0
Austria Vienna Mattersburg 2016/2017 3 0
所需的输出:
Hteam Ateam Season HT_Points AT_Points HT_PointsTOTAL AT_PointsTOTAL
Grodig Salzburg 2015/2016 23 29 23 + ? 29 + ?
Rapid Vienna Altach 2015/2016 38 15 38 + ? 15 + ?
Ried Austria Vienna 2015/2016 32 30 32 + ? 30 + ?
Sturm Graz Mattersburg 2015/2016 30 17 30 + ? 17 + ?
Admira Rapid Vienna 2015/2016 24 27 24 + ? 65
Altach Ried 2015/2016 25 10 40 42
Austria Vienna Sturm Graz 2015/2016 29 18 59 48
Mattersburg Grodig 2015/2016 22 12 39 35
Salzburg AC Wolfsberger 2015/2016 45 11 74 11 + ?
Rapid Vienna Ried 2016/2017 3 0 NA NA
Altach AC Wolfsberger 2016/2017 3 0 NA NA
Sturm Graz Salzburg 2016/2017 3 0 NA NA
St. Polten Austria Vienna 2016/2017 0 3 NA NA
Mattersburg Admira 2016/2017 0 3 NA NA
Salzburg AC Wolfsberger 2016/2017 1 1 1 NA
Ried Sturm Graz 2016/2017 3 0 3 3
Altach Rapid Vienna 2016/2017 6 0 NA 3
Austria Vienna Mattersburg 2016/2017 3 0 6 0
HT_PointsTOTAL = HT_Points + AT_Points(last game played as Ateam by Hteam)
AT_PointsTOTAL = AT_Points + HT_Points(last game played as Hteam by Ateam)
Note: ? --> It should be a number.
It has been put like this since the rows it refers to are not shown.
NA --> No previous game on that Season by Hteam as Ateam or by Ateam as Hteam.
我知道从上一行可以使用shift查找值的内容。但是在这种情况下,我不知道该怎么做,因为团队的名称是相同的,但是在不同的列(Hteam和Ateam)中。
也许轮班做不到我想做的事。目标是增加团队的总积分。也就是说,在主场比赛时,您必须从团队上一次作为访客打球的时候寻找要点,并将其添加(反之亦然)。
也许唯一的解决方案是使用一个函数来创建新列。但是我不知道该怎么做。
有必要使用“季节”列进行分组。
如果可以使用data.table包。
答案 0 :(得分:1)
这是一种使用data.table
非等分联接的方法,该联接使用行号来确保我们仅从前几行中进行选择:
library(data.table)
setDT(DT)
DT[, rn := .I]
#calculate home team points first
DT[, HT_PointsTotal :=
.SD[.SD, .(x.AT_Points + i.HT_Points), on=c("Season"="Season", "Ateam"="Hteam", "rn<rn")]]
#then calculate away team points
DT[, AT_PointsTotal :=
.SD[.SD, .(x.HT_Points + i.AT_Points), on=c("Season"="Season", "Hteam"="Ateam", "rn<rn")]]
当数据集变大并且由于Hteam在Ateam列中多次出现而导致笛卡尔联接错误时,添加一种roll
方法。
dummy[, rn := .I]
dummy[, HT_PointsTotal :=
.SD[.SD, .(x.AT_Points + i.HT_Points), on=c("Season", "Ateam"="Hteam", "rn"), roll=Inf]
]
dummy[, AT_PointsTotal :=
.SD[.SD, .(x.HT_Points + i.AT_Points), on=c("Season", "Ateam"="Hteam", "rn"), roll=Inf]
]
虚拟数据(创建数据很耗时,并且也无法反映现实):
library(data.table)
DT <- fread("Hteam,Ateam,Season,HT_Points,AT_Points
Grodig,Salzburg,2015/2016,23,29
Rapid Vienna,Altach,2015/2016,38,15
Ried,Austria Vienna,2015/2016,32,30
Sturm Graz,Mattersburg,2015/2016,30,17
Admira,Rapid Vienna,2015/2016,24,27
Altach,Ried,2015/2016,25,10
Austria Vienna,Sturm Graz,2015/2016,29,18
Mattersburg,Grodig,2015/2016,22,12
Salzburg,AC Wolfsberger,2015/2016,45,11")
numTeams <- DT[,uniqueN(c(Hteam, Ateam))]
firstHalf <- lapply(seq_len(DT[,.N]),
function(n) data.table(
Matchday=n*2L-1L,
Hteam=DT[["Hteam"]],
Ateam=c(DT[["Ateam"]][-seq_len(n)], DT[["Ateam"]][seq_len(n)]),
Season=DT[["Season"]],
HT_Points=DT[["HT_Points"]],
AT_Points=DT[["AT_Points"]]
))
secondHalf <- lapply(seq_len(DT[,.N]),
function(n) data.table(
Matchday=n*2L,
Hteam=DT[["Ateam"]],
Ateam=c(DT[["Hteam"]][-seq_len(n)], DT[["Hteam"]][seq_len(n)]),
Season=DT[["Season"]],
HT_Points=DT[["HT_Points"]],
AT_Points=DT[["AT_Points"]]
))
dummy <- rbindlist(c(firstHalf, secondHalf))[
Hteam!=Ateam][,
.SD[1L], by=.(Hteam, Ateam)]
setorder(dummy, Matchday, Hteam)
答案 1 :(得分:0)
已更新
与带有data.table
的@ chinsoon12非常相似,但避免使用.SD
,因此IMO更加简洁:
library(data.table)
setDT(DT)
DT[, rn := .I]
# Join to get away points (i.AT_Points) for Hteam
DT[DT,
HT_PointsTOTAL := HT_Points + i.AT_Points,
on = .(Hteam=Ateam, Season=Season, rn>rn)] # note rn>rn (using greater than here)
# Join to get home points (i.HT_Points) for Ateam
DT[DT,
AT_PointsTOTAL := AT_Points + i.HT_Points,
on = .(Ateam=Hteam, Season=Season, rn<rn)] # note nr<rn (using less than here)
DT
产生(用于更新的样本数据):
Hteam Ateam Season HT_Points AT_Points rn HT_PointsTOTAL AT_PointsTOTAL
1: Grodig Salzburg 2015/2016 23 29 1 NA NA
2: Rapid Vienna Altach 2015/2016 38 15 2 NA NA
3: Ried Austria Vienna 2015/2016 32 30 3 NA NA
4: Sturm Graz Mattersburg 2015/2016 30 17 4 NA NA
5: Admira Rapid Vienna 2015/2016 24 27 5 NA 65
6: Altach Ried 2015/2016 25 10 6 40 42
7: Austria Vienna Sturm Graz 2015/2016 29 18 7 59 48
8: Mattersburg Grodig 2015/2016 22 12 8 39 35
9: Salzburg AC Wolfsberger 2015/2016 45 11 9 74 NA
10: Rapid Vienna Ried 2016/2017 3 0 10 NA NA
11: Altach AC Wolfsberger 2016/2017 3 0 11 NA NA
12: Sturm Graz Salzburg 2016/2017 3 0 12 NA NA
13: St. Polten Austria Vienna 2016/2017 0 3 13 NA NA
14: Mattersburg Admira 2016/2017 0 3 14 NA NA
15: Salzburg AC Wolfsberger 2016/2017 1 1 15 1 NA
16: Ried Sturm Graz 2016/2017 3 0 16 3 3
17: Altach Rapid Vienna 2016/2017 6 0 17 NA 3
18: Austria Vienna Mattersburg 2016/2017 3 0 18 6 0
如果愿意,可以在完成后删除rn列:
DT$rn <- NULL