Is it possible to look up a previous row's value by group across two different columns?

Asked: 2018-09-04 19:19:19

Tags: r data.table

DT:

Hteam            Ateam          Season      HT_Points   AT_Points
Grodig          Salzburg        2015/2016       23          29
Rapid Vienna    Altach          2015/2016       38          15
Ried            Austria Vienna  2015/2016       32          30
Sturm Graz      Mattersburg     2015/2016       30          17
Admira          Rapid Vienna    2015/2016       24          27
Altach          Ried            2015/2016       25          10
Austria Vienna  Sturm Graz      2015/2016       29          18
Mattersburg     Grodig          2015/2016       22          12
Salzburg        AC Wolfsberger  2015/2016       45          11
Rapid Vienna    Ried            2016/2017        3           0
Altach          AC Wolfsberger  2016/2017        3           0
Sturm Graz      Salzburg        2016/2017        3           0
St. Polten      Austria Vienna  2016/2017        0           3
Mattersburg     Admira          2016/2017        0           3
Salzburg        AC Wolfsberger  2016/2017        1           1
Ried            Sturm Graz      2016/2017        3           0
Altach          Rapid Vienna    2016/2017        6           0
Austria Vienna  Mattersburg     2016/2017        3           0

Desired output:

Hteam            Ateam          Season      HT_Points   AT_Points HT_PointsTOTAL    AT_PointsTOTAL
Grodig          Salzburg        2015/2016       23          29           23 + ?         29 + ?
Rapid Vienna    Altach          2015/2016       38          15           38 + ?         15 + ? 
Ried            Austria Vienna  2015/2016       32          30           32 + ?         30 + ?
Sturm Graz      Mattersburg     2015/2016       30          17           30 + ?         17 + ?
Admira          Rapid Vienna    2015/2016       24          27           24 + ?            65
Altach          Ried            2015/2016       25          10             40              42
Austria Vienna  Sturm Graz      2015/2016       29          18             59              48
Mattersburg     Grodig          2015/2016       22          12             39              35
Salzburg        AC Wolfsberger  2015/2016       45          11             74           11 + ?
Rapid Vienna    Ried            2016/2017        3           0             NA              NA
Altach          AC Wolfsberger  2016/2017        3           0             NA              NA
Sturm Graz      Salzburg        2016/2017        3           0             NA              NA
St. Polten      Austria Vienna  2016/2017        0           3             NA              NA
Mattersburg     Admira          2016/2017        0           3             NA              NA
Salzburg        AC Wolfsberger  2016/2017        1           1              1              NA
Ried            Sturm Graz      2016/2017        3           0              3               3
Altach          Rapid Vienna    2016/2017        6           0             NA               3
Austria Vienna  Mattersburg     2016/2017        3           0              6               0

HT_PointsTOTAL = HT_Points + AT_Points(last game played as Ateam by Hteam)
AT_PointsTOTAL = AT_Points + HT_Points(last game played as Hteam by Ateam)

Note: ? --> It should be a number.
            It is shown this way because the rows it refers to are not included here.
     NA --> No previous game in that Season by Hteam as Ateam or by Ateam as Hteam.
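
As a concrete reading of the two definitions above, here is a hand-computed check for the Altach vs Ried row of 2015/2016. It is only a sketch: `toy` is a hypothetical three-row subset of the table, and the row positions are hard-coded for illustration.

```r
library(data.table)

# Hypothetical subset of the 2015/2016 rows shown in the question
toy <- data.table(
  Hteam     = c("Rapid Vienna", "Ried", "Altach"),
  Ateam     = c("Altach", "Austria Vienna", "Ried"),
  HT_Points = c(38L, 32L, 25L),
  AT_Points = c(15L, 30L, 10L)
)

# Row 3 is Altach (home) vs Ried (away).
# Altach's last away game is row 1; Ried's last home game is row 2.
ht_total <- toy$HT_Points[3] + toy$AT_Points[1]  # 25 + 15
at_total <- toy$AT_Points[3] + toy$HT_Points[2]  # 10 + 32
```

These are the 40 and 42 shown for that row in the desired output.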

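I know that shift can be used to look up values from the previous row. But in this case I don't know how to do it, because the team name is the same but appears in different columns (Hteam and Ateam).

Maybe shift cannot do what I want. The goal is to add up each team's total points. That is, when a team plays at home, you have to look up the points from the last time that team played away and add them (and vice versa).

Maybe the only solution is to write a function that creates the new columns. But I don't know how to do that.

The calculation has to be grouped by the Season column.

If possible, use the data.table package.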

2 Answers:

Answer 0: (score: 1)

Here is an approach using a data.table non-equi join, which uses the row number to ensure that we only select from previous rows:

library(data.table)
setDT(DT)
DT[, rn := .I]

#calculate home team points first
DT[, HT_PointsTotal := 
    .SD[.SD, .(x.AT_Points + i.HT_Points), on=c("Season"="Season", "Ateam"="Hteam", "rn<rn")]]

#then calculate away team points
DT[, AT_PointsTotal := 
    .SD[.SD, .(x.HT_Points + i.AT_Points), on=c("Season"="Season", "Hteam"="Ateam", "rn<rn")]]
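
To see which side of `rn<rn` refers to which table, here is a minimal sketch on hypothetical toy tables (`lookup` and `target` are illustration names, not part of the answer). In `x[i, on = ...]` the left-hand side of each condition is a column of `x` and the right-hand side a column of `i`, so `rn < rn` keeps only `x` rows that come before the current `i` row.

```r
library(data.table)

# x side of the join: earlier games, identified by row number rn
lookup <- data.table(team = c("A", "A", "B"),
                     rn   = c(1L, 3L, 2L),
                     pts  = c(10L, 30L, 20L))
# i side of the join: the rows we want to fill in
target <- data.table(team = c("A", "B"), rn = c(2L, 2L))

# x.rn < i.rn: only lookup rows strictly before the target row can match
res <- lookup[target,
              .(team = i.team, target_rn = i.rn, lookup_rn = x.rn, pts = x.pts),
              on = .(team, rn < rn)]
```

Team A at `rn` 2 picks up the row with `rn` 1 only; team B has no earlier row, so its `pts` is `NA`.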

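Adding a rolling-join approach for when the dataset gets larger and a Cartesian join error occurs because an Hteam appears multiple times in the Ateam column.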

dummy[, rn := .I]
dummy[, HT_PointsTotal :=
        .SD[.SD, .(x.AT_Points + i.HT_Points), on=c("Season", "Ateam"="Hteam", "rn"), roll=Inf]
    ]

# away total: match each Ateam against the rows where it played as Hteam
dummy[, AT_PointsTotal :=
        .SD[.SD, .(x.HT_Points + i.AT_Points), on=c("Season", "Hteam"="Ateam", "rn"), roll=Inf]
    ]
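
The difference between the two join styles can be sketched on a hypothetical toy table in which one team has two earlier rows. The non-equi join returns every earlier row, while `roll = Inf` carries forward only the most recent one. The names `lookup` and `target` are assumptions made for illustration.

```r
library(data.table)

# Team A has two earlier rows (rn 1 and rn 3)
lookup <- data.table(team = c("A", "A"), rn = c(1L, 3L), pts = c(10L, 30L))
target <- data.table(team = "A", rn = 5L)

# Non-equi join: every earlier row matches, so one target row becomes two
all_earlier <- lookup[target, .(rn = x.rn, pts = x.pts), on = .(team, rn < rn)]

# Rolling join: there is no exact match on rn, so the last earlier row is used
latest <- lookup[target, .(rn = x.rn, pts = x.pts), on = .(team, rn), roll = Inf]
```

Because a team never plays itself, the exact match on `rn` cannot occur in the real data either, so the roll always falls back to an earlier game.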

Dummy data (it is time-consuming to create and does not reflect reality either):

library(data.table)
DT <- fread("Hteam,Ateam,Season,HT_Points,AT_Points
Grodig,Salzburg,2015/2016,23,29
Rapid Vienna,Altach,2015/2016,38,15
Ried,Austria Vienna,2015/2016,32,30
Sturm Graz,Mattersburg,2015/2016,30,17
Admira,Rapid Vienna,2015/2016,24,27
Altach,Ried,2015/2016,25,10
Austria Vienna,Sturm Graz,2015/2016,29,18
Mattersburg,Grodig,2015/2016,22,12
Salzburg,AC Wolfsberger,2015/2016,45,11")

numTeams <- DT[,uniqueN(c(Hteam, Ateam))]

firstHalf <- lapply(seq_len(DT[,.N]),
    function(n) data.table(
        Matchday=n*2L-1L,
        Hteam=DT[["Hteam"]],
        Ateam=c(DT[["Ateam"]][-seq_len(n)], DT[["Ateam"]][seq_len(n)]),
        Season=DT[["Season"]],
        HT_Points=DT[["HT_Points"]],
        AT_Points=DT[["AT_Points"]]
    ))

secondHalf <- lapply(seq_len(DT[,.N]),
    function(n) data.table(
        Matchday=n*2L,
        Hteam=DT[["Ateam"]],
        Ateam=c(DT[["Hteam"]][-seq_len(n)], DT[["Hteam"]][seq_len(n)]),
        Season=DT[["Season"]],
        HT_Points=DT[["HT_Points"]],
        AT_Points=DT[["AT_Points"]]
    ))


dummy <- rbindlist(c(firstHalf, secondHalf))[
    Hteam!=Ateam][,
        .SD[1L], by=.(Hteam, Ateam)]
setorder(dummy, Matchday, Hteam)

Answer 1: (score: 0)

Updated
Very similar to @chinsoon12's data.table approach, but it avoids .SD and is therefore, IMO, a bit more concise:

library(data.table)
setDT(DT)

DT[, rn := .I]

# Join to get away points (i.AT_Points) for Hteam
DT[DT, 
   HT_PointsTOTAL := HT_Points + i.AT_Points,
   on = .(Hteam=Ateam, Season=Season, rn>rn)]  # note rn>rn (using greater than here)

# Join to get home points (i.HT_Points) for Ateam
DT[DT,
   AT_PointsTOTAL := AT_Points + i.HT_Points,
   on = .(Ateam=Hteam, Season=Season, rn>rn)]  # rn>rn again: x.rn > i.rn, so only earlier games are used

DT

Which yields (for the updated sample data):

             Hteam          Ateam    Season HT_Points AT_Points rn HT_PointsTOTAL AT_PointsTOTAL
 1:         Grodig       Salzburg 2015/2016        23        29  1             NA             NA
 2:   Rapid Vienna         Altach 2015/2016        38        15  2             NA             NA
 3:           Ried Austria Vienna 2015/2016        32        30  3             NA             NA
 4:     Sturm Graz    Mattersburg 2015/2016        30        17  4             NA             NA
 5:         Admira   Rapid Vienna 2015/2016        24        27  5             NA             65
 6:         Altach           Ried 2015/2016        25        10  6             40             42
 7: Austria Vienna     Sturm Graz 2015/2016        29        18  7             59             48
 8:    Mattersburg         Grodig 2015/2016        22        12  8             39             35
 9:       Salzburg AC Wolfsberger 2015/2016        45        11  9             74             NA
10:   Rapid Vienna           Ried 2016/2017         3         0 10             NA             NA
11:         Altach AC Wolfsberger 2016/2017         3         0 11             NA             NA
12:     Sturm Graz       Salzburg 2016/2017         3         0 12             NA             NA
13:     St. Polten Austria Vienna 2016/2017         0         3 13             NA             NA
14:    Mattersburg         Admira 2016/2017         0         3 14             NA             NA
15:       Salzburg AC Wolfsberger 2016/2017         1         1 15              1             NA
16:           Ried     Sturm Graz 2016/2017         3         0 16              3              3
17:         Altach   Rapid Vienna 2016/2017         6         0 17             NA              3
18: Austria Vienna    Mattersburg 2016/2017         3         0 18              6              0
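
One way to cross-check the table above independently of the joins is a brute-force lookup over earlier rows. This is only a sketch: `last_points` is a hypothetical helper written for this check, and only the nine 2015/2016 rows are used.

```r
library(data.table)

DT <- fread("Hteam,Ateam,Season,HT_Points,AT_Points
Grodig,Salzburg,2015/2016,23,29
Rapid Vienna,Altach,2015/2016,38,15
Ried,Austria Vienna,2015/2016,32,30
Sturm Graz,Mattersburg,2015/2016,30,17
Admira,Rapid Vienna,2015/2016,24,27
Altach,Ried,2015/2016,25,10
Austria Vienna,Sturm Graz,2015/2016,29,18
Mattersburg,Grodig,2015/2016,22,12
Salzburg,AC Wolfsberger,2015/2016,45,11")

# Hypothetical helper: value of pts_col in the latest earlier row of the same
# Season in which `team` appears in team_col; NA if there is no such row.
last_points <- function(dt, row, team, team_col, pts_col) {
  earlier <- which(dt[[team_col]] == team &
                   dt[["Season"]] == dt[["Season"]][row] &
                   seq_len(nrow(dt)) < row)
  if (length(earlier) == 0L) NA_integer_ else dt[[pts_col]][max(earlier)]
}

rows <- seq_len(nrow(DT))
# Points the home team earned in its last away game
ht_prev <- vapply(rows, function(r)
  last_points(DT, r, DT$Hteam[r], "Ateam", "AT_Points"), integer(1))
# Points the away team earned in its last home game
at_prev <- vapply(rows, function(r)
  last_points(DT, r, DT$Ateam[r], "Hteam", "HT_Points"), integer(1))

check <- data.table(HT_PointsTOTAL = DT$HT_Points + ht_prev,
                    AT_PointsTOTAL = DT$AT_Points + at_prev)
```

The loop is quadratic in the number of rows, so it is meant for checking small samples rather than as a replacement for the joins.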

If you like, you can remove the rn column once you are done:

DT$rn <- NULL
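
data.table also lets you drop a column by reference with `:=`, which avoids the `$<-` assignment. A minimal sketch on a hypothetical toy table:

```r
library(data.table)

toy <- data.table(Hteam = c("Ried", "Altach"), HT_Points = c(32L, 25L))
toy[, rn := .I]    # add the helper row number
toy[, rn := NULL]  # drop it again by reference
```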