Group by and match

Posted: 2016-02-08 02:36:23

Tags: r group-by matching fill

A small representative sample of my dataset:

TEAM1 <- c("ATL", "CHI", "CLE", "DET", "GSW", "NOP", "BKN","ATL", "PHI","CHI")
HOME.AWAY <- c("vs.", "vs.", "@", "@", "vs.", "@", "vs.","vs.", "@","@")
TEAM2 <- c("DET", "CLE", "CHI", "ATL", "NOP", "GSW", "CHI","PHI", "ATL","BKN")
DATE <- as.Date(c("2015-05-14", "2015-05-14", "2015-05-14",
           "2015-05-14","2015-05-14", "2015-05-14", "2015-05-15","2015-05-15",
           "2015-05-15","2015-05-15"))
PTS <- c(94, 97, 95, 106, 111, 95, 100,112,87, 94)
df <- data.frame(TEAM1,HOME.AWAY,TEAM2,PTS,DATE)

df

   TEAM1 HOME.AWAY TEAM2 PTS       DATE
   ATL       vs.   DET  94 2015-05-14
   CHI       vs.   CLE  97 2015-05-14
   CLE         @   CHI  95 2015-05-14
   DET         @   ATL 106 2015-05-14
   GSW       vs.   NOP 111 2015-05-14
   NOP         @   GSW  95 2015-05-14
   BKN       vs.   CHI 100 2015-05-15
   ATL       vs.   PHI 112 2015-05-15
   PHI         @   ATL  87 2015-05-15
   CHI         @   BKN  94 2015-05-15

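The data frame is organized at the team level, so each game produces two rows. For example, Atlanta vs. Detroit (first row) and Detroit @ Atlanta (fourth row) are the same game. The data frame then contains TEAM1's stats (PTS, REB, AST, ...); for this example I have only included the points-scored variable. I want to create a new variable, "points scored by the opposing team".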
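The two-rows-per-game structure described above can be checked in base R before computing anything. A minimal sketch, assuming each pair of teams meets at most once per date: build a key from (TEAM1, TEAM2, DATE) for every row and confirm that the mirrored key (TEAM2, TEAM1, DATE) also exists. The names `key`, `mirror` and `ok` are illustrative only.

```r
# Sample data from the question (only the columns needed for the check)
TEAM1 <- c("ATL", "CHI", "CLE", "DET", "GSW", "NOP", "BKN", "ATL", "PHI", "CHI")
TEAM2 <- c("DET", "CLE", "CHI", "ATL", "NOP", "GSW", "CHI", "PHI", "ATL", "BKN")
DATE  <- as.Date(c(rep("2015-05-14", 6), rep("2015-05-15", 4)))

# One key per row, and the key the opponent's row should carry (teams swapped)
key    <- paste(TEAM1, TEAM2, DATE, sep = "|")
mirror <- paste(TEAM2, TEAM1, DATE, sep = "|")

# TRUE when no key is duplicated and every row has its mirrored counterpart
ok <- anyDuplicated(key) == 0 && all(mirror %in% key)
ok
```

If `ok` is `FALSE`, some game is recorded from only one side (or twice from the same side), and any matching step will produce `NA`s or ambiguous pairs.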

The output should look like this:

   TEAM1 HOME.AWAY TEAM2 PTS       DATE  PTS.OPPT
   ATL       vs.   DET  94 2015-05-14    106
   CHI       vs.   CLE  97 2015-05-14    95
   CLE         @   CHI  95 2015-05-14    97
   DET         @   ATL 106 2015-05-14    94
   GSW       vs.   NOP 111 2015-05-14    95
   NOP         @   GSW  95 2015-05-14    111
   BKN       vs.   CHI 100 2015-05-15    94
   ATL       vs.   PHI 112 2015-05-15    87
   PHI         @   ATL  87 2015-05-15    112
   CHI         @   BKN  94 2015-05-15    100

I tried grouping by date and then doing some kind of matching, but I couldn't figure out the matching part.
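Because the date can be part of the matching key, an explicit group-by is not strictly needed. A minimal base R sketch, assuming each pair of teams meets at most once per date: build a composite key for each row, build the key the opponent's row would have (teams swapped, same date), and look it up with `match()`. This keeps the original row order of `df`. `own_key` and `opp_key` are illustrative names, not part of the original data.

```r
# Rebuild the sample data from the question
TEAM1 <- c("ATL", "CHI", "CLE", "DET", "GSW", "NOP", "BKN", "ATL", "PHI", "CHI")
HOME.AWAY <- c("vs.", "vs.", "@", "@", "vs.", "@", "vs.", "vs.", "@", "@")
TEAM2 <- c("DET", "CLE", "CHI", "ATL", "NOP", "GSW", "CHI", "PHI", "ATL", "BKN")
DATE <- as.Date(c("2015-05-14", "2015-05-14", "2015-05-14", "2015-05-14",
                  "2015-05-14", "2015-05-14", "2015-05-15", "2015-05-15",
                  "2015-05-15", "2015-05-15"))
PTS <- c(94, 97, 95, 106, 111, 95, 100, 112, 87, 94)
df <- data.frame(TEAM1, HOME.AWAY, TEAM2, PTS, DATE, stringsAsFactors = FALSE)

# Key for "this team, this opponent, this date"
own_key <- paste(df$TEAM1, df$TEAM2, df$DATE, sep = "|")
# Key of the mirrored row: teams swapped, same date
opp_key <- paste(df$TEAM2, df$TEAM1, df$DATE, sep = "|")

# For each row, find the row whose own key equals this row's mirrored key;
# match() returns NA when no mirrored row exists, so PTS.OPPT becomes NA there
df$PTS.OPPT <- df$PTS[match(opp_key, own_key)]
df
```

If a team could play the same opponent twice on one date, `match()` would silently pick the first hit, so the key would need an extra game identifier.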

1 answer:

Answer 0 (score: 2)

> TEAM1 <- c("ATL", "CHI", "CLE", "DET", "GSW", "NOP", "BKN","ATL", "PHI","CHI")
> HOME.AWAY <- c("vs.", "vs.", "@", "@", "vs.", "@", "vs.","vs.", "@","@")
> TEAM2 <- c("DET", "CLE", "CHI", "ATL", "NOP", "GSW", "CHI","PHI", "ATL","BKN")
> DATE <- as.Date(c("2015-05-14", "2015-05-14", "2015-05-14",
+                   "2015-05-14","2015-05-14", "2015-05-14", "2015-05-15","2015-05-15",
+                   "2015-05-15","2015-05-15"))
> PTS <- c(94, 97, 95, 106, 111, 95, 100,112,87, 94)
> df <- data.frame(TEAM1,HOME.AWAY,TEAM2,PTS,DATE)
> 
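> # Self-join: pair each row with the row on the same date whose teams are swapped
> # (merge() returns the rows sorted by the key columns)
> df<-merge(df, df,  by.x=c("TEAM1", "TEAM2", "DATE"), by.y=c("TEAM2", "TEAM1", "DATE"))
> # Non-key columns get suffixes: .x = this team's row, .y = the opponent's row
> df<-df[,c("TEAM1", "HOME.AWAY.x", "TEAM2", "PTS.x","DATE", "PTS.y" )]
> names(df)<-c("TEAM1", "HOME.AWAY", "TEAM2","PTS", "DATE", "PTS.OPPT")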
> df
   TEAM1 HOME.AWAY TEAM2 PTS       DATE PTS.OPPT
1    ATL       vs.   DET  94 2015-05-14      106
2    ATL       vs.   PHI 112 2015-05-15       87
3    BKN       vs.   CHI 100 2015-05-15       94
4    CHI         @   BKN  94 2015-05-15      100
5    CHI       vs.   CLE  97 2015-05-14       95
6    CLE         @   CHI  95 2015-05-14       97
7    DET         @   ATL 106 2015-05-14       94
8    GSW       vs.   NOP 111 2015-05-14       95
9    NOP         @   GSW  95 2015-05-14      111
10   PHI         @   ATL  87 2015-05-15      112
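`merge()` sorts the result on the key columns by default, which is why the output order above differs from the desired output in the question. A variant sketch, assuming the original row order matters: rename the columns of a slim copy up front so no suffix cleanup is needed, use `all.x = TRUE` so a row without a mirrored row is kept with `NA`, and restore the order with a temporary `row_id` column (a hypothetical helper, not part of the original data).

```r
# Rebuild the sample data from the question
TEAM1 <- c("ATL", "CHI", "CLE", "DET", "GSW", "NOP", "BKN", "ATL", "PHI", "CHI")
HOME.AWAY <- c("vs.", "vs.", "@", "@", "vs.", "@", "vs.", "vs.", "@", "@")
TEAM2 <- c("DET", "CLE", "CHI", "ATL", "NOP", "GSW", "CHI", "PHI", "ATL", "BKN")
DATE <- as.Date(c(rep("2015-05-14", 6), rep("2015-05-15", 4)))
PTS <- c(94, 97, 95, 106, 111, 95, 100, 112, 87, 94)
df <- data.frame(TEAM1, HOME.AWAY, TEAM2, PTS, DATE, stringsAsFactors = FALSE)
df$row_id <- seq_len(nrow(df))  # hypothetical helper: remembers the original order

# Slim copy of the opponent side: swap the team columns and rename PTS
opp <- df[, c("TEAM1", "TEAM2", "DATE", "PTS")]
names(opp) <- c("TEAM2", "TEAM1", "DATE", "PTS.OPPT")

# Left join on the shared keys, then restore the original order and columns
out <- merge(df, opp, by = c("TEAM1", "TEAM2", "DATE"), all.x = TRUE)
out <- out[order(out$row_id),
           c("TEAM1", "HOME.AWAY", "TEAM2", "PTS", "DATE", "PTS.OPPT")]
rownames(out) <- NULL
out
```

Renaming before the join means the opponent's points arrive directly as `PTS.OPPT`, so the column-selection and `names()` steps of the self-join are not needed.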