A small representative sample of my dataset:
TEAM1 <- c("ATL", "CHI", "CLE", "DET", "GSW", "NOP", "BKN","ATL", "PHI","CHI")
HOME.AWAY <- c("vs.", "vs.", "@", "@", "vs.", "@", "vs.","vs.", "@","@")
TEAM2 <- c("DET", "CLE", "CHI", "ATL", "NOP", "GSW", "CHI","PHI", "ATL","BKN")
DATE <- as.Date(c("2015-05-14", "2015-05-14", "2015-05-14",
"2015-05-14","2015-05-14", "2015-05-14", "2015-05-15","2015-05-15",
"2015-05-15","2015-05-15"))
PTS <- c(94, 97, 95, 106, 111, 95, 100,112,87, 94)
df <- data.frame(TEAM1,HOME.AWAY,TEAM2,PTS,DATE)
df
TEAM1 HOME.AWAY TEAM2 PTS DATE
ATL vs. DET 94 2015-05-14
CHI vs. CLE 97 2015-05-14
CLE @ CHI 95 2015-05-14
DET @ ATL 106 2015-05-14
GSW vs. NOP 111 2015-05-14
NOP @ GSW 95 2015-05-14
BKN vs. CHI 100 2015-05-15
ATL vs. PHI 112 2015-05-15
PHI @ ATL 87 2015-05-15
CHI @ BKN 94 2015-05-15
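The data frame is organized at the team level, so every game produces two rows. For example, Atlanta vs. Detroit (first row) and Detroit at Atlanta (fourth row). Each row holds TEAM1's stats (PTS, REB, AST, ...); for this example I have included only the points variable. I want to create a new variable, "points scored by the opposing team".
The output would look like this: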
TEAM1 HOME.AWAY TEAM2 PTS DATE PTS.OPPT
ATL vs. DET 94 2015-05-14 106
CHI vs. CLE 97 2015-05-14 95
CLE @ CHI 95 2015-05-14 97
DET @ ATL 106 2015-05-14 94
GSW vs. NOP 111 2015-05-14 95
NOP @ GSW 95 2015-05-14 111
BKN vs. CHI 100 2015-05-15 94
ATL vs. PHI 112 2015-05-15 87
PHI @ ATL 87 2015-05-15 112
CHI @ BKN 94 2015-05-15 100
I tried grouping by date and then doing some kind of matching, but I could not figure out the matching part.
Answer 0 (score: 2)
> TEAM1 <- c("ATL", "CHI", "CLE", "DET", "GSW", "NOP", "BKN","ATL", "PHI","CHI")
> HOME.AWAY <- c("vs.", "vs.", "@", "@", "vs.", "@", "vs.","vs.", "@","@")
> TEAM2 <- c("DET", "CLE", "CHI", "ATL", "NOP", "GSW", "CHI","PHI", "ATL","BKN")
> DATE <- as.Date(c("2015-05-14", "2015-05-14", "2015-05-14",
+ "2015-05-14","2015-05-14", "2015-05-14", "2015-05-15","2015-05-15",
+ "2015-05-15","2015-05-15"))
> PTS <- c(94, 97, 95, 106, 111, 95, 100,112,87, 94)
> df <- data.frame(TEAM1,HOME.AWAY,TEAM2,PTS,DATE)
>
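> # Self-join: pair each row with the row where the two teams are swapped on the same date
> df<-merge(df, df, by.x=c("TEAM1", "TEAM2", "DATE"), by.y=c("TEAM2", "TEAM1", "DATE"))
> # Keep this team's columns (.x) plus the opponent's points (.y), then restore the names
> df<-df[,c("TEAM1", "HOME.AWAY.x", "TEAM2", "PTS.x","DATE", "PTS.y" )]
> names(df)<-c("TEAM1", "HOME.AWAY", "TEAM2","PTS", "DATE", "PTS.OPPT")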
> df
TEAM1 HOME.AWAY TEAM2 PTS DATE PTS.OPPT
1 ATL vs. DET 94 2015-05-14 106
2 ATL vs. PHI 112 2015-05-15 87
3 BKN vs. CHI 100 2015-05-15 94
4 CHI @ BKN 94 2015-05-15 100
5 CHI vs. CLE 97 2015-05-14 95
6 CLE @ CHI 95 2015-05-14 97
7 DET @ ATL 106 2015-05-14 94
8 GSW vs. NOP 111 2015-05-14 95
9 NOP @ GSW 95 2015-05-14 111
10 PHI @ ATL 87 2015-05-15 112
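
Note that merge() sorts the result by the key columns, so the rows above are no longer in the original order. If the original order matters, one possible alternative is a self-lookup with base R's match(). This is only a minimal sketch, not part of the answer above: the names key and opp_key are illustrative, and it assumes each pair of teams meets at most once per date.

```r
# Rebuild the sample data from the question.
df <- data.frame(
  TEAM1 = c("ATL", "CHI", "CLE", "DET", "GSW", "NOP", "BKN", "ATL", "PHI", "CHI"),
  HOME.AWAY = c("vs.", "vs.", "@", "@", "vs.", "@", "vs.", "vs.", "@", "@"),
  TEAM2 = c("DET", "CLE", "CHI", "ATL", "NOP", "GSW", "CHI", "PHI", "ATL", "BKN"),
  PTS = c(94, 97, 95, 106, 111, 95, 100, 112, 87, 94),
  DATE = as.Date(c(rep("2015-05-14", 6), rep("2015-05-15", 4)))
)

# key identifies each row; opp_key is the key of that row's opponent
# (teams swapped, same date). Both names are illustrative.
key <- paste(df$TEAM1, df$TEAM2, df$DATE)
opp_key <- paste(df$TEAM2, df$TEAM1, df$DATE)

# For each row, find the row whose opp_key equals this row's key,
# i.e. the opponent's row, and take its PTS. Row order is unchanged.
df$PTS.OPPT <- df$PTS[match(key, opp_key)]
df
```

If a game has no matching opponent row, match() returns NA, so PTS.OPPT is NA for that row rather than the row being dropped as it would be with the default merge().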