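I am trying to merge two datasets. In the past, I have used merge() with by set to the variable I wanted to merge on. Now, however, I want to merge on two variables. My first dataset looks like this: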
Year Winning_Tm Losing_Tm
2011 Texas Washington
2012 Alabama South Carolina
2013 Tennessee Texas
I also have a second dataset with each team's ranking for each year (this is greatly simplified). It looks like this:
Year Team Rank
2011 Texas 32
2011 Washington 34
2012 South Carolina 45
2012 Alabama 12
2013 Texas 6
2013 Tennessee 51
I want to merge them so that I end up with a dataset that looks like this:
Year Winning_Tm Winning_TM_rank Losing_Tm Losing_Tm_rank
2011 Texas 32 Washington 34
2012 Alabama 12 South Carolina 45
2013 Tennessee 51 Texas 6
I was hoping there would be a simple way to do this, but it may be more complicated. Thanks!
Answer 0: (score: 4)
I reproduced your data (next time, try to include the output of dput):
A <- data.frame(
Year = c(2011, 2012, 2013),
Winning_Tm = c("Texas","Alabama","Tennessee"),
Losing_Tm = c("Washington","South Carolina", "Texas"),
stringsAsFactors = FALSE
)
B <- data.frame(
Year = c("2011","2011","2012","2012","2013","2013"),
Team = c("Texas","Washington","South Carolina","Alabama","Texas","Tennessee"),
Rank = c(32,34,45,12,6,51),
stringsAsFactors = FALSE
)
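As a brief illustration of the dput suggestion, the sketch below shows how a data frame can be turned into pasteable R code and read back. The names txt and A_copy are hypothetical and only used for this example.

```r
# Rebuild the first data frame from the answer.
A <- data.frame(
  Year = c(2011, 2012, 2013),
  Winning_Tm = c("Texas", "Alabama", "Tennessee"),
  Losing_Tm = c("Washington", "South Carolina", "Texas"),
  stringsAsFactors = FALSE
)

# dput() prints an R expression that recreates the object;
# that printed text is what you would paste into a question.
dput(A)

# Round trip: capture the printed text and evaluate it again.
txt <- capture.output(dput(A))
A_copy <- eval(parse(text = txt))
```

Anyone who pastes the dput output into their own session gets the same column types as the asker, which avoids guessing whether Year is numeric or character.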
You can use melt from the reshape2 package:
library(reshape2)
A <- melt(A, id.vars = "Year")
names(A)[3] <- "Team"
The first data frame now looks like this:
> A
Year variable Team
1 2011 Winning_Tm Texas
2 2012 Winning_Tm Alabama
3 2013 Winning_Tm Tennessee
4 2011 Losing_Tm Washington
5 2012 Losing_Tm South Carolina
6 2013 Losing_Tm Texas
Then you can merge the two datasets on the two columns of interest:
AB <- merge(A, B, by=c("Year","Team"))
The merged data frame looks like this:
> AB
Year Team variable Rank
1 2011 Texas Winning_Tm 32
2 2011 Washington Losing_Tm 34
3 2012 Alabama Winning_Tm 12
4 2012 South Carolina Losing_Tm 45
5 2013 Tennessee Winning_Tm 51
6 2013 Texas Losing_Tm 6
Then, using the reshape command from base R, you can convert AB to wide format:
reshape(AB, idvar = "Year", timevar = "variable", direction = "wide")
Result:
  Year Team.Winning_Tm Rank.Winning_Tm Team.Losing_Tm Rank.Losing_Tm
1 2011 Texas 32 Washington 34
3 2012 Alabama 12 South Carolina 45
5 2013 Tennessee 51 Texas 6
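The wide result uses column names such as Team.Winning_Tm and Rank.Winning_Tm rather than the names requested in the question. A minimal sketch of one way to rename them follows. It assumes AB has the columns shown in the merged output, rebuilds AB by hand so the example runs without reshape2, and uses wide as a hypothetical name for the reshaped result.

```r
# AB rebuilt by hand from the merged output shown in the answer.
AB <- data.frame(
  Year = c(2011, 2011, 2012, 2012, 2013, 2013),
  Team = c("Texas", "Washington", "Alabama", "South Carolina", "Tennessee", "Texas"),
  variable = c("Winning_Tm", "Losing_Tm", "Winning_Tm", "Losing_Tm", "Winning_Tm", "Losing_Tm"),
  Rank = c(32, 34, 12, 45, 51, 6),
  stringsAsFactors = FALSE
)

# Same reshape call as in the answer.
wide <- reshape(AB, idvar = "Year", timevar = "variable", direction = "wide")

# "Team.Winning_Tm" -> "Winning_Tm"; "Rank.Winning_Tm" -> "Winning_Tm_rank".
names(wide) <- sub("^Team\\.", "", names(wide))
names(wide) <- sub("^Rank\\.(.*)$", "\\1_rank", names(wide))

# Drop the leftover row names (1, 3, 5) from the long format.
rownames(wide) <- NULL
wide
```

The regular expressions only touch the Team. and Rank. prefixes that reshape generates, so the Year column is left unchanged.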
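Answer 1: (score: 2)
If you are familiar with SQL, here is a fairly complex but fast way to do it in one step, using the data frames df1 and df2 defined at the end of this answer:
library(sqldf)
res <- sqldf("SELECT l.*,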
max(case when l.Winning_Tm = r.Team then r.Rank else 0 end) as Winning_Tm_rank,
max(case when l.Losing_Tm = r.Team then r.Rank else 0 end) as Losing_Tm_rank
FROM df1 as l
inner join df2 as r
on (l.Winning_Tm = r.Team
OR l.Losing_Tm = r.Team)
AND l.Year = r.Year
group by l.Year, l.Winning_Tm, l.Losing_Tm")
res
Year Winning_Tm Losing_Tm Winning_Tm_rank Losing_Tm_rank
1 2011 Texas Washington 32 34
2 2012 Alabama South_Carolina 12 45
3 2013 Tennessee Texas 51 6
Data:
df1 <- read.table(header=T,text="Year Winning_Tm Losing_Tm
2011 Texas Washington
2012 Alabama South_Carolina
2013 Tennessee Texas")
df2<- read.table(header=T,text="Year Team Rank
2011 Texas 32
2011 Washington 34
2012 South_Carolina 45
2012 Alabama 12
2013 Texas 6
2013 Tennessee 51")
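Answer 2: (score: 2)
Use two separate merges. You need to pass the list of merge variables to by, wrapped in c(), and because the variable names differ between the two datasets, you need by.x and by.y. Afterwards you can rename the rank variables.
I'll call your datasets winlose and teamrank, respectively. Then you need: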
first_merge <- merge(winlose, teamrank, by.x = c('Year', 'Winning_Tm'), by.y = c('Year', 'Team'))
second_merge <- merge(first_merge, teamrank, by.x = c('Year', 'Losing_Tm'), by.y = c('Year', 'Team'))
Rename the variables:
names(second_merge)[names(second_merge) == 'Rank.x'] <- 'Winning_Tm_rank'
names(second_merge)[names(second_merge) == 'Rank.y'] <- 'Losing_Tm_rank'
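Putting these steps together with the sample data from the question gives the self-contained sketch below. The data frame names follow this answer. The last step, which sorts by Year and reorders the columns to match the layout requested in the question, is an addition and not part of the original answer, and result is a hypothetical name.

```r
# Sample data copied from the question.
winlose <- data.frame(
  Year = c(2011, 2012, 2013),
  Winning_Tm = c("Texas", "Alabama", "Tennessee"),
  Losing_Tm = c("Washington", "South Carolina", "Texas"),
  stringsAsFactors = FALSE
)
teamrank <- data.frame(
  Year = c(2011, 2011, 2012, 2012, 2013, 2013),
  Team = c("Texas", "Washington", "South Carolina", "Alabama", "Texas", "Tennessee"),
  Rank = c(32, 34, 45, 12, 6, 51),
  stringsAsFactors = FALSE
)

# First merge attaches the winner's rank, second merge the loser's rank.
first_merge <- merge(winlose, teamrank,
                     by.x = c("Year", "Winning_Tm"), by.y = c("Year", "Team"))
second_merge <- merge(first_merge, teamrank,
                      by.x = c("Year", "Losing_Tm"), by.y = c("Year", "Team"))

# merge() adds the suffixes .x and .y to the two clashing Rank columns.
names(second_merge)[names(second_merge) == "Rank.x"] <- "Winning_Tm_rank"
names(second_merge)[names(second_merge) == "Rank.y"] <- "Losing_Tm_rank"

# Added step: sort by Year and put the columns in the requested order.
result <- second_merge[order(second_merge$Year),
                       c("Year", "Winning_Tm", "Winning_Tm_rank",
                         "Losing_Tm", "Losing_Tm_rank")]
result
```

Note that merge() drops rows with no match by default, so a team missing from teamrank would silently remove its game. Passing all.x = TRUE to both calls keeps such rows with an NA rank instead.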
Answer 3: (score: 0)
Let X1 contain your first table and X2 contain your second table.
library( dplyr )
library( plyr )
## Create a joint table to work with
XX <- inner_join( X1, X2, by="Year" )
## Compute the ranks
f <- function( x, y, r ) { r[ as.character(x) == as.character(y) ] }
rr <- ddply( XX, "Year", summarise,
Winning_TM_Rank = f(Team, Winning_Tm, Rank ),
Losing_TM_Rank = f(Team, Losing_Tm, Rank) )
## Combine the results and reorder the columns
inner_join( X1, rr )[,c(1,2,4,3,5)]