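Merging data on 2 variables in R

Time: 2016-08-25 17:29:37

Tags: r merge

I am trying to merge two data sets. In the past I have used merge() with by set to the variable I wanted to merge on. Now, however, I want to merge on two variables. My first data set looks like this: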

Year   Winning_Tm    Losing_Tm
2011   Texas         Washington
2012   Alabama       South Carolina
2013   Tennessee     Texas

Then I have another data set with each team's rank for each year (this is greatly simplified). Like this:

Year    Team             Rank
2011    Texas            32
2011    Washington       34
2012    South Carolina   45
2012    Alabama          12
2013    Texas            6
2013    Tennessee        51

I want to merge them so that I end up with a data set that looks like this:

Year   Winning_Tm    Winning_TM_rank    Losing_Tm        Losing_Tm_rank
2011   Texas         32                 Washington       34
2012   Alabama       12                 South Carolina   45
2013   Tennessee     51                 Texas            6

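I am hoping there is a simple way to do this, but it may be more complicated. Thanks!

4 Answers:

Answer 0: (score: 4)

I replicated your data (next time, try to add dput output):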

A <- data.frame(
  Year = c(2011, 2012, 2013),
  Winning_Tm = c("Texas","Alabama","Tennessee"),
  Losing_Tm = c("Washington","South Carolina", "Texas"),
  stringsAsFactors = FALSE
)

B <- data.frame(
  Year = c("2011","2011","2012","2012","2013","2013"),
  Team = c("Texas","Washington","South Carolina","Alabama","Texas","Tennessee"),
  Rank = c(32,34,45,12,6,51),
  stringsAsFactors = FALSE
)

You can use melt from the reshape2 package on the first data frame:

library(reshape2)
A <- melt(A, id.vars = "Year")
names(A)[3] <- "Team"

Now it looks like this:

> A
  Year   variable           Team
1 2011 Winning_Tm          Texas
2 2012 Winning_Tm        Alabama
3 2013 Winning_Tm      Tennessee
4 2011  Losing_Tm     Washington
5 2012  Losing_Tm South Carolina
6 2013  Losing_Tm          Texas

Then you can merge the data sets together on the two columns of interest:

AB <- merge(A, B, by=c("Year","Team"))

Which looks like this:

> AB
  Year           Team   variable Rank
1 2011          Texas Winning_Tm   32
2 2011     Washington  Losing_Tm   34
3 2012        Alabama Winning_Tm   12
4 2012 South Carolina  Losing_Tm   45
5 2013      Tennessee Winning_Tm   51
6 2013          Texas  Losing_Tm    6

Then, using the reshape command from base R, you can change AB to wide format:

reshape(AB, idvar = "Year", timevar = "variable", direction = "wide")

Result:

  Year Team.Winning_Tm Rank.Winning_Tm Team.Losing_Tm Rank.Losing_Tm
1 2011           Texas              32     Washington             34
3 2012         Alabama              12 South Carolina             45
5 2013       Tennessee              51          Texas              6
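If reshape2 is not available, the same long, merge, wide pipeline can be sketched in base R only. This is a minimal sketch, not part of the original answer: the name A_long is made up for illustration, the long table is built by hand with rep() and c() in place of melt, and Year is assumed to be stored as an integer in both tables.

```r
# Sample data from the question (Year stored as integer in both tables: an assumption)
A <- data.frame(
  Year = c(2011L, 2012L, 2013L),
  Winning_Tm = c("Texas", "Alabama", "Tennessee"),
  Losing_Tm = c("Washington", "South Carolina", "Texas"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  Year = c(2011L, 2011L, 2012L, 2012L, 2013L, 2013L),
  Team = c("Texas", "Washington", "South Carolina", "Alabama", "Texas", "Tennessee"),
  Rank = c(32, 34, 45, 12, 6, 51),
  stringsAsFactors = FALSE
)

# Hand-built long format: one row per (Year, role, Team), standing in for melt()
A_long <- data.frame(
  Year = rep(A$Year, times = 2),
  variable = rep(c("Winning_Tm", "Losing_Tm"), each = nrow(A)),
  Team = c(A$Winning_Tm, A$Losing_Tm),
  stringsAsFactors = FALSE
)

# Merge on both key columns, then spread back out to one row per Year
AB <- merge(A_long, B, by = c("Year", "Team"))
wide <- reshape(AB, idvar = "Year", timevar = "variable", direction = "wide")

# Select columns by name so the result does not depend on column order
wide <- wide[order(wide$Year),
             c("Year", "Team.Winning_Tm", "Rank.Winning_Tm",
               "Team.Losing_Tm", "Rank.Losing_Tm")]
print(wide)
```

Selecting the columns by name at the end keeps the sketch independent of the order in which reshape happens to emit the per-role columns.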

Answer 1: (score: 2)

If you are familiar with SQL, here is a fairly complex but fast way to do it in a single step:

library(sqldf)

res <- sqldf("SELECT l.*,
                     max(case when l.Winning_Tm = r.Team then r.Rank else 0 end) as Winning_Tm_rank,
                     max(case when l.Losing_Tm = r.Team then r.Rank else 0 end) as Losing_Tm_rank
             FROM      df1 as l
             inner join df2 as r
             on        (l.Winning_Tm = r.Team
             OR        l.Losing_Tm = r.Team)
             AND       l.Year = r.Year
             group by  l.Year, l.Winning_Tm, l.Losing_Tm")

res
  Year Winning_Tm      Losing_Tm Winning_Tm_rank Losing_Tm_rank
1 2011      Texas     Washington              32             34
2 2012    Alabama South_Carolina              12             45
3 2013  Tennessee          Texas              51              6

Data:

df1 <- read.table(header=T,text="Year   Winning_Tm    Losing_Tm
2011   Texas         Washington
2012   Alabama       South_Carolina
2013   Tennessee     Texas")

df2<- read.table(header=T,text="Year Team Rank
2011    Texas            32
2011    Washington       34
2012    South_Carolina   45
2012    Alabama          12
2013    Texas            6
2013    Tennessee        51")

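Answer 2: (score: 2)

Two separate merges. You need to pass the key variables to by as a vector built with c(), and because the variable names differ between the two data sets you need by.x and by.y. Afterwards you can rename the rank variables.

I will call your data winlose and teamrank, respectively. Then you need: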

first_merge <- merge(winlose, teamrank, by.x = c('Year', 'Winning_Tm'), by.y = c('Year', 'Team'))
second_merge <- merge(first_merge, teamrank, by.x = c('Year', 'Losing_Tm'), by.y = c('Year', 'Team'))

Rename the variables:

names(second_merge)[names(second_merge) == 'Rank.x'] <- 'Winning_Tm_rank'
names(second_merge)[names(second_merge) == 'Rank.y'] <- 'Losing_Tm_rank'
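Put together on the sample data from the question, the two merges and the renaming might look like the sketch below. The data frame definitions and the final column reordering are additions for illustration; the reordering is there because merge places the key columns first, so Losing_Tm would otherwise come before Winning_Tm.

```r
# Sample data from the question, named as in this answer
winlose <- data.frame(
  Year = c(2011L, 2012L, 2013L),
  Winning_Tm = c("Texas", "Alabama", "Tennessee"),
  Losing_Tm = c("Washington", "South Carolina", "Texas"),
  stringsAsFactors = FALSE
)
teamrank <- data.frame(
  Year = c(2011L, 2011L, 2012L, 2012L, 2013L, 2013L),
  Team = c("Texas", "Washington", "South Carolina", "Alabama", "Texas", "Tennessee"),
  Rank = c(32, 34, 45, 12, 6, 51),
  stringsAsFactors = FALSE
)

# First merge attaches the winner's rank, second merge attaches the loser's rank
first_merge <- merge(winlose, teamrank,
                     by.x = c("Year", "Winning_Tm"), by.y = c("Year", "Team"))
second_merge <- merge(first_merge, teamrank,
                      by.x = c("Year", "Losing_Tm"), by.y = c("Year", "Team"))

# Both inputs of the second merge have a Rank column, so merge adds .x / .y suffixes
names(second_merge)[names(second_merge) == "Rank.x"] <- "Winning_Tm_rank"
names(second_merge)[names(second_merge) == "Rank.y"] <- "Losing_Tm_rank"

# Restore the column order asked for in the question
result <- second_merge[order(second_merge$Year),
                       c("Year", "Winning_Tm", "Winning_Tm_rank",
                         "Losing_Tm", "Losing_Tm_rank")]
print(result)
```

By default merge keeps only rows with a match in both tables, so a game whose team has no rank for that year would be dropped. Passing all.x = TRUE to each merge keeps such rows with NA in the rank column.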

Answer 3: (score: 0)

X1 holds your first table and X2 holds your second table.

library( dplyr )
library( plyr )

## Create a joint table to work with
XX <- inner_join( X1, X2, by="Year" )

## Compute the ranks
f <- function( x, y, r ) { r[ as.character(x) == as.character(y) ] }
rr <- ddply( XX, "Year", summarise,
  Winning_TM_Rank = f(Team, Winning_Tm, Rank ),
  Losing_TM_Rank = f(Team, Losing_Tm, Rank) )

## Combine the results and reorder the columns
inner_join( X1, rr, by="Year" )[,c(1,2,4,3,5)]
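For comparison, the same result can probably be reached with dplyr alone by joining the rank table twice, once per team column. This is a sketch of a different technique from the ddply approach above, not a rewrite of it: it assumes dplyr is installed, and the X1 and X2 definitions below are filled in from the question's sample data.

```r
library(dplyr)

# Sample data from the question (X1 = games, X2 = ranks)
X1 <- data.frame(
  Year = c(2011L, 2012L, 2013L),
  Winning_Tm = c("Texas", "Alabama", "Tennessee"),
  Losing_Tm = c("Washington", "South Carolina", "Texas"),
  stringsAsFactors = FALSE
)
X2 <- data.frame(
  Year = c(2011L, 2011L, 2012L, 2012L, 2013L, 2013L),
  Team = c("Texas", "Washington", "South Carolina", "Alabama", "Texas", "Tennessee"),
  Rank = c(32, 34, 45, 12, 6, 51),
  stringsAsFactors = FALSE
)

# A named element in `by` maps a column of X1 to a differently named column of X2
result <- X1 %>%
  left_join(X2, by = c("Year", "Winning_Tm" = "Team")) %>%
  rename(Winning_Tm_rank = Rank) %>%
  left_join(X2, by = c("Year", "Losing_Tm" = "Team")) %>%
  rename(Losing_Tm_rank = Rank) %>%
  select(Year, Winning_Tm, Winning_Tm_rank, Losing_Tm, Losing_Tm_rank)

print(result)
```

Renaming Rank immediately after the first join avoids a name clash in the second one, and left_join keeps every game in X1 even when a rank is missing.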