I have 3 data frames with unequal numbers of rows.
df1
T1 T2 T3
1 Joe TTT
2 PP YYY
3 JJ QQQ
5 UU OOO
6 OO GGG
df2
X1 X2
1 09/20/2017
2 08/02/2015
3 05/02/2000
8 06/03/1999
df3
L1 L2
1 New
6 Notsure
9 Also
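The final data frame should look like a left join of all 3 that keeps only the rows of df1. The matching columns are T1, X1 and L1, which hold the same keys under different header names. The number of rows differs in each data frame. I could not find a solution that fits this case. On SO, I only found solutions that work for 2 data frames, or for 3 data frames with the same rows or the same column names.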
T1 T2 T3 X2 L2
1 Joe TTT 09/20/2017 New
2 PP YYY 08/02/2015 NA
3 JJ QQQ 05/02/2000 NA
5 UU OOO NA NA
6 OO GGG NA Notsure
I am fairly new to R and could not find the R code for this.
Answer 0 (score: 4)
The idea is to put your data frames in a list, change the name of the first column of each to a common name, and use Reduce to merge them, i.e.
Reduce(function(...) merge(..., by = 'Var1', all.x = TRUE),
lapply( mget(ls(pattern = 'df[0-9]+')), function(i) {names(i)[1] <- 'Var1'; i}))
which gives,
  Var1  T2  T3         X2      L2
1    1 Joe TTT 09/20/2017     New
2    2  PP YYY 08/02/2015    <NA>
3    3  JJ QQQ 05/02/2000    <NA>
4    5  UU OOO       <NA>    <NA>
5    6  OO GGG       <NA> Notsure
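The approach above collects the data frames with mget(ls(pattern = 'df[0-9]+')), which picks up every workspace object whose name matches the pattern, in the alphabetical order that ls() returns. A minimal sketch of the same idea with an explicit list, so that df1 is guaranteed to be the left-most table, might look like this. The helper name rename_key and the step that restores the T1 header are assumptions added for illustration; they are not part of the original answer.

```r
# Rebuild the question's data frames so the sketch is self-contained.
df1 <- data.frame(T1 = c(1, 2, 3, 5, 6),
                  T2 = c("Joe", "PP", "JJ", "UU", "OO"),
                  T3 = c("TTT", "YYY", "QQQ", "OOO", "GGG"))
df2 <- data.frame(X1 = c(1, 2, 3, 8),
                  X2 = c("09/20/2017", "08/02/2015", "05/02/2000", "06/03/1999"))
df3 <- data.frame(L1 = c(1, 6, 9),
                  L2 = c("New", "Notsure", "Also"))

# Hypothetical helper: rename the first column to a shared key name.
rename_key <- function(d, key = "Var1") {
  names(d)[1] <- key
  d
}

# An explicit list fixes the merge order: df1 first, then df2, then df3.
frames <- lapply(list(df1, df2, df3), rename_key)

# all.x = TRUE keeps every row of the left table at each step (a left join).
joined <- Reduce(function(x, y) merge(x, y, by = "Var1", all.x = TRUE), frames)

# Restore the key's original header from df1.
names(joined)[1] <- names(df1)[1]
print(joined)
```

Listing the data frames by hand is more typing than the pattern lookup, but the result no longer depends on what else happens to be in the workspace.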
Answer 1 (score: 2)
Using tidyverse functions, you can try:
library(dplyr)
df1 %>%
  left_join(df2, by = c("T1" = "X1")) %>%
  left_join(df3, by = c("T1" = "L1"))
which gives:
T1 T2 T3 X2 L2
1 1 Joe TTT 09/20/2017 New
2 2 PP YYY 08/02/2015 <NA>
3 3 JJ QQQ 05/02/2000 <NA>
4 5 UU OOO <NA> <NA>
5 6 OO GGG <NA> Notsure
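For readers who cannot install dplyr, the same key mapping (T1 to X1, then T1 to L1) can be sketched in base R with the by.x and by.y arguments of merge, which avoids renaming any columns. This is an illustrative alternative, not part of the answer above; the intermediate names step1 and result are made up for the example.

```r
# Rebuild the question's data frames so the sketch is self-contained.
df1 <- data.frame(T1 = c(1, 2, 3, 5, 6),
                  T2 = c("Joe", "PP", "JJ", "UU", "OO"),
                  T3 = c("TTT", "YYY", "QQQ", "OOO", "GGG"))
df2 <- data.frame(X1 = c(1, 2, 3, 8),
                  X2 = c("09/20/2017", "08/02/2015", "05/02/2000", "06/03/1999"))
df3 <- data.frame(L1 = c(1, 6, 9),
                  L2 = c("New", "Notsure", "Also"))

# by.x / by.y name the key column on each side; all.x = TRUE makes it a left join.
step1  <- merge(df1,   df2, by.x = "T1", by.y = "X1", all.x = TRUE)
result <- merge(step1, df3, by.x = "T1", by.y = "L1", all.x = TRUE)

# Keys that exist only in df2 (8) or df3 (9) are dropped; df1's keys all survive.
print(result)
```

The key column keeps the name from the left table, so the result has the headers T1, T2, T3, X2 and L2 that the question asks for.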
Answer 2 (score: 1)
1) sqldf
library(sqldf)
sqldf("select df1.*, X2, L2
from df1
left join df2 on T1 = X1
left join df3 on T1 = L1")
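1a) Although slightly longer, this variation can make the code easier to read later because it makes explicit which data frame each column comes from. If the data frame names were long you might want to use aliases, e.g. from df1 as a, but here we don't bother since they are short.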
sqldf("select df1.*, df2.X2, df3.L2
from df1
left join df2 on df1.T1 = df2.X1
left join df3 on df1.T1 = df3.L1")
2) merge: use repeated merges on the first column. No packages are needed.
Merge <- function(x, y) merge(x, y, by = 1, all.x = TRUE)
Merge(Merge(df1, df2), df3)
2a) This can also be written with a magrittr pipe like this:
library(magrittr)
df1 %>% Merge(df2) %>% Merge(df3)
2b) Using Reduce, we can perform the repeated merge like this:
Reduce(Merge, list(df1, df2, df3))
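As a quick check of how the nested form (2) and the Reduce form (2b) relate, the sketch below runs both on the question's data and compares them. Reduce applies Merge from left to right, so the two should produce the same data frame. The accumulate = TRUE call is an addition for illustration: it returns every intermediate result so the row count can be inspected after each join.

```r
# Rebuild the question's data frames so the sketch is self-contained.
df1 <- data.frame(T1 = c(1, 2, 3, 5, 6),
                  T2 = c("Joe", "PP", "JJ", "UU", "OO"),
                  T3 = c("TTT", "YYY", "QQQ", "OOO", "GGG"))
df2 <- data.frame(X1 = c(1, 2, 3, 8),
                  X2 = c("09/20/2017", "08/02/2015", "05/02/2000", "06/03/1999"))
df3 <- data.frame(L1 = c(1, 6, 9),
                  L2 = c("New", "Notsure", "Also"))

# by = 1 joins on the first column of each data frame, whatever it is called.
Merge <- function(x, y) merge(x, y, by = 1, all.x = TRUE)

nested <- Merge(Merge(df1, df2), df3)        # form 2)
folded <- Reduce(Merge, list(df1, df2, df3)) # form 2b)

# Both forms perform the same calls in the same order.
same <- identical(nested, folded)
print(same)

# Intermediate results: df1, then df1+df2, then df1+df2+df3.
steps <- Reduce(Merge, list(df1, df2, df3), accumulate = TRUE)
rows_per_step <- sapply(steps, nrow)
print(rows_per_step)
```

The Reduce form scales to any number of data frames without extra nesting, provided the key is always the first column and df1 is the first element of the list.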
Note: The input in reproducible form is:
Lines1 <- "
T1 T2 T3
1 Joe TTT
2 PP YYY
3 JJ QQQ
5 UU OOO
6 OO GGG"
Lines2 <- "
X1 X2
1 09/20/2017
2 08/02/2015
3 05/02/2000
8 06/03/1999"
Lines3 <- "
L1 L2
1 New
6 Notsure
9 Also"
df1 <- read.table(text = Lines1, header = TRUE)
df2 <- read.table(text = Lines2, header = TRUE)
df3 <- read.table(text = Lines3, header = TRUE)
Answer 3 (score: 0)
Use left_join() like this:
library(dplyr)
df1 = data.frame(X = c("a", "b", "c"), var1 = c(1,2, 3))
df2 = data.frame(V = c("a", "b", "c"), var2 =c(5,NA, NA) )
df3 = data.frame(Y = c("a", "b", "c"), var3 =c("name", NA, "age") )
# rename the key columns of df2 and df3 to match df1
df2 = df2 %>% rename(X = V)
df3 = df3 %>% rename(X = Y)
df = left_join(df1, df2, by = "X") %>%
  left_join(., df3, by = "X")
> df
X var1 var2 var3
1 a 1 5 name
2 b 2 NA <NA>
3 c 3 NA age