dplyr: How to select join columns by name?

Posted: 2017-08-04 12:37:09

Tags: r dplyr

I would like to use dplyr's left_join to transfer values ("new") from one data frame to another.

How can I do that if I do not know the name of the key, only that it is the first variable in each dataset?

require("dplyr")

testData1 <- data.frame(idvar=c(1,2,3),
                    b=c("a","b","c"),
                    c=c("i","ii","iii"))

testData2 <- data.frame(identification=c(1,2),
                    b=c("a","b"),
                    c=c("i","NA"),
                    new=c("var1","var2"))

# now do a left join to obtain values of the new variable in the old dataset


(testResult1 <- left_join(testData1,testData2))
# var2 is missing: without `by`, the join matches on the shared columns b and c,
# and in row 2 c is "ii" in testData1 but the string "NA" in testData2


(testResult2 <- left_join(testData1,testData2,
                         by=c("idvar"="identification"))) 
# works as expected! ... but it hard-codes the key names, which we do not know in advance


(testResult3 <- left_join(testData1,testData2,
                         by=c(names(testData1)[1]=names(testData2)[1]))) 
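# fails to parse: inside c(), the name before `=` must be a literal name or string, not an expression
# Error: unexpected '=' in: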
#   "testResult3 <- left_join(testData1,testData2,
#                             by=c(names(testData1)[1]="
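To see why the first attempt drops var2, it helps to list the columns a join without `by` uses: every column name the two tables share (left_join() also reports its choice in a message). A minimal base-R check, reusing the question's data with `stringsAsFactors = FALSE` added (an assumption, so the text columns stay character on R versions before 4.0):

```r
# The question's data; stringsAsFactors = FALSE keeps the text columns as character
testData1 <- data.frame(idvar = c(1, 2, 3),
                        b = c("a", "b", "c"),
                        c = c("i", "ii", "iii"),
                        stringsAsFactors = FALSE)
testData2 <- data.frame(identification = c(1, 2),
                        b = c("a", "b"),
                        c = c("i", "NA"),
                        new = c("var1", "var2"),
                        stringsAsFactors = FALSE)

# With no `by`, the join keys are the column names both tables have in common
shared <- intersect(names(testData1), names(testData2))
print(shared)

# Row 2 agrees on b but not on c ("ii" vs the string "NA"), so it finds no partner
print(testData1[2, shared])
print(testData2[2, shared])
```

The id columns never take part because their names differ, which is exactly why the key has to be passed explicitly through `by`.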

2 answers:

Answer 0 (score: 3)

An alternative is to give the two key columns the same name and then join on that name:

left_join(
    testData1,
    rename_at(testData2, 1, ~ names(testData1)[1]),
    by = names(testData1)[1]
)

#   idvar b.x c.x  b.y  c.y  new
# 1     1   a   i    a    i var1
# 2     2   b  ii    b   NA var2
# 3     3   c iii <NA> <NA> <NA>

# Same output as the hard-coded join from the question:
# > (testResult2 <- left_join(testData1,testData2, by=c("idvar"="identification")))
#   idvar b.x c.x  b.y  c.y  new
# 1     1   a   i    a    i var1
# 2     2   b  ii    b   NA var2
# 3     3   c iii <NA> <NA> <NA>
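`rename_at()` has since been superseded in dplyr. Assuming dplyr 1.0.0 or later, the same rename-then-join idea can be sketched with `rename_with()`, selecting the first column by position:

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where rename_with() is available

# The question's data; stringsAsFactors = FALSE keeps text columns as character
testData1 <- data.frame(idvar = c(1, 2, 3),
                        b = c("a", "b", "c"),
                        c = c("i", "ii", "iii"),
                        stringsAsFactors = FALSE)
testData2 <- data.frame(identification = c(1, 2),
                        b = c("a", "b"),
                        c = c("i", "NA"),
                        new = c("var1", "var2"),
                        stringsAsFactors = FALSE)

key <- names(testData1)[1]  # name of testData1's first column, whatever it is

res <- left_join(
  testData1,
  # rename only column 1 of testData2; the lambda ignores the old name and returns `key`
  rename_with(testData2, ~ key, .cols = 1),
  by = key
)
print(res)
```

Because both key columns now share one name, `by` can be a plain string; the remaining shared columns b and c still get the `.x`/`.y` suffixes.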

Answer 1 (score: 2)

You could create the named vector in advance and then join as follows:

by_cols <- colnames(testData2)[1]           # value: key column of testData2
names(by_cols) <- colnames(testData1)[1]    # name: key column of testData1
left_join(testData1, testData2, by = by_cols)

or in one line:

left_join(testData1, testData2,
          by = structure(colnames(testData2)[1], names = colnames(testData1)[1]))

or alternatively, as suggested by Artem:

left_join(testData1, testData2,
          by = setNames(colnames(testData2)[1], colnames(testData1)[1]))
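All three spellings above build the same object: a character vector holding the key column of `testData2`, named after the key column of `testData1`, i.e. what `c("idvar" = "identification")` gives when the names are known. A quick base-R check (the data frames are cut down to their key columns, since only the names matter; `setNames()` comes from the stats package, attached by default):

```r
# Only the column names matter for building the `by` vector
testData1 <- data.frame(idvar = c(1, 2, 3))
testData2 <- data.frame(identification = c(1, 2))

# 1. build the vector, then attach a name
by_twostep <- colnames(testData2)[1]
names(by_twostep) <- colnames(testData1)[1]

# 2. structure() sets the names attribute in the same call
by_structure <- structure(colnames(testData2)[1], names = colnames(testData1)[1])

# 3. setNames() is a shorthand for the same thing
by_setnames <- setNames(colnames(testData2)[1], colnames(testData1)[1])

print(by_setnames)
```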

Hope this helps!
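If this comes up repeatedly, the `setNames()` version can be wrapped in a small function. A sketch; `left_join_first_col()` is a hypothetical name, not part of dplyr:

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): left-join `x` and `y` on their first
# columns, whatever those columns are called. Extra arguments go to left_join().
left_join_first_col <- function(x, y, ...) {
  key <- setNames(names(y)[1], names(x)[1])  # c(<x key> = "<y key>")
  left_join(x, y, by = key, ...)
}

# The question's data; stringsAsFactors = FALSE keeps text columns as character
testData1 <- data.frame(idvar = c(1, 2, 3),
                        b = c("a", "b", "c"),
                        c = c("i", "ii", "iii"),
                        stringsAsFactors = FALSE)
testData2 <- data.frame(identification = c(1, 2),
                        b = c("a", "b"),
                        c = c("i", "NA"),
                        new = c("var1", "var2"),
                        stringsAsFactors = FALSE)

res <- left_join_first_col(testData1, testData2)
print(res)
```

The key column keeps the name from `x` (here `idvar`), matching the hard-coded `by=c("idvar"="identification")` join.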