Merging columns from one data frame into another (left_join not working) - rstudio

Time: 2017-12-20 12:26:10

Tags: r merge left-join

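I have two data frames:

df1: my main dataset, which has an address column

df2: a database containing latitude and longitude, along with an address column

I want to merge the two coordinate columns from df2 into df1.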

df1

ID    VAR1   VAR2   VARX      Address
 1     7      2       x     Road 1, 1234 City
 2     8      0       y     Road 4, 1234 City
 3     6      2       x     Road 5, 1234 City
 4     7      2       x     Road 6, 1234 City
 5     4      1       y     Road 10, 1234 City
 6     1      2       x     Road 11, 1234 City

df2

    Address            Latitude      Longitude
Road 1, 1234 City        12,67          56,78
Road 2, 1234 City        12,66          55,67
Road 3, 1234 City        12,45          55,10
Road 4, 1234 City        12,10          55,20
Road 5, 1234 City        11,50          55,30
Road 6, 1234 City        12,34          55,32
Road 7, 1234 City        12,89          55,40
Road 8, 1234 City        12,77          55,45
Road 9, 1234 City        11,67          55,67
Road 10, 1234 City       11,90          55,78
Road 11, 1234 City       11,12          56,59

So my new data frame should look like this:

New data frame, df3

ID    VAR1   VAR2   VARX      Address            Latitude   Longitude
 1     7      2       x     Road 1, 1234 City     12,67       56,78
 2     8      0       y     Road 4, 1234 City     12,10       55,20
 3     6      2       x     Road 5, 1234 City     11,50       55,30
 4     7      2       x     Road 6, 1234 City     12,34       55,32
 5     4      1       y     Road 10, 1234 City    11,90       55,78
 6     1      2       x     Road 11, 1234 City    11,12       56,59

I tried left_join, but it only returns NA for the new columns.

df3 <- left_join(df1, df2, by = c("Address"))

Edit: Solved. Apparently one of my address columns contained some stray spaces. The code above does work.
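
For anyone hitting the same symptom, here is a minimal base R sketch of cleaning the key column before joining. The tiny data frames and the helper name `clean_address` are made up for illustration, and the stray spaces are invented to reproduce the problem; the real data may contain other invisible characters that this does not handle.

```r
# Hypothetical miniature versions of df1 and df2. The trailing space and
# the doubled space in df1 are invented to mimic the reported problem.
df1 <- data.frame(
  ID = 1:3,
  Address = c("Road 1, 1234 City ", "Road 4,  1234 City", "Road 5, 1234 City"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Address = c("Road 1, 1234 City", "Road 4, 1234 City", "Road 5, 1234 City"),
  Latitude = c("12,67", "12,10", "11,50"),
  stringsAsFactors = FALSE
)

# Before cleaning: not every key in df1 has an exact match in df2
all_matched_before <- all(df1$Address %in% df2$Address)

# Hypothetical helper: trim both ends, then collapse runs of whitespace
clean_address <- function(x) {
  gsub("[[:space:]]+", " ", trimws(as.character(x)))
}

df1$Address <- clean_address(df1$Address)
df2$Address <- clean_address(df2$Address)

# After cleaning: every key in df1 is found in df2
all_matched_after <- all(df1$Address %in% df2$Address)
```

Once the keys match exactly, the left_join call from the question can be used unchanged.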

2 answers:

Answer 0: (score: 1)

left_join should work fine. Take a look at this and check the structure of your data.

df3 <- dplyr::left_join(df1, df2, by = "Address")

Output

  ID VAR1 VAR2 VARX            Address Latitude Longitude
1  1    7    2    x  Road 1, 1234 City    12,67     56,78
2  2    8    0    y  Road 4, 1234 City    12,10     55,20
3  3    6    2    x  Road 5, 1234 City    11,50     55,30
4  4    7    2    x  Road 6, 1234 City    12,34     55,32
5  5    4    1    y Road 10, 1234 City    11,90     55,78
6  6    1    2    x Road 11, 1234 City    11,12     56,59
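
As a rough illustration of what "check the structure of your data" can mean here, the sketch below compares the class of the key column in both data frames and lists, row by row, which keys have no exact match. The two small data frames are hypothetical, and the trailing space is an assumption chosen to show how character counts expose invisible differences.

```r
# Hypothetical data: df1 stores the key as a factor with a trailing space
# in the first value, df2 stores it as plain character.
df1 <- data.frame(Address = factor(c("Road 1, 1234 City ", "Road 4, 1234 City")))
df2 <- data.frame(
  Address = c("Road 1, 1234 City", "Road 4, 1234 City"),
  stringsAsFactors = FALSE
)

# Compare the storage class of the join key on both sides
key_classes <- c(df1 = class(df1$Address), df2 = class(df2$Address))

# Work on character copies so factor levels do not get in the way
key1 <- as.character(df1$Address)
key2 <- as.character(df2$Address)

# Per-row report: the key, its length in characters, and whether it
# appears verbatim in df2. A length that is larger than expected hints
# at leading or trailing whitespace.
report <- data.frame(
  key = key1,
  n_chars = nchar(key1),
  has_match = key1 %in% key2,
  stringsAsFactors = FALSE
)
print(report)
```

Rows with `has_match` equal to FALSE are the ones that would come back as NA after a left join.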

Data

df1

df1 <- structure(list(ID = 1:6, VAR1 = c(7L, 8L, 6L, 7L, 4L, 1L), VAR2 = c(2L,
0L, 2L, 2L, 1L, 2L), VARX = structure(c(1L, 2L, 1L, 1L, 2L, 1L
), .Label = c("x", "y"), class = "factor"), Address = structure(c(1L, 
4L, 5L, 6L, 2L, 3L), .Label = c("Road 1, 1234 City", "Road 10, 1234 City", 
"Road 11, 1234 City", "Road 4, 1234 City", "Road 5, 1234 City", 
"Road 6, 1234 City"), class = "factor")), .Names = c("ID", "VAR1", 
"VAR2", "VARX", "Address"), class = "data.frame", row.names = c(NA, 
-6L))

df2

df2 <- structure(list(Address = structure(c(1L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 2L, 3L), .Label = c("Road 1, 1234 City", "Road 10, 1234 City", 
"Road 11, 1234 City", "Road 2, 1234 City", "Road 3, 1234 City", 
"Road 4, 1234 City", "Road 5, 1234 City", "Road 6, 1234 City", 
"Road 7, 1234 City", "Road 8, 1234 City", "Road 9, 1234 City"
), class = "factor"), Latitude = structure(c(9L, 8L, 7L, 5L, 
2L, 6L, 11L, 10L, 3L, 4L, 1L), .Label = c("11,12", "11,50", "11,67", 
"11,90", "12,10", "12,34", "12,45", "12,66", "12,67", "12,77", 
"12,89"), class = "factor"), Longitude = structure(c(10L, 7L, 
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L), .Label = c("55,10", "55,20", 
"55,30", "55,32", "55,40", "55,45", "55,67", "55,78", "56,59", 
"56,78"), class = "factor")), .Names = c("Address", "Latitude", 
"Longitude"), class = "data.frame", row.names = c(NA, -11L))

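The dput() output above stores the address and both coordinates as factors. If the coordinates are meant to be numbers written with a decimal comma (an assumption, the question does not say), a conversion along these lines may be useful before or after the join. The two-row data frame and the helper name `to_number` are hypothetical.

```r
# Hypothetical two-row excerpt of df2 with factor columns, mirroring the
# classes shown in the dput() output.
df2 <- data.frame(
  Address = factor(c("Road 1, 1234 City", "Road 4, 1234 City")),
  Latitude = factor(c("12,67", "12,10")),
  Longitude = factor(c("56,78", "55,20"))
)

# Use a plain character key for joining
df2$Address <- as.character(df2$Address)

# Hypothetical helper: decimal comma to numeric. Going through
# as.character() converts the labels, not the internal integer codes.
to_number <- function(x) {
  as.numeric(sub(",", ".", as.character(x), fixed = TRUE))
}

df2$Latitude <- to_number(df2$Latitude)
df2$Longitude <- to_number(df2$Longitude)

str(df2)
```
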
Answer 1: (score: 1)

Base R function

merge(df1, df2, by = "Address")

Output

             Address ID VAR1 VAR2 VARX Latitude Longitude
1  Road 1, 1234 City  1    7    2    x    12,67     56,78
2 Road 10, 1234 City  5    4    1    y    11,90     55,78
3 Road 11, 1234 City  6    1    2    x    11,12     56,59
4  Road 4, 1234 City  2    8    0    y    12,10     55,20
5  Road 5, 1234 City  3    6    2    x    11,50     55,30
6  Road 6, 1234 City  4    7    2    x    12,34     55,32
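
One caveat worth sketching: by default merge() does an inner join and sorts the result by the key, which is why the rows above are no longer in ID order. With the sample data every address has a match, so nothing is lost, but with unmatched keys the rows would be dropped. The small data frames below are hypothetical, and "Road 99" is an invented address with no counterpart in df2.

```r
# Hypothetical data: the third address has no match in df2
df1 <- data.frame(
  ID = 1:3,
  Address = c("Road 4, 1234 City", "Road 1, 1234 City", "Road 99, 1234 City"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Address = c("Road 1, 1234 City", "Road 4, 1234 City"),
  Latitude = c("12,67", "12,10"),
  stringsAsFactors = FALSE
)

# Default merge(): inner join, the unmatched row is dropped
inner <- merge(df1, df2, by = "Address")

# all.x = TRUE keeps every row of df1 (left-join behaviour) and fills
# the missing coordinates with NA
left <- merge(df1, df2, by = "Address", all.x = TRUE)

# merge() sorts by the key, so restore the original order via ID
left <- left[order(left$ID), ]
print(left)
```

Here `all.x = TRUE` is the base R counterpart of what left_join does in the first answer.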