我有两个数据帧:
df1 :我的主数据集,地址栏
df2 :包含纬度和经度以及地址列的数据库
我想将两列从df2合并到我的df1。
DF1 :
ID VAR1 VAR2 VARX Address
1 7 2 x Road 1, 1234 City
2 8 0 y Road 4, 1234 City
3 6 2 x Road 5, 1234 City
4 7 2 x Road 6, 1234 City
5 4 1 y Road 10, 1234 City
6 1 2 x Road 11, 1234 City
DF2 :
Address Latitude Longitude
Road 1, 1234 City 12,67 56,78
Road 2, 1234 City 12,66 55,67
Road 3, 1234 City 12,45 55,10
Road 4, 1234 City 12,10 55,20
Road 5, 1234 City 11,50 55,30
Road 6, 1234 City 12,34 55,32
Road 7, 1234 City 12,89 55,40
Road 8, 1234 City 12,77 55,45
Road 9, 1234 City 11,67 55,67
Road 10, 1234 City 11,90 55,78
Road 11, 1234 City 11,12 56,59
所以我的新数据框看起来像这样:
新数据框,df3 :
ID VAR1 VAR2 VARX Address Latitude Longitude
1 7 2 x Road 1, 1234 City 12,67 56,78
2 8 0 y Road 4, 1234 City 12,10 55,20
3 6 2 x Road 5, 1234 City 11,50 55,30
4 7 2 x Road 6, 1234 City 12,34 55,32
5 4 1 y Road 10, 1234 City 11,90 55,78
6 1 2 x Road 11, 1234 City 11,12 56,59
我尝试了 left_join ,但它只返回NA。
df3 <- left_join(df1, df2, by = c("Address"))
编辑:已解决 显然,我的一个地址列中有一些错误的空格。上面的代码确实有效。
答案 0 :(得分:1)
left_join
应该可以正常工作。
看看这个并检查您的数据结构。
df3 <- dplyr::left_join(df1, df2, by = "Address")
<强>输出强>
ID VAR1 VAR2 VARX Address Latitude Longitude
1 1 7 2 x Road 1, 1234 City 12,67 56,78
2 2 8 0 y Road 4, 1234 City 12,10 55,20
3 3 6 2 x Road 5, 1234 City 11,50 55,30
4 4 7 2 x Road 6, 1234 City 12,34 55,32
5 5 4 1 y Road 10, 1234 City 11,90 55,78
6 6 1 2 x Road 11, 1234 City 11,12 56,59
数据强>
DF1
structure(list(ID = 1:6, VAR1 = c(7L, 8L, 6L, 7L, 4L, 1L), VAR2 = c(2L,
0L, 2L, 2L, 1L, 2L), VARX = structure(c(1L, 2L, 1L, 1L, 2L, 1L
), .Label = c("x", "y"), class = "factor"), Address = structure(c(1L,
4L, 5L, 6L, 2L, 3L), .Label = c("Road 1, 1234 City", "Road 10, 1234 City",
"Road 11, 1234 City", "Road 4, 1234 City", "Road 5, 1234 City",
"Road 6, 1234 City"), class = "factor")), .Names = c("ID", "VAR1",
"VAR2", "VARX", "Address"), class = "data.frame", row.names = c(NA,
-6L))
DF2
structure(list(Address = structure(c(1L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 2L, 3L), .Label = c("Road 1, 1234 City", "Road 10, 1234 City",
"Road 11, 1234 City", "Road 2, 1234 City", "Road 3, 1234 City",
"Road 4, 1234 City", "Road 5, 1234 City", "Road 6, 1234 City",
"Road 7, 1234 City", "Road 8, 1234 City", "Road 9, 1234 City"
), class = "factor"), Latitude = structure(c(9L, 8L, 7L, 5L,
2L, 6L, 11L, 10L, 3L, 4L, 1L), .Label = c("11,12", "11,50", "11,67",
"11,90", "12,10", "12,34", "12,45", "12,66", "12,67", "12,77",
"12,89"), class = "factor"), Longitude = structure(c(10L, 7L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L), .Label = c("55,10", "55,20",
"55,30", "55,32", "55,40", "55,45", "55,67", "55,78", "56,59",
"56,78"), class = "factor")), .Names = c("Address", "Latitude",
"Longitude"), class = "data.frame", row.names = c(NA, -11L))
答案 1 :(得分:1)
基础R功能
merge(df1,df2,by = "Address")
输出
Address ID VAR1 VAR2 VARX Latitude Longitude
1 Road 1, 1234 City 1 7 2 x 12,67 56,78
2 Road 10, 1234 City 5 4 1 y 11,90 55,78
3 Road 11, 1234 City 6 1 2 x 11,12 56,59
4 Road 4, 1234 City 2 8 0 y 12,10 55,20
5 Road 5, 1234 City 3 6 2 x 11,50 55,30
6 Road 6, 1234 City 4 7 2 x 12,34 55,32