Join two datasets on two matching columns in R

Time: 2020-04-01 05:33:45

Tags: r join dplyr tidyverse

Objective

I have two datasets: df1 and df2

df1


Date           Name      Duration

1/2/2020       Tanisha   50
1/3/2020       Lisa      10
1/5/2020       Lisa      10



df2


Date           Name      Duration

1/2/2020       Tanisha   80
1/3/2020       Lisa      50
1/5/2020       Tom       10

Desired output:

  Date           Name      Duration        Date           Name       Duration

  1/2/2020       Tanisha   50              1/2/2020       Tanisha     80  
  1/3/2020       Lisa      10              1/3/2020       Lisa        50

I want to match rows between df1 and df2 on both the Name column and the Date column.

dput of df1 and df2:

 structure(list(Date = structure(1:3, .Label = c("1/2/2020", "1/3/2020", 
 "1/5/2020"), class = "factor"), Name = structure(c(2L, 1L, 1L
  ), .Label = c("Lisa", "Tanisha"), class = "factor"), Duration = c(50L, 
 10L, 10L), X = c(NA, NA, NA), X.1 = c(NA, NA, NA), X.2 = c(NA, 
 NA, NA), X.3 = c(NA, NA, NA)), class = "data.frame", row.names = c(NA, 
 -3L))



structure(list(Date = structure(1:3, .Label = c("1/2/2020", "1/3/2020", 
"1/5/2020"), class = "factor"), Name = structure(c(2L, 1L, 3L
), .Label = c("lisa", "tanisha", "tom"), class = "factor"), Duration2 = c(80L, 
50L, 10L)), class = "data.frame", row.names = c(NA, -3L))

What I have tried:

A horizontal merge

 merge(df1, df2, all.x=True)

I am not sure how to match on the contents of both Name and Date.

Thanks for your help.

1 Answer:

Answer 0 (score: 2)

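This is a simple merge, but your Name columns are inconsistent: df1 uses "Tanisha" and "Lisa", while df2 uses "tanisha" and "lisa". Convert them to the same format (upper case, lower case, or title case) and then merge. Also, you don't need duplicate Date and Name columns in the output, since they would hold exactly the same information.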

library(dplyr)
df1 %>% mutate(Name = tolower(Name)) %>% inner_join(df2, by = c('Date', 'Name'))

Or in base R:

merge(transform(df1, Name = tolower(Name)), df2, by = c('Date', 'Name'))


#      Date    Name Duration Duration2
#1 1/2/2020 tanisha       50        80
#2 1/3/2020    lisa       10        50
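
For anyone who wants to try this without the original objects, here is a minimal, self-contained sketch of the same idea. It rebuilds df1 and df2 by hand from the dput output in the question, with two simplifications that are assumptions of this sketch: the all-NA columns X, X.1, X.2 and X.3 are left out, and plain character columns are used instead of factors. It also lowercases Name on both sides rather than only in df1, so the join does not depend on which table happens to be capitalized.

```r
# Rebuild the two data frames from the question's dput output.
# Assumption: the all-NA columns (X, X.1, ...) are omitted and
# character columns are used instead of factors.
df1 <- data.frame(
  Date = c("1/2/2020", "1/3/2020", "1/5/2020"),
  Name = c("Tanisha", "Lisa", "Lisa"),
  Duration = c(50L, 10L, 10L),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Date = c("1/2/2020", "1/3/2020", "1/5/2020"),
  Name = c("tanisha", "lisa", "tom"),
  Duration2 = c(80L, 50L, 10L),
  stringsAsFactors = FALSE
)

# Normalize the case of the key column in both tables.
df1$Name <- tolower(df1$Name)
df2$Name <- tolower(df2$Name)

# Inner join on both keys: a row is kept only if Date AND Name match.
result <- merge(df1, df2, by = c("Date", "Name"))
print(result)
```

The 1/5/2020 rows drop out because the names differ (Lisa in df1, tom in df2), which matches the desired output in the question. To keep unmatched rows from df1 as well, merge() accepts all.x = TRUE (note the upper-case TRUE; True is not defined in R).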