Merge in R

Time: 2017-10-10 12:41:44

Tags: r join merge duplicates

I have two data frames.

db1 looks like:

date.prix;var1;var2
2012-10-02;pluf;plof
2012-12-11;pam;pim
2013-05-17;plop;plip
...

db2 looks like:

date.de.cotation;var3;var4
2012-10-02;tutu;toto
2012-10-02;ting;tong
2013-05-17;gui;guou
...

The join condition is date.prix = date.de.cotation.

What I want is:

date.prix;var1;var2;var3;var4
2012-10-02;pluf;plof;tutu;toto
2012-12-11;pam;pim;NA;NA
2013-05-17;plop;plip;gui;guou

So:

  • if a date has duplicate rows in db2, I want the values from the first one
  • if a date in db1 has no matching row in db2, I want NAs

2 answers:

Answer 0: (score: 2)

We can use the duplicated and merge functions:

db2_2 <- db2[!duplicated(db2$date.de.cotation), ] # remove everything but first instance
merge(db1, db2_2, by.x = 'date.prix', by.y = 'date.de.cotation', all.x = TRUE)

#    date.prix var1 var2 var3 var4
# 1 2012-10-02 pluf plof tutu toto
# 2 2012-12-11  pam  pim <NA> <NA>
# 3 2013-05-17 plop plip  gui guou
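
The snippet above assumes db1 and db2 already exist. As a minimal self-contained sketch, the sample data from the question can be rebuilt with read.table and run through the same two steps. The names db2_first and result are only illustrative, and the columns are assumed to be read as plain character strings:

```r
# Rebuild the question's sample data from semicolon-separated text.
db1 <- read.table(text = "date.prix;var1;var2
2012-10-02;pluf;plof
2012-12-11;pam;pim
2013-05-17;plop;plip", header = TRUE, sep = ";", stringsAsFactors = FALSE)

db2 <- read.table(text = "date.de.cotation;var3;var4
2012-10-02;tutu;toto
2012-10-02;ting;tong
2013-05-17;gui;guou", header = TRUE, sep = ";", stringsAsFactors = FALSE)

# Keep only the first row for each date in db2.
db2_first <- db2[!duplicated(db2$date.de.cotation), ]

# Left join: every row of db1 is kept, and unmatched dates get NA.
result <- merge(db1, db2_first,
                by.x = "date.prix", by.y = "date.de.cotation",
                all.x = TRUE)

# One output row per db1 row means no duplicate matches slipped through.
stopifnot(nrow(result) == nrow(db1))
print(result)
```

Note that duplicated keeps the first row in the current row order of db2, so if "first" should mean something else (for example, the earliest by another column), db2 would need to be sorted accordingly beforehand.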

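Answer 1: (score: 2)

Joins in data.table have a mult argument. In the left join below, mult = 'first' keeps only the first matching row from db2 for each row of db1.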

library(data.table)

db1 <- fread('date.prix;var1;var2
2012-10-02;pluf;plof
2012-12-11;pam;pim
2013-05-17;plop;plip')

db2 <- fread('date.de.cotation;var3;var4
2012-10-02;tutu;toto
2012-10-02;ting;tong
2013-05-17;gui;guou')

# if db1 and db2 are not data.table, do: setDT(db1); setDT(db2);

db2[db1, on = .(date.de.cotation = date.prix), mult = 'first']
#    date.de.cotation var3 var4 var1 var2
# 1:       2012-10-02 tutu toto pluf plof
# 2:       2012-12-11   NA   NA  pam  pim
# 3:       2013-05-17  gui guou plop plip
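
The joined table above takes its key column name from db2 and lists the db2 columns first, which differs from the layout requested in the question. One possible way to match that layout, sketched here under the assumption that the result is stored in a variable (res is an illustrative name), is to rename and reorder by reference with setnames and setcolorder:

```r
library(data.table)

# Same sample data as above, passed explicitly as text.
db1 <- fread(text = "date.prix;var1;var2
2012-10-02;pluf;plof
2012-12-11;pam;pim
2013-05-17;plop;plip")

db2 <- fread(text = "date.de.cotation;var3;var4
2012-10-02;tutu;toto
2012-10-02;ting;tong
2013-05-17;gui;guou")

# Left join keeping only the first db2 match per db1 row.
res <- db2[db1, on = .(date.de.cotation = date.prix), mult = "first"]

# Rename the key column and put the db1 columns first (both modify res in place).
setnames(res, "date.de.cotation", "date.prix")
setcolorder(res, c("date.prix", "var1", "var2", "var3", "var4"))

print(res)
```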