Compare values in two data frames; copy values between data frames based on a match

Time: 2018-01-24 14:14:07

Tags: r merge

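I have not been able to find an answer to this. There is probably one on Stack Overflow, but I have not found one I can use.

I have two data frames (db.1 and db.larger). What I need to do is: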

# pseudocode, not runnable R
if db.1$ID == db.larger$ID
    db.1$Gender <- db.larger$Gender

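If the IDs match, I need to copy the Gender value from db.larger into db.1.

  • Both data frames have between 500,000 and 6 million rows.
  • db.1 contains duplicate IDs, because additional columns not shown in this example hold unique and important information that I must keep.
  • Both data frames contain more columns than are shown here.
  • The ID values are character strings, because they can contain leading zeros.

I cannot use match, because several people appear more than once in db.1.

merge did not work for me, because it adds more data (columns) to the data frame than I want.

Here is the sample data (dput output):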

db.1 <- structure(list(ID = c("453", "286", "345", "853", "675", "754","445", "564", "651", "685", "453", "286", "345"), Gender = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), Name = c("Rashad Lawrence", "Ali Santana", "Cordell Cobb", "Amani Bennett", "Donavan Frank", "Jeffrey Michael", "Aliana Trujillo", "Cheyanne Wyatt", "Kayden Padilla", "Jasmine Glass", "Rashad Lawrence", "Ali Santana", "Cordell Cobb"), Score = c(0, 0.044, 0.822, 0.322, 0.394, 0.309, 0.826, 0.729, 0.318, 0.6, 0.648, 0.547, 0.53)), .Names = c("ID", "Gender","Name", "Score"), row.names = c(NA, -13L), class = "data.frame")

db.larger <- structure(list(ID = c("123", "158", "286", "345", "445", "453", "469", "546", "564", "566", "651", "675", "682", "685", "741", "754", "789", "852", "853", "963"), Gender = c(1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1), Name = c("Dexter Holmes", "Roman Macias", "Ali Santana", "Cordell Cobb", "Aliana Trujillo", "Rashad Lawrence", "Preston Mckee", "Kyra Howe", "Cheyanne Wyatt", "Tobias Hart", "Kayden Padilla", "Donavan Frank", "Jamie Yoder", "Jasmine Glass", "Jamar Carter", "Jeffrey Michael", "Erick Tate", "Darion Graves", "Amani Bennett", "Regina Sanders")), .Names = c("ID", "Gender", "Name"), row.names = c(NA, 20L), class = "data.frame")

1 answer:

Answer 0: (score: 0)

Since the values in db.1$Gender are all missing, you can drop that column and then use inner_join from dplyr. This keeps the duplicate rows in db.1.

library(dplyr)

db.1 <- db.1 %>%
  select(-Gender)

db.combine <- inner_join(db.1,db.larger, by = "ID")

db.combine
    ID          Name.x Gender          Name.y
1  453 Rashad Lawrence      1 Rashad Lawrence
2  286     Ali Santana      2     Ali Santana
3  345    Cordell Cobb      1    Cordell Cobb
4  853   Amani Bennett      1   Amani Bennett
5  675   Donavan Frank      2   Donavan Frank
6  754 Jeffrey Michael      2 Jeffrey Michael
7  445 Aliana Trujillo      1 Aliana Trujillo
8  564  Cheyanne Wyatt      2  Cheyanne Wyatt
9  651  Kayden Padilla      2  Kayden Padilla
10 685   Jasmine Glass      2   Jasmine Glass
11 453 Rashad Lawrence      1 Rashad Lawrence
12 286     Ali Santana      2     Ali Santana
13 345    Cordell Cobb      1    Cordell Cobb
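The join above brings over every column of db.larger, which is why Name shows up twice. A minimal sketch of a variant that copies only Gender, assuming db.larger has at most one row per ID (otherwise the join would multiply rows). The three-row data frames are a small subset of the question's data, and the names lookup and db.filled are made up for illustration:

```r
library(dplyr)

# Small subset of the question's data, for illustration only
db.1 <- data.frame(
  ID     = c("453", "286", "453"),
  Gender = NA,
  Name   = c("Rashad Lawrence", "Ali Santana", "Rashad Lawrence"),
  Score  = c(0, 0.044, 0.648),
  stringsAsFactors = FALSE
)
db.larger <- data.frame(
  ID     = c("123", "286", "453"),
  Gender = c(1, 2, 1),
  Name   = c("Dexter Holmes", "Ali Santana", "Rashad Lawrence"),
  stringsAsFactors = FALSE
)

# Keep only the key and the column to copy, so no Name.x / Name.y are created
lookup <- db.larger %>% select(ID, Gender)

# left_join keeps every row of db.1 (duplicates included);
# an ID with no match in lookup would get NA for Gender
db.filled <- db.1 %>%
  select(-Gender) %>%
  left_join(lookup, by = "ID")
```

Using left_join instead of inner_join is a design choice here: rows of db.1 whose ID is absent from db.larger are kept rather than dropped.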

Your Name variables are apparently not a perfect match between the two data frames, but you can simply drop Name.x or Name.y with select.
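As a possible alternative without dplyr, base R's match looks up each element of its first argument independently, so repeated IDs in db.1 should not be a problem. A minimal sketch, again on a small subset of the question's data and assuming each ID appears only once in db.larger (match returns the first matching position); idx is an illustrative name:

```r
# Small subset of the question's data, for illustration only
db.1 <- data.frame(
  ID     = c("453", "286", "453"),
  Gender = NA,
  Name   = c("Rashad Lawrence", "Ali Santana", "Rashad Lawrence"),
  Score  = c(0, 0.044, 0.648),
  stringsAsFactors = FALSE
)
db.larger <- data.frame(
  ID     = c("123", "286", "453"),
  Gender = c(1, 2, 1),
  Name   = c("Dexter Holmes", "Ali Santana", "Rashad Lawrence"),
  stringsAsFactors = FALSE
)

# For every row of db.1, find the row position of its ID in db.larger
# (NA where the ID is not present)
idx <- match(db.1$ID, db.larger$ID)

# Copy Gender across by position; no columns are added and row order is unchanged
db.1$Gender <- db.larger$Gender[idx]
```

Because only the Gender column is overwritten, all other columns of db.1 stay as they were.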