Question

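I'm looking for a way to combine two tables of different dimensions by ID. The final table, however, should contain different values depending on each table.

Here is a random example: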

IDx = c("a", "b", "c", "d")
sex = c("M", "F", "M", "F")

IDy = c("a", "a", "b", "c", "d", "d")
status = c("single", "children", "single", "children", "single", "children")
salary = c(30, 80, 50, 40, 30, 80)

x = data.frame(IDx, sex)
y = data.frame(IDy, status, salary)

This is x:

  IDx sex
1   a   M
2   b   F
3   c   M
4   d   F

This is y:

  IDy   status salary
1   a   single     30
2   a children     80
3   b   single     50
4   c children     40
5   d   single     30
6   d children     80

This is what I'm looking for:

  IDy sex   status salary
1   a   M   single     30
2   a   M children     80
3   b   F   single     50
4   c   M children     40
5   d   F   single     30
6   d   F children     80

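Basically, sex should be matched to fill the needs of table y. All values from both tables should be used; the actual tables are much larger. Not all IDs need to be duplicated.

This should be fairly simple, but I couldn't find a good answer online. Note that I do not want to introduce NAs. I'm new to R, and since I'm focusing on dplyr, an example using it would help. Base R is probably simple too.

Update

The bold sentence above may have made the final answer confusing. Sorry, this is a confusing case: I realize I should have included an additional column that complicates things, but more on that later.

First, I tried to see what was happening with my actual tables and to work out which of the suggested answers fits my needs. I removed any problematic columns to get the following results. So, I checked: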

dim(x)
> [1] 231 2
dim(y)
> [1] 199 8
# left_join joins matching rows from y to x
suchait <- left_join(x, y, by = c("IDx" = "IDy"))
# inner_join retains only rows in both sets
jdobres <- inner_join(y, x, by = c("IDy" = "IDx"))
dim(suchait) # actual table used
> [1] 225 9
dim(jdobres)
> [1] 219 9

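But why, and where, do they differ? The following shows the 6 rows that are introduced in the suchait table but not in jdobres, which comes down to the different join methods.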

setdiff(suchait, jdobres )
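
One possible way to locate such rows, sketched here on the small example data rather than on the real tables: anti_join returns the rows of its first argument that have no match in the second, so it lists the IDs that a left join would pad with NA and an inner join would drop. The ID "e" below is a hypothetical addition for illustration and is not part of the original data.

```r
library(dplyr)

# Example data from the question, plus a hypothetical ID "e" that exists only in x
x <- data.frame(IDx = c("a", "b", "c", "d", "e"),
                sex = c("M", "F", "M", "F", "M"),
                stringsAsFactors = FALSE)
y <- data.frame(IDy = c("a", "a", "b", "c", "d", "d"),
                status = c("single", "children", "single",
                           "children", "single", "children"),
                salary = c(30, 80, 50, 40, 30, 80),
                stringsAsFactors = FALSE)

# Rows of x whose ID never appears in y
only_in_x <- anti_join(x, y, by = c("IDx" = "IDy"))

# Rows of y whose ID never appears in x
only_in_y <- anti_join(y, x, by = c("IDy" = "IDx"))

only_in_x
only_in_y
```

Running both directions shows which table holds the unmatched IDs.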

Answer 1

Using dplyr:

library(dplyr)
df <- left_join(x, y, by = c("IDx" = "IDy"))

Your result will be:

  IDx sex   status salary
1   a   M   single     30
2   a   M children     80
3   b   F   single     50
4   c   M children     40
5   d   F   single     30
6   d   F children     80
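
Since the question asks that no NAs be introduced, it may help to see how the join type matters. The sketch below reuses x and y from the question and adds a hypothetical ID "e" (not in the original data) that exists only in x: left_join keeps that row and fills status and salary with NA, while inner_join drops it.

```r
library(dplyr)

# Data from the question
x <- data.frame(IDx = c("a", "b", "c", "d"),
                sex = c("M", "F", "M", "F"),
                stringsAsFactors = FALSE)
y <- data.frame(IDy = c("a", "a", "b", "c", "d", "d"),
                status = c("single", "children", "single",
                           "children", "single", "children"),
                salary = c(30, 80, 50, 40, 30, 80),
                stringsAsFactors = FALSE)

# Hypothetical extra row: an ID present in x but absent from y
x_extra <- rbind(x, data.frame(IDx = "e", sex = "M", stringsAsFactors = FALSE))

# left_join keeps every row of x_extra; unmatched IDs get NA in y's columns
left_res <- left_join(x_extra, y, by = c("IDx" = "IDy"))

# inner_join keeps only IDs present in both tables, so no NA is introduced
inner_res <- inner_join(x_extra, y, by = c("IDx" = "IDy"))

nrow(left_res)   # 7 rows; the row for "e" has NA status and salary
nrow(inner_res)  # 6 rows
```

On the original example data every ID appears in both tables, so the two joins return the same rows there.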

Or you can do it this way:

df <- left_join(y, x, by = c("IDy" = "IDx"))

This gives:

  IDy   status salary sex
1   a   single     30   M
2   a children     80   M
3   b   single     50   F
4   c children     40   M
5   d   single     30   F
6   d children     80   F

You can also reorder the columns to get them the way you want:

df <- df[, c("IDy", "sex", "status", "salary")]

Result:

  IDy sex   status salary
1   a   M   single     30
2   a   M children     80
3   b   F   single     50
4   c   M children     40
5   d   F   single     30
6   d   F children     80
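
The question mentions that base R would also be acceptable. As a sketch under that assumption, merge() can do the same job without dplyr. By default it keeps only IDs found in both tables (like an inner join) and sorts the result by the key column.

```r
# Data from the question
x <- data.frame(IDx = c("a", "b", "c", "d"),
                sex = c("M", "F", "M", "F"),
                stringsAsFactors = FALSE)
y <- data.frame(IDy = c("a", "a", "b", "c", "d", "d"),
                status = c("single", "children", "single",
                           "children", "single", "children"),
                salary = c(30, 80, 50, 40, 30, 80),
                stringsAsFactors = FALSE)

# by.x / by.y name the key column in each table;
# the key column in the result keeps the name from the first table (IDy)
merged <- merge(y, x, by.x = "IDy", by.y = "IDx")

# Reorder columns to match the layout requested in the question
merged <- merged[, c("IDy", "sex", "status", "salary")]
merged
```

Passing all.x = TRUE to merge() would instead keep every row of y, which corresponds to a left join and can introduce NAs for unmatched IDs.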

Merging data.frames of different dimensions, creating duplicates where needed / r dplyr
