Creating a new data frame by subsetting and referencing two data frames in R

Time: 2017-12-08 17:51:36

Tags: r

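I have two data sets. One holds employee preferences and the other holds a list of potential matches. For each employee, I want to find the matching jobs and create a new data set for that employee. The output can be one large combined set that is later split by person, or it can be separate from the start.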

df1:

emplid = c("empl1", "empl2", "empl3")
c1 = c("HR", "Finance", "HR")
c2 = c("x", "y", "z")
c3 = c("a", "b", "C")
df1 = data.frame(emplid, c1, c2, c3)

df2:

job = c("job1", "job2", "job3", "job4")
c4 = c("HR", "HR", "Finance", "Finance")
c5 = c("x", "x", "y", "z")
c6 = c("a", "b", "b", "c")
df2 = data.frame(job, c4, c5, c6)

The result would be:

emplid   job    c4       c5   c6
empl1    job1   HR       x    a
empl3    job4   Finance  y    b

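The result above can be a single combined table, but eventually I will split it by employee.

I can do this successfully when the first data frame has only one row, but that is not my situation. I tried a loop without success.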

2 answers:

Answer 0: (score: 0)

I suggest renaming the columns so that they match. Here is an example of what I would do:

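library(dplyr)  # provides %>%, group_by() and right_join()

# Employee preferences
emplid = c("empl1", "empl2", "empl3")
c1 = c("HR", "Finance", "HR")
c2 = c("x", "y", "x")
c3 = c("a", "b", "a")
df1 = data.frame(emplid, c1, c2, c3)

# Jobs, using the same column names as df1 so the join keys line up
job = c("job1", "job2", "job3", "job4")
c1 = c("HR", "HR", "Finance", "Finance")
c2 = c("x", "x", "y", "z")
c3 = c("a", "b", "b", "c")
df2 = data.frame(job, c1, c2, c3)

temp = df1 %>%
  group_by(c1, c2, c3)

# Keep every row of temp and attach the matching job (NA where none matches)
df2 = df2 %>%
  right_join(temp, by = c('c1', 'c2', 'c3'))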
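
If renaming is not convenient, the same match can be sketched with a named `by` vector that maps df1's columns onto df2's. This is only a minimal sketch: it assumes exact matches on all three columns are wanted, reuses the sample values from this answer, and the names `matches` and `per_employee` are introduced here purely for illustration.

```r
library(dplyr)

# Sample data from this answer, with df2 keeping the question's column names
df1 <- data.frame(
  emplid = c("empl1", "empl2", "empl3"),
  c1 = c("HR", "Finance", "HR"),
  c2 = c("x", "y", "x"),
  c3 = c("a", "b", "a"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  job = c("job1", "job2", "job3", "job4"),
  c4 = c("HR", "HR", "Finance", "Finance"),
  c5 = c("x", "x", "y", "z"),
  c6 = c("a", "b", "b", "c"),
  stringsAsFactors = FALSE
)

# inner_join keeps only employees with at least one matching job;
# the named vector pairs each df1 column with its df2 counterpart
matches <- inner_join(df1, df2, by = c("c1" = "c4", "c2" = "c5", "c3" = "c6"))

# One data frame per employee, as a named list (hypothetical name)
per_employee <- split(matches, matches$emplid)
```

`per_employee[["empl1"]]` then holds the rows for that employee only, which covers the "split by person" step from the question.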

Answer 1: (score: 0)

Here is a data.table solution.
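
The answerer's own code is not reproduced here. One possible data.table approach, offered only as a sketch and not as the original solution, assumes the question's column names (c1 to c3 in df1, c4 to c6 in df2) and the sample values from answer 0; `dt1`, `dt2`, `matches` and `per_employee` are illustrative names.

```r
library(data.table)

dt1 <- data.table(
  emplid = c("empl1", "empl2", "empl3"),
  c1 = c("HR", "Finance", "HR"),
  c2 = c("x", "y", "x"),
  c3 = c("a", "b", "a")
)
dt2 <- data.table(
  job = c("job1", "job2", "job3", "job4"),
  c4 = c("HR", "HR", "Finance", "Finance"),
  c5 = c("x", "x", "y", "z"),
  c6 = c("a", "b", "b", "c")
)

# Inner join on the three preference columns; by.x/by.y pair up the
# differently named key columns of the two tables
matches <- merge(dt1, dt2,
                 by.x = c("c1", "c2", "c3"),
                 by.y = c("c4", "c5", "c6"))

# One data.table per employee, returned as a named list
per_employee <- split(matches, by = "emplid")
```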
