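I have two datasets. One holds employee preferences, and the other holds a list of potential matches. For each employee, I want to find the matching jobs and create a new dataset per employee. The output can be one large combined dataset that is later split by person, or it can be separate from the start.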
df1:
emplid = c("empl1","empl2","empl3")
c1 = c("HR", "Finance", "HR")
c2 = c("x", "y", "z")
c3 = c("a","b","C")
df1 = data.frame(emplid, c1, c2, c3)
df2:
job = c("job1", "job2", "job3", "job4")
c4 = c("HR", "HR", "Finance", "Finance")
c5 = c("x", "x", "y", "z")
c6 = c("a","b", "b","c")
df2 = data.frame(job, c4, c5, c6)
The result would be:
emplid job c4 c5 c6
empl1 job1 HR x a
empl3 job4 Finance y b
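The results above can stay combined, but eventually I will split them by employee.
I can only get this to work when the first data frame has a single row, which is not my situation. I tried a loop but had no success.
Answer 0: (score: 0)
I suggest renaming the columns so that they match in both data frames. Here is an example of what I would do: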
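library(dplyr)  # provides %>%, group_by() and right_join()

# Employee preferences (the c2 and c3 values here differ from the question's)
emplid = c("empl1","empl2","empl3")
c1 = c("HR", "Finance", "HR")
c2 = c("x", "y", "x")
c3 = c("a","b","a")
df1 = data.frame(emplid, c1, c2, c3)

# Jobs, with the columns renamed to c1, c2, c3 so they match df1
job = c("job1", "job2", "job3", "job4")
c1 = c("HR", "HR", "Finance", "Finance")
c2 = c("x", "x", "y", "z")
c3 = c("a","b", "b","c")
df2 = data.frame(job, c1, c2, c3)

# Grouping is not required for the join itself
temp = df1 %>%
  group_by(c1, c2, c3)

# right_join keeps every row of temp; an employee with no matching job gets NA in job
df2 = df2 %>%
  right_join(temp, by = c('c1','c2','c3'))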
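If renaming the columns is not convenient, a join can also map differently named columns onto each other. The following is a minimal sketch, assuming the original column names from the question (c1 to c3 in df1, c4 to c6 in df2) and assuming that only employees with at least one matching job are wanted. The names `matches` and `per_employee` are illustrative and do not come from the original post.

```r
library(dplyr)

# Data as posted in the question; stringsAsFactors = FALSE keeps keys as plain strings
df1 <- data.frame(
  emplid = c("empl1", "empl2", "empl3"),
  c1 = c("HR", "Finance", "HR"),
  c2 = c("x", "y", "z"),
  c3 = c("a", "b", "C"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  job = c("job1", "job2", "job3", "job4"),
  c4 = c("HR", "HR", "Finance", "Finance"),
  c5 = c("x", "x", "y", "z"),
  c6 = c("a", "b", "b", "c"),
  stringsAsFactors = FALSE
)

# inner_join keeps only employee/job pairs where all three columns agree;
# the named vector maps df1 column names (left) to df2 column names (right)
matches <- inner_join(df1, df2, by = c("c1" = "c4", "c2" = "c5", "c3" = "c6"))

# One data frame per employee, named by emplid
per_employee <- split(matches, matches$emplid)
```

`per_employee[["empl1"]]` then holds the rows for empl1. Employees without any match do not appear in the list, which is the main difference from the right_join version above.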
Answer 1: (score: 0)
Here is a data.table solution.
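A minimal sketch of how a data.table join could handle this, assuming the column names from the question and assuming that only matching employee/job pairs should be kept. It is not necessarily the code this answer originally contained, and the names `dt1`, `dt2`, `matches` and `per_employee` are illustrative.

```r
library(data.table)

# Data as posted in the question
dt1 <- data.table(
  emplid = c("empl1", "empl2", "empl3"),
  c1 = c("HR", "Finance", "HR"),
  c2 = c("x", "y", "z"),
  c3 = c("a", "b", "C")
)
dt2 <- data.table(
  job = c("job1", "job2", "job3", "job4"),
  c4 = c("HR", "HR", "Finance", "Finance"),
  c5 = c("x", "x", "y", "z"),
  c6 = c("a", "b", "b", "c")
)

# merge() on a data.table is an inner join by default;
# by.x / by.y pair up the differently named key columns
matches <- merge(dt1, dt2,
                 by.x = c("c1", "c2", "c3"),
                 by.y = c("c4", "c5", "c6"))

# Split the combined result into one data.table per employee
per_employee <- split(matches, by = "emplid")
```

The combined table `matches` can be kept as is, or `per_employee` can be used when a separate table per person is needed.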