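I have a dataset listing employees with their job titles and the month and year each title began, across their entire careers. It looks like this: employeeID JobTitle1 MonthYearofTitle1 Department1 Jobtitle2 MonthYearofTitle2 Department2, and so on.
I have a second list of employees who are not in the first dataset, each with a single job title and title date. My goal is to match employees in the second dataset to employees in the first based on job title and month/year, but I'm not sure how to do this, because the information to compare is spread across multiple variables.
In other words, if I have MarySue, who became an Admin in January 2017, I want to match her with John Doe, who also became an Admin in January 2017, and flag the pair as a match for further analysis.
Unfortunately I'm not sure where to start with the code, so I have no attempt to show. The data looks like this:
Dataset 1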
employeeID JobTitle1 MonthYearofTitle1 Jobtitle2 MonthYearofTitle2
JohnDoe Intern Jan2016 Admin Jan2017
JakeSo VP Jul2017
JulieDo Manager April2017
Dataset 2
employeeID JobTitle1 MonthYearofTitle1
MarySue Admin Jan2017
JaneDoe Admin Jan2017
Greg VP Jul2017
Desired result/dataset:
Employee1 Employee2 Title Date Flag
JohnDoe MarySue Admin Jan2017 Match
JakeSo Greg VP Jul2017 Match
JulieDo Admin Jan2017 No Match
Can anyone help?
Answer 0 (score: 1)
Here is what I would do. First reshape both datasets so that they contain only the following columns:
employeeID, JobTitle, MonthYear
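Then run proc sql:
proc sql;
    /* Pair employees who took the same title in the same month. */
    select a.employeeID as Employee1, b.employeeID as Employee2, a.jobTitle, a.MonthYear
    from firstdataset as a
    inner join seconddataset as b
    /* No join on employeeID: the two lists contain different people. */
    on a.jobTitle = b.jobTitle
    and a.MonthYear = b.MonthYear;
quit;
Give it a try and let me know what you get.
Also, based on your sample data, you can create the initial long table with: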
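data b;
    /* a is the wide input. Assumes every column is character and ordered */
    /* ID, title, date, title, date, ... (no Department columns).         */
    set a;
    array x [*] _CHARACTER_;
    /* Even positions hold titles; the next position holds the matching month/year. */
    do i = 2 to dim(x) - 1 by 2;
        if not missing(x[i]) then do; /* skip unused title slots */
            JobTitle = x[i];
            MonthYear = x[i+1];
            output;
        end;
    end;
    keep employeeID JobTitle MonthYear;
run;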
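The full layout described in the question also has a Department column after each title/date pair, so a fixed stride over all character columns would misalign. Below is a minimal sketch that names the columns explicitly instead. It assumes the month/year values are stored as character strings and that there are two title slots; the dataset name firstdataset_long, the slot variable, and the department values in the sample rows are made up for illustration.

```sas
/* Hypothetical wide input: two title slots, each with a department. */
data firstdataset;
    infile datalines truncover;
    input employeeID :$12.
          JobTitle1 :$12. MonthYearofTitle1 :$12. Department1 :$12.
          JobTitle2 :$12. MonthYearofTitle2 :$12. Department2 :$12.;
    datalines;
JohnDoe Intern Jan2016 Ops Admin Jan2017 Ops
JakeSo VP Jul2017 Sales
;

/* One output row per employee and filled title slot. */
data firstdataset_long;
    set firstdataset;
    /* Explicit arrays do not depend on the physical column order. */
    array titles[*] JobTitle1-JobTitle2;
    array dates[*]  MonthYearofTitle1-MonthYearofTitle2;
    array depts[*]  Department1-Department2;
    length JobTitle MonthYear Department $ 12;
    do slot = 1 to dim(titles);
        if not missing(titles[slot]) then do;
            JobTitle   = titles[slot];
            MonthYear  = dates[slot];
            Department = depts[slot];
            output;
        end;
    end;
    keep employeeID JobTitle MonthYear Department slot;
run;
```

To cover more title slots, widen the three numbered ranges (for example JobTitle1-JobTitle5). If the dates are numeric SAS dates rather than strings, MonthYear would need to be left out of the length statement and given a date format instead.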
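Answer 1 (score: 1)
You can do a FULL JOIN or a LEFT JOIN and use a CASE expression to create a computed field that flags the matching records.
The code below performs a full join and creates a flag field.
Create table1 and table2; only one record matches: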
data table1;
input employeeID $ JobTitle1 $ MonthYearofTitle1 Jobtitle2 $ MonthYearofTitle2 ;
informat MonthYearofTitle1 monyy7. MonthYearofTitle2 monyy7.;
format MonthYearofTitle1 monyy7. MonthYearofTitle2 monyy7.;
datalines;
JohnDoe Intern Jan2016 Admin Jan2017
TomJones Junior Jul2016 Admin Jul2017
;
run;
data table2;
input employeeID $ JobTitle1 $ MonthYearofTitle1 ;
informat MonthYearofTitle1 monyy7.;
format MonthYearofTitle1 monyy7.;
datalines;
MarySue Admin Jan2017
JackieC Admin Jul2013
;
run;
Full join: returns all rows from both tables
proc sql;
create table want as
select
t1.employeeID as t1_employeeID , t2.employeeID as t2_employeeID,
t2.JobTitle1 as t2_JobTitle,
t2.MonthYearofTitle1 as t2_MonthYearofTitle1,
case when
((t1.JobTitle1=t2.JobTitle1 and t1.MonthYearofTitle1=t2.MonthYearofTitle1) or (t1.JobTitle2=t2.JobTitle1 and t1.MonthYearofTitle2=t2.MonthYearofTitle1)) then "Match"
else "No-Match" end as flag
from table1 as t1 full join table2 as t2
on (t1.JobTitle1=t2.JobTitle1 and t1.MonthYearofTitle1=t2.MonthYearofTitle1) or (t1.JobTitle2=t2.JobTitle1 and t1.MonthYearofTitle2=t2.MonthYearofTitle1)
;
quit;
Result:
t1_employeeID=JohnDoe t2_employeeID=MarySue t2_JobTitle=Admin t2_MonthYearofTitle1=JAN2017 flag=Match
t1_employeeID= t2_employeeID=JackieC t2_JobTitle=Admin t2_MonthYearofTitle1=JUL2013 flag=No-Match
t1_employeeID=TomJones t2_employeeID= t2_JobTitle= t2_MonthYearofTitle1=. flag=No-Match
Update
Left join: returns only the records from table1
proc sql;
create table want as
select
t1.employeeID as Employee1 , t2.employeeID as Employee2,
coalescec(t2.JobTitle1,t1.JobTitle2,t1.JobTitle1) as Title,
coalesce(t2.MonthYearofTitle1,t1.MonthYearofTitle2,t1.MonthYearofTitle1) as Date format monyy7.,
case when
((t1.JobTitle1=t2.JobTitle1 and t1.MonthYearofTitle1=t2.MonthYearofTitle1) or (t1.JobTitle2=t2.JobTitle1 and t1.MonthYearofTitle2=t2.MonthYearofTitle1)) then "Match"
else "No-Match" end as Flag
from table1 as t1 left join table2 as t2
on (t1.JobTitle1=t2.JobTitle1 and t1.MonthYearofTitle1=t2.MonthYearofTitle1) or (t1.JobTitle2=t2.JobTitle1 and t1.MonthYearofTitle2=t2.MonthYearofTitle1)
;
quit;
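With more than two title slots, the OR chain in the join condition grows with every slot. One possible alternative, sketched here under assumptions rather than taken from either answer, is to reshape table1 to one row per employee and title first and then left join on a single title/date pair. The names table1_long, want_long, and slot are hypothetical, and the sample rows are the same as in the example above.

```sas
data table1;
    input employeeID $ JobTitle1 $ MonthYearofTitle1 :monyy7.
          JobTitle2 $ MonthYearofTitle2 :monyy7.;
    format MonthYearofTitle1 MonthYearofTitle2 monyy7.;
    datalines;
JohnDoe Intern Jan2016 Admin Jan2017
TomJones Junior Jul2016 Admin Jul2017
;

data table2;
    input employeeID $ JobTitle1 $ MonthYearofTitle1 :monyy7.;
    format MonthYearofTitle1 monyy7.;
    datalines;
MarySue Admin Jan2017
JackieC Admin Jul2013
;

/* Reshape table1 to one row per employee and filled title slot. */
data table1_long;
    set table1;
    array titles[*] JobTitle1 JobTitle2;
    array dates[*]  MonthYearofTitle1 MonthYearofTitle2;
    do slot = 1 to dim(titles);
        if not missing(titles[slot]) then do;
            JobTitle  = titles[slot];
            MonthYear = dates[slot];
            output;
        end;
    end;
    format MonthYear monyy7.;
    keep employeeID JobTitle MonthYear;
run;

proc sql;
    create table want_long as
    select t1.employeeID as Employee1,
           t2.employeeID as Employee2,
           t1.JobTitle   as Title,
           t1.MonthYear  as Date format=monyy7.,
           /* A missing right-hand ID means no table2 row joined. */
           case when t2.employeeID is null then "No-Match"
                else "Match" end as Flag
    from table1_long as t1
    left join table2 as t2
      on  t1.JobTitle  = t2.JobTitle1
      and t1.MonthYear = t2.MonthYearofTitle1;
quit;
```

Note the difference in shape: this version returns one row per employee and title rather than one row per employee, so an employee with one matching title also appears with No-Match rows for the other titles. If several table2 employees share the same title and date, each of them is paired with the same table1 row.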