Finding matching dates and titles across many variables in SAS

Time: 2018-04-03 15:00:12

Tags: sas match

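I have a dataset that is a list of employees, with every job title they have held over their careers and the month and year each title began. It looks like this: employeeID JobTitle1 MonthYearofTitle1 Department1 Jobtitle2 MonthYearofTitle2 Department2, and so on.

I have another list of employees who are not in the first dataset, each with only one job title and title date. My goal is to match the employees in the second dataset to employees in the first by job title and month/year, but I am not sure how to do this match because the information is spread across multiple variables.

In other words, if I have MarySue, who became an Admin in January 2017, I want to match her to John Doe, who also became an Admin in January 2017, and flag the pair as a match for further analysis.

Unfortunately, I am not sure where to start with the code, so I have nothing I have tried to show. The data looks like this:

Dataset 1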

employeeID JobTitle1 MonthYearofTitle1 Jobtitle2 MonthYearofTitle2
JohnDoe    Intern    Jan2016           Admin     Jan2017
JakeSo     VP        Jul2017
JulieDo    Manager   April2017

Dataset 2

employeeID JobTitle1 MonthYearofTitle1 
MarySue    Admin     Jan2017
JaneDoe    Admin     Jan2017  
Greg       VP        Jul2017

Desired result/dataset:

Employee1   Employee2   Title   Date    Flag
JohnDoe     MarySue     Admin   Jan2017 Match
JakeSo      Greg        VP      Jul2017 Match
JulieDo                 Admin   Jan2017 No Match

Can anyone help?

2 Answers:

Answer 0: (score: 1)

Here is what I would do. First reshape both datasets so that they have only the following columns:

employeeID, JobTitle, MonthYear

Then run proc sql:

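proc sql;
     /* Match on title and date only: the two lists hold different employees, */
     /* so their employeeID values never coincide.                            */
     select a.employeeID as Employee1, b.employeeID as Employee2, a.jobTitle, a.MonthYear
            from firstdataset as a
                 inner join seconddataset as b
                      on a.jobTitle = b.jobTitle
                       and a.MonthYear = b.MonthYear;
quit;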

Give it a try and let me know what you get.

Also, given your data, you can create the initial table with the following:

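data b;
keep employeeId JobTitle MonthYear;
set a;
/* Assumes every character column after employeeID comes in (title, month/year) pairs. */
array x [*] _CHARACTER_;
y = dim(x);

/* x[1] is employeeID, so the pairs start at position 2. */
do i = 2 to y - 1 by 2;
   /* Skip title slots the employee never filled. */
   if not missing(x[i]) then do;
      JobTitle = x[i];
      MonthYear = x[i+1];
      output;
   end;
end;
run;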
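
Putting the two steps together on the sample data from the question, a minimal sketch might look like the following. The dataset names first_wide, first_long, second_long and matched are made up for illustration, and the dates are kept as character strings exactly as typed. It departs from the code above in two labeled ways: the arrays list the title and date columns by name (so extra columns such as Department would not disturb the pairing), and it uses a left join so that unmatched titles are kept and flagged.

```sas
/* Sample data from the question; truncover leaves missing trailing fields blank. */
data first_wide;
   infile datalines truncover;
   input employeeID :$10. JobTitle1 :$10. MonthYearofTitle1 :$10.
         JobTitle2 :$10. MonthYearofTitle2 :$10.;
   datalines;
JohnDoe Intern Jan2016 Admin Jan2017
JakeSo VP Jul2017
JulieDo Manager April2017
;
run;

data second_long;
   input employeeID :$10. JobTitle :$10. MonthYear :$10.;
   datalines;
MarySue Admin Jan2017
JaneDoe Admin Jan2017
Greg VP Jul2017
;
run;

/* Reshape to one row per employee and title held. */
data first_long;
   set first_wide;
   array titles[*] JobTitle1 JobTitle2;
   array dates[*]  MonthYearofTitle1 MonthYearofTitle2;
   length JobTitle MonthYear $10;
   do i = 1 to dim(titles);
      if not missing(titles[i]) then do;
         JobTitle  = titles[i];
         MonthYear = dates[i];
         output;
      end;
   end;
   keep employeeID JobTitle MonthYear;
run;

/* Left join keeps every title from the first list and flags whether a partner exists. */
proc sql;
   create table matched as
   select a.employeeID as Employee1,
          b.employeeID as Employee2,
          a.JobTitle   as Title,
          a.MonthYear  as Date,
          case when b.employeeID is null then 'No Match' else 'Match' end as Flag
   from first_long as a
        left join second_long as b
          on  a.JobTitle  = b.JobTitle
          and a.MonthYear = b.MonthYear;
quit;
```

Because the join is per title, JohnDoe gets an unmatched Intern row as well as one Admin row for each second-list employee who became Admin in Jan2017 (both MarySue and JaneDoe in this sample).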

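Answer 1: (score: 1)

You can do a FULL JOIN or a LEFT JOIN and use a CASE expression to create a computed field that flags the matching records.

The code below performs a full join and creates a flag field:

Create Table1 & Table2: only 1 record matches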

data table1;
input employeeID $ JobTitle1 $ MonthYearofTitle1 Jobtitle2 $ MonthYearofTitle2 ;
informat MonthYearofTitle1 monyy7. MonthYearofTitle2 monyy7.;
format MonthYearofTitle1 monyy7. MonthYearofTitle2 monyy7.;
datalines;
JohnDoe Intern Jan2016 Admin Jan2017
TomJones Junior Jul2016 Admin Jul2017
;
run;

data table2;
input employeeID $ JobTitle1 $ MonthYearofTitle1 ;
informat MonthYearofTitle1 monyy7.;
format MonthYearofTitle1 monyy7.;
datalines;
MarySue Admin Jan2017
JackieC Admin Jul2013
;
run;
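
The joins in this answer compare the month/year columns for equality. That works because the monyy7. informat turns the text into a numeric SAS date for the first day of the month, so two values from the same month and year are the same number. A small sketch to check this in the log (the variable names d1 and d2 are illustrative only):

```sas
data _null_;
   /* Read the same month/year twice, with different letter case. */
   d1 = input('Jan2017', monyy7.);
   d2 = input('JAN2017', monyy7.);
   /* Both values should display as 01JAN2017. */
   put d1= date9. d2= date9.;
   if d1 = d2 then put 'Same SAS date value';
run;
```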

Full join: get all the data

proc sql;
create table want as
select
t1.employeeID as t1_employeeID , t2.employeeID as t2_employeeID, 
t2.JobTitle1 as t2_JobTitle,
t2.MonthYearofTitle1 as t2_MonthYearofTitle1,
case when 
((t1.JobTitle1=t2.JobTitle1 and t1.MonthYearofTitle1=t2.MonthYearofTitle1) or (t1.JobTitle2=t2.JobTitle1 and t1.MonthYearofTitle2=t2.MonthYearofTitle1)) then "Match"
else "No-Match" end as flag
from table1 as t1 full join table2 as t2
on  (t1.JobTitle1=t2.JobTitle1 and t1.MonthYearofTitle1=t2.MonthYearofTitle1) or (t1.JobTitle2=t2.JobTitle1 and t1.MonthYearofTitle2=t2.MonthYearofTitle1)
;
quit; 

Result:

Full Join

t1_employeeID=JohnDoe t2_employeeID=MarySue t2_JobTitle=Admin t2_MonthYearofTitle1=JAN2017 flag=Match
t1_employeeID=  t2_employeeID=JackieC t2_JobTitle=Admin t2_MonthYearofTitle1=JUL2013 flag=No-Match
t1_employeeID=TomJones t2_employeeID=  t2_JobTitle=  t2_MonthYearofTitle1=. flag=No-Match

Update

Left join: get only the records from Table 1

proc sql;
create table want as
select
t1.employeeID as Employee1    , t2.employeeID as Employee2, 
coalescec(t2.JobTitle1,t1.JobTitle2,t1.JobTitle1) as Title,
coalesce(t2.MonthYearofTitle1,t1.MonthYearofTitle2,t1.MonthYearofTitle1) as Date format monyy7.,
case when 
((t1.JobTitle1=t2.JobTitle1 and t1.MonthYearofTitle1=t2.MonthYearofTitle1) or (t1.JobTitle2=t2.JobTitle1 and t1.MonthYearofTitle2=t2.MonthYearofTitle1)) then "Match"
else "No-Match" end as Flag
from table1 as t1 left join table2 as t2
on  (t1.JobTitle1=t2.JobTitle1 and t1.MonthYearofTitle1=t2.MonthYearofTitle1) or (t1.JobTitle2=t2.JobTitle1 and t1.MonthYearofTitle2=t2.MonthYearofTitle1) 
;
quit;
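
With real data, more than one employee in table2 can share a title and date with the same table1 employee, and the left join then returns one row per pairing. The desired output in the question shows a single row per Employee1, so if that is what is needed, one possible sketch is to keep the first pairing per Employee1. Which partner to keep is an assumption here (alphabetically first Employee2); want_sorted and want_one are hypothetical names.

```sas
/* Assumes the table want from the left join above, with columns Employee1 and Employee2. */
proc sort data=want out=want_sorted;
   by Employee1 Employee2;
run;

data want_one;
   set want_sorted;
   by Employee1;
   /* Keep only the first row in each Employee1 group. */
   if first.Employee1;
run;
```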
