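I am trying to identify traders who trade in the same company in the same month every year for three consecutive years. Once a trader qualifies, those three trades and all of the trader's subsequent trades in that company in that month should be identified.
Suppose I have the sample data below.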
data have;
input ID STOCK trandate :mmddyy10.;
format trandate mmddyy10.;
datalines;
1 1 10/15/2009
1 1 01/01/2010
1 1 01/10/2011
1 1 01/15/2012
1 1 01/01/2013
1 2 01/30/2011
1 2 01/30/2012
1 2 01/30/2012
1 2 01/30/2013
1 2 01/30/2014
1 2 01/30/2015
2 1 01/20/2010
2 1 01/15/2011
2 1 01/16/2012
2 1 02/01/2013
2 2 02/01/2010
2 2 02/10/2011
2 2 02/10/2012
2 2 02/10/2013
2 2 02/10/2014
2 2 01/10/2015
;
run;
What I need:
ID Stock trandate type
1 1 10/15/2009 0
1 1 01/01/2010 1
1 1 01/10/2011 1
1 1 01/15/2012 1
1 1 01/01/2013 1
1 2 01/30/2011 1
1 2 01/30/2012 1
1 2 01/30/2012 1
1 2 01/30/2013 1
1 2 01/30/2014 1
1 2 01/30/2015 1
2 1 01/20/2010 0
2 1 01/15/2011 0
2 1 01/16/2012 0
2 1 02/01/2013 0
2 2 02/01/2010 1
2 2 02/10/2011 1
2 2 02/10/2012 1
2 2 02/10/2013 1
2 2 02/10/2014 1
2 2 01/10/2015 0
I achieved this with the following code:
proc sort data=have;
by id stock trandate;
run;
data have;
set have;
month=month(trandate);
year=year(trandate);
run;
proc sort data=have;
by id stock month year;
run;
data have;
set have;
by id stock month year;
rungroup + (first.month or not first.month and year - lag(year) > 1);
run;
data temp;
do index = 1 by 1 until (last.rungroup);
set have;
by rungroup;
* distinct number of years in rungroup;
years_runlength = sum (years_runlength, first.rungroup or year ne lag(year));
end;
do index = 1 to index;
set have;
if years_runlength >=4 then output;
end;
run;
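The code above identifies traders who traded in the previous three consecutive years. Because I also need these traders' subsequent trades, I then apply the following code.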
proc sort data=temp;
by id stock rungroup;
run;
data temp;
set temp;
by rungroup;
if first.rungroup then fyear=year;
run;
data temp(drop=fyear rename=(Locf=fyear));
do until (last.id);
set temp;
by id stock;
locf=coalesce(fyear,locf);
output;
end;
run;
data temp;
set temp;
by rungroup;
if first.rungroup then fmonth=month;
run;
data temp;
set temp;
gap=year-fyear;
run;
proc means data=temp;
var gap;
run;
data temp;
set temp;
if gap=3 then type2=1;
type1=1;
run;
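The code above flags the first trade after three consecutive years. When the identified trades are merged back with the original data set, all trades in the same month that follow the flagged trade can be identified. That gets me to the goal that "these three trades and all of their subsequent trades in that company in that month should be identified." The following code does this.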
proc sort data=have;
by id stock rungroup;
run;
proc sort data=temp;
by id stock rungroup;
run;
data combine;
merge have temp;
by id stock rungroup;
run;
data combine;
set combine;
month=month(trandate);
run;
data combine1 (drop=fmonth rename=(Locf=fmonth));
do until (last.id);
set combine;
by id stock;
locf=coalesce(fmonth,locf);
output;
end;
run;
data combine2 (drop=type2 rename=(Locf=type2));
do until (last.id);
set combine1;
by id stock;
locf=coalesce(type2,locf);
output;
end;
run;
data combine2;
set combine2;
if month^=fmonth then type2=.;
run;
data combine2;
set combine2;
if type1=1 or type2=1 then type=1;
else type=0;
run;
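I have tried this code and the results look right, but I am not 100% sure. Also, as you can see, my code is rather long and complicated. Could anyone give me some suggestions on the code?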
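One way to gain more confidence in the result is to compare it mechanically against the hand-built table of desired values. The sketch below assumes that table has been read into a data set called EXPECTED (a hypothetical name) with the variables id, stock, trandate (as a numeric SAS date) and type, and that it has exactly one row for each row of COMBINE2. ACTUAL is also a name made up for this sketch.

```sas
/* Keep only the columns being checked and put both tables in the same order. */
proc sort data=combine2 out=actual(keep=id stock trandate type);
  by id stock trandate;
run;

proc sort data=expected;
  by id stock trandate;
run;

/* No ID statement: rows are matched by position, because the sample
   contains duplicate id/stock/trandate combinations. */
proc compare base=expected compare=actual listall;
  var id stock trandate type;
run;
```

Any row where type differs between the two tables shows up in the PROC COMPARE report, which is easier to trust than checking the listing by eye.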
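Answer 0 (score: 0)
Here is a brute-force approach. For this example I limited it to the years 2009 through 2015, which covers your sample data, but you can extend the pattern to allow more years. You could use macro logic to generate the wallpaper (repetitive) parts of the code.
First build an array that you can index by MONTH and YEAR, and set an element to 1 when there is a trade in the month it represents. Then check whether the series of values for the same month across the years contains three consecutive 1s. You can use two DOW loops to process the data: the first fills the array, and the second tests the array and sets the new flag variable.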
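As a minimal illustration of the "three consecutive 1s" test on its own, the step below fills one month's series for a single group by hand and applies the same CATS and INDEX check. The names y2009-y2015, series and hit are made up for this sketch, and the three years set to 1 are arbitrary.

```sas
data _null_;
  /* one element per year for a single month; names are illustrative */
  array y [2009:2015] y2009-y2015;
  y[2010] = 1;
  y[2011] = 1;
  y[2012] = 1;

  /* CATS writes a missing numeric value as a period, so years without
     a trade break up the run of 1s */
  length series $ 7;
  series = cats(of y[*]);

  /* INDEX returns 0 when '111' is not found, so hit is 1 or 0 */
  hit = 0 ne index(series, '111');
  put series= hit=;
run;
```

Because every year contributes exactly one character to the string, a match on '111' can only come from three adjacent years.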
data want ;
do until(last.stock) ;
set have ;
by id stock;
array months [1:12,2009:2015]
m1y2009-m1y2015 m2y2009-m2y2015 m3y2009-m3y2015 m4y2009-m4y2015
m5y2009-m5y2015 m6y2009-m6y2015 m7y2009-m7y2015 m8y2009-m8y2015
m9y2009-m9y2015 m10y2009-m10y2015 m11y2009-m11y2015 m12y2009-m12y2015
;
months[month(trandate),year(trandate)]=1;
end;
do until(last.stock);
set have;
by id stock;
select (month(trandate));
when (1) flag=0 ne index(cats(of m1y:),'111');
when (2) flag=0 ne index(cats(of m2y:),'111');
when (3) flag=0 ne index(cats(of m3y:),'111');
when (4) flag=0 ne index(cats(of m4y:),'111');
when (5) flag=0 ne index(cats(of m5y:),'111');
when (6) flag=0 ne index(cats(of m6y:),'111');
when (7) flag=0 ne index(cats(of m7y:),'111');
when (8) flag=0 ne index(cats(of m8y:),'111');
when (9) flag=0 ne index(cats(of m9y:),'111');
when (10) flag=0 ne index(cats(of m10y:),'111');
when (11) flag=0 ne index(cats(of m11y:),'111');
when (12) flag=0 ne index(cats(of m12y:),'111');
otherwise ;
end;
output;
end;
drop m: ;
run;
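The repetitive parts of the step above can be generated with macro logic, as suggested. The following is a sketch under stated assumptions rather than a definitive implementation: the macro name flag_same_month and its parameters in, out, first_year, last_year and runlen are all hypothetical, the input must be sorted by id and stock, trandate must be a numeric SAS date, and every trade year must lie between first_year and last_year (otherwise the array subscript is out of range).

```sas
%macro flag_same_month(in=have, out=want, first_year=2009, last_year=2015, runlen=3);
  %local m pattern;
  /* REPEAT returns its argument n+1 times, so runlen=3 builds 111 */
  %let pattern = %sysfunc(repeat(1, %eval(&runlen - 1)));

  data &out;
    /* first DOW loop: mark every month/year in which the group traded */
    do until (last.stock);
      set &in;
      by id stock;
      array months [1:12, &first_year:&last_year]
        %do m = 1 %to 12;
          m&m.y&first_year-m&m.y&last_year
        %end;
      ;
      months[month(trandate), year(trandate)] = 1;
    end;

    /* second DOW loop: re-read the group and test the month of each trade */
    do until (last.stock);
      set &in;
      by id stock;
      select (month(trandate));
        %do m = 1 %to 12;
          when (&m) flag = 0 ne index(cats(of m&m.y:), "&pattern");
        %end;
        otherwise;
      end;
      output;
    end;

    /* drop only the generated helper variables */
    drop %do m = 1 %to 12; m&m.y: %end; ;
  run;
%mend flag_same_month;

%flag_same_month(in=have, out=want, first_year=2009, last_year=2015, runlen=3)
```

The run length is exposed as a parameter because the code in the question tests years_runlength >= 4 while this answer looks for '111'; calling the macro with runlen=4 searches for four consecutive years instead.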