(SAS) - Creating a dummy variable for "switching"

Time: 2012-04-25 16:11:38

Tags: sas

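Hello, I'm working on a SAS project and I want to create a dummy variable to account for "preferences" in medicine. I have a long dataset, by time period, of people who take medicine type 1 or type 2. For my study I want to create a variable that identifies individuals who take type 1 medicine, then switch to type 2, but then go back to type 1. I'm not concerned with how long an individual stays on each medicine, only that they follow this pattern.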

      id  month  type
      1    1       2
      1    2       2
      1    3       2
      2    1       1
      2    2       2
      2    3       1      
             ...

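I have more months than this, but I just wanted to provide something to illustrate what I'm trying to get at. Basically, I want to count the subjects like subject 2.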

4 answers:

Answer 0: (score: 1)

Well, nothing fancy, but it works for me:

DATA LONG1;
input id  month  type;
cards;
1    1       2
1    2       2
1    3       2
1    4       2
1    5       2
1    6       2
1    7       2
1    8       2
1    9       2
1   10       2
2    1       1
2    2       1
2    3       1
2    4       1
2    5       1
2    6       1
2    7       1
2    8       1
2    9       1
2   10       1
3    1       1
3    2       1
3    3       1
3    4       2
3    5       1
3    6       1
3    7       1
3    8       1
3    9       1
3   10       1
;

Proc Print; run;
* 1) make a wide dataset by deconstructing the initial long data by month & rejoining by id
2) then use if/then statements to create your dummy variable, 
3) then merge the dummy variable back into your long dataset using ID;

DATA month1; set long1; where month=1; rename month=month_1 type=type_1; Proc Sort; by ID; run;
DATA month2; set long1; where month=2; rename month=month_2 type=type_2; Proc Sort; by ID; run;
DATA month3; set long1; where month=3; rename month=month_3 type=type_3; Proc Sort; by ID; run;
DATA month4; set long1; where month=4; rename month=month_4 type=type_4; Proc Sort; by ID; run;
DATA month5; set long1; where month=5; rename month=month_5 type=type_5; Proc Sort; by ID; run;
DATA month6; set long1; where month=6; rename month=month_6 type=type_6; Proc Sort; by ID; run;
DATA month7; set long1; where month=7; rename month=month_7 type=type_7; Proc Sort; by ID; run;
DATA month8; set long1; where month=8; rename month=month_8 type=type_8; Proc Sort; by ID; run;
DATA month9; set long1; where month=9; rename month=month_9 type=type_9; Proc Sort; by ID; run;
DATA month10; set long1; where month=10; rename month=month_10 type=type_10; Proc Sort; by ID; run;


DATA WIDE;
merge month1 month2 month3 month4 month5 month6 month7 month8 month9 month10; by ID; 
/* 'no' only when type is identical in all 10 months; every other sequence is
   flagged 'yes', which is broader than a strict 1 -> 2 -> 1 pattern */
length switch $3;
if (type_1=1 and type_2=1 and type_3=1 and type_4=1 and type_5=1 
and type_6=1 and type_7=1 and type_8=1 and type_9=1 and type_10=1) or 
(type_1=2 and type_2=2 and type_3=2 and type_4=2 and type_5=2 
and type_6=2 and type_7=2 and type_8=2 and type_9=2 and type_10=2) 
then switch='no'; else switch='yes'; keep ID switch; run;

DATA LONG2;
merge wide long1; by ID;
Proc Print; run;
By the way: also try the SAS listserv, they love this kind of thing: http://www.listserv.uga.edu/archives/sas-l.html
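The long-to-wide step above can probably be written more compactly. The following is only a sketch of the same idea using PROC TRANSPOSE, not the answerer's code; the dataset names long1_sorted, wide_t and wide2 are made up for illustration. One assumed difference in behavior: MIN and MAX ignore missing values, so an id with a missing month is not automatically flagged 'yes' here as it would be above.

```sas
/* Sketch: same long -> wide -> flag approach, using PROC TRANSPOSE.      */
/* long1_sorted, wide_t and wide2 are hypothetical names.                 */
proc sort data=long1 out=long1_sorted;
    by id month;
run;

/* One row per id, with columns type_1, type_2, ... taken from month */
proc transpose data=long1_sorted out=wide_t(drop=_name_) prefix=type_;
    by id;
    id month;
    var type;
run;

data wide2(keep=id switch);
    set wide_t;
    length switch $3;
    /* min = max means the type never changed across the observed months */
    if min(of type_:) = max(of type_:) then switch = 'no';
    else switch = 'yes';
run;
```

Like the DATA step version, this flags any change in type rather than the specific 1 -> 2 -> 1 pattern, but it does not need one DATA step per month.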

Answer 1: (score: 0)

This works on the limited data I used:

DATA Have; 
 input id month type; 
 datalines;
 1 1 1
 1 2 1
 1 3 1
 1 4 1
 1 5 1
 2 1 1
 2 2 2
 2 3 1
 2 4 1
 2 5 1
 3 1 1
 3 2 1
 3 3 2
 3 4 2
 3 5 1
 4 1 2
 4 2 2
 4 3 2
 4 4 2
 4 5 2
 ;

Data Temp(keep=id dummy);
 length dummy $15;
 retain Start Type2 dummy;
 set Have;
 by id;

 if first.id then Do;
  Start=0;
  Type2=0;
  Dummy="";
 end;

 If Type=1 then do;
  If Start=0 then Start=1;
  else if Start=1 and Type2=1 then Dummy="Switch-er-Roo";
 end;
 else do;
  if Start=1 then Type2=1;
 end;

 if last.id then output;
run;

Data Want;
 merge temp(in=a) have(in=b);
 by id;
run;
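
Since the question asks for a dummy variable and a count, the character flag in Want could be turned into a 0/1 variable and counted. This is a minimal sketch that assumes the Want dataset produced above; want_flag, switch_flag and n_switchers are hypothetical names.

```sas
/* Sketch: numeric 0/1 dummy derived from the character flag in Want.   */
/* want_flag, switch_flag and n_switchers are hypothetical names.       */
data want_flag;
    set want;
    /* a comparison evaluates to 1 when true and 0 when false */
    switch_flag = (dummy = 'Switch-er-Roo');
run;

/* Count the distinct ids that followed the 1 -> 2 -> 1 pattern */
proc sql;
    select count(distinct id) as n_switchers
    from want_flag
    where switch_flag = 1;
quit;
```

With the sample data above, ids 2 and 3 carry the flag, so the count should be 2.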

Answer 2: (score: 0)

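I like @CarolinaJay65's approach better; it is cleaner and involves only one pass through the data. If you are only interested in patients who start and end on Type1 but use Type2 at some point in between, the code can be simplified a little. The following code (using @CarolinaJay65's source data) outputs only the patient ids that match this criterion.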

data switch_id (keep=id);
set have;
by id month;
retain switch;
if first.id then do;
    call missing(switch);
    if type=1 then switch=0;
    end;
else if not missing(switch) and type=2 then switch=1;
if last.id and type=1 and switch=1 then output;
run;

If you only want the number of patients who meet the criterion, you can adapt this code further.

data switch (keep=count);
set have end=final;
by id month;
retain switch count 0;
if first.id then do;
    call missing(switch);
    if type=1 then switch=0;
    end;
else if not missing(switch) and type=2 then switch=1;
if last.id and type=1 and switch=1 then count+1;
if final then output;
run;  

Answer 3: (score: 0)

I think the following should work:

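 DATA Have; 
 input id month type; 
 /* call LAG and DIF on every row: calling them conditionally would leave
    their queues holding values from rows that were skipped */
 prev_id = lag(id);
 diftype = dif(type);
 /* first row of each id has no previous row within the same id */
 if id ^= prev_id then diftype = .;
 drop prev_id;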
 datalines;
 1 1 1
 1 2 1
 1 3 1
 1 4 1
 1 5 1
 2 1 1
 2 2 2
 2 3 1
 2 4 1
 2 5 1
 3 1 1
 3 2 1
 3 3 2
 3 4 2
 3 5 1
 4 1 2
 4 2 2
 4 3 2
 4 4 2
 4 5 2
 ;

proc sql;
     select case when max(diftype) = 1 and min(diftype) = -1 then 1 else 0 end as flag, * from have
  group by id
  ;

quit;
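
A check on max and min of the differences would presumably also match an id that goes 2 -> 1 -> 2. If only the 1 -> 2 -> 1 order should count, one option is to collapse each id's history into a short string of distinct consecutive types and search it for '121'. This is just a sketch under assumptions: type takes only the single-digit values 1 and 2, the Have data above is used, and have_sorted, pattern, seq, prev_type and switch are hypothetical names.

```sas
/* Sketch: strict 1 -> 2 -> 1 detection via a collapsed type sequence.  */
/* Assumes type is a single digit (1 or 2); all new names are made up.  */
proc sort data=have out=have_sorted;
    by id month;
run;

data pattern(keep=id seq switch);
    length seq $200;
    retain seq;
    set have_sorted;
    by id;
    /* LAG is called unconditionally so its queue sees every row */
    prev_type = lag(type);
    if first.id then seq = ' ';
    /* append the type only when it differs from the previous row */
    if first.id or type ne prev_type then seq = cats(seq, type);
    if last.id then do;
        /* 1 if the sequence contains 1, then 2, then 1 again; else 0 */
        switch = (index(seq, '121') > 0);
        output;
    end;
run;
```

On the sample data, seq should come out as '1', '121', '121' and '2' for ids 1 to 4, so only ids 2 and 3 get switch = 1. The result has one row per id and can be merged back onto the long data by id if the flag is needed on every row.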