Question

Here is a brief sample of my data table:

stnd_y person_id recu_day date 

2002   100       20020929 02-09-29
2002   100       20020930 02-09-30
2002   100       20021002 02-10-02
2002   101       20020927 02-09-27
2002   101       20020928 02-09-28
2002   102       20021001 02-10-01
2002   103       20021003 02-10-03
2002   104       20021108 02-11-08
2002   104       20021112 02-11-12

I would like to turn it into the following:

stnd_y person_id recu_day date      Admission

2002   100       20020929 02-09-29  1
2002   100       20020930 02-09-30  2
2002   100       20021002 02-10-02  3
2002   101       20020927 02-09-27  1
2002   101       20020928 02-09-28  2
2002   102       20021001 02-10-01  1
2002   103       20021003 02-10-03  1
2002   104       20021108 02-11-08  1
2002   104       20021112 02-11-12  2

That is, I want to create a variable that counts admissions per person, based on recu_day and date (both variables hold the hospitalization date).

I tried the following in SAS:

proc sort data=old out=new;
by person_id recu_day;
data new1;
set new;
retain admission 0;
by person_id recu_day;
if recu_day^=lag(recu_day) and(or) person_id^=lag(person_id) then 
admission+1;
run;

and also:

data new1;
set new ;
by person_id recu_day;
retain adm 0;
if first.person_id and(or) first.recu_day then admission=admission+1;
run;

Neither of these works. How can I fix this?

Answer 1

You were very close with your second attempt, but your main problem is that you never reset admission when person_id changes.

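There is also no need to use first.recu_day, because it is 1 for every record in your sample data. first.person_id is sufficient, since you want to increment the number by 1 whenever person_id has not changed from the previous row.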

Including recu_day in the by statement is still useful, however, because it forces an error if the data is not sorted correctly.

data have;
input stnd_y person_id recu_day date :yymmdd8.; 
format date yymmdd8.;
datalines;
2002   100       20020929 02-09-29
2002   100       20020930 02-09-30
2002   100       20021002 02-10-02
2002   101       20020927 02-09-27
2002   101       20020928 02-09-28
2002   102       20021001 02-10-01
2002   103       20021003 02-10-03
2002   104       20021108 02-11-08
2002   104       20021112 02-11-12
;
run;

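data want;
set have;
by person_id recu_day;
/* restart the counter at the first row of each person */
if first.person_id then admission=0;
/* sum statement: adds 1 and keeps the value across rows */
admission+1;
run;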
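
For comparison, here is a minimal sketch of the same step written with an explicit retain instead of the sum statement. It assumes the `have` data set created above; the output name `want_explicit` is only illustrative. It may also help explain the missing values in your second attempt: that code retains `adm` but increments `admission`, so `admission` is reset to missing at the start of every row and `admission+1` in an assignment stays missing.

```sas
/* Sketch only: equivalent to the sum-statement version above. */
/* want_explicit is a hypothetical output name.                */
data want_explicit;
  set have;
  by person_id recu_day;
  /* keep admission across rows; the retained name must match the one incremented */
  retain admission;
  /* restart the counter at the first row of each person */
  if first.person_id then admission = 0;
  admission = admission + 1;
run;
```

With the sample data this should give the same numbering as the desired output in the question (1, 2, 3 for person 100, then 1, 2 for person 101, and so on).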
