which is wrong, because it contains only idnumber 2 and 4.
Data:
DATA WORK.transpose_csv;
LENGTH
idnumber 8
start_end $ 5
date 8 ;
FORMAT
idnumber BEST1.
start_end $CHAR5.
date YYMMDD10. ;
INFORMAT
idnumber BEST1.
start_end $CHAR5.
date YYMMDD10. ;
INPUT
idnumber : ?? BEST1.
start_end : $CHAR5.
date : ?? YYMMDD10. ;
DATALINES;
2 start 1994-05-01
2 end 1996-11-04
4 start 1979-07-18
5 start 2005-02-01
5 end 2009-09-17
5 start 2010-10-01
5 end 2012-10-06
;
run;
My best attempt:
proc transpose data=transpose_csv
out =wide;
by idnumber;
id start_end ;
run;
As the following post shows, this is easy to do in R, but I need to do it in SAS: Spread with duplicate identifiers (using tidyverse and %>%)
Answer 0: (score: 1)
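The problem with proc transpose here is that a given idnumber can have more than one event, so the values of start_end repeat within a BY group. If you are able to change the source data to add an extra id variable, e.g. event_id, the task becomes much easier.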
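To see the problem concretely, one option is to rerun the original transpose with the LET option, which tells proc transpose to accept duplicate ID values within a BY group and keep only the last occurrence. This is a diagnostic sketch rather than a fix, and the output name wide_let is made up for illustration. For idnumber 5 it should keep only the later start/end pair and silently drop the earlier one, which is why an extra identifier such as event_id is needed.

```sas
/* Diagnostic sketch only: wide_let is a hypothetical output name */
proc transpose data=transpose_csv
               out=wide_let (drop=_name_)
               let;          /* tolerate duplicate ID values; last occurrence wins */
  by idnumber;               /* the sample data is already in idnumber order */
  id start_end;
run;

/* Inspect the result: idnumber 5 appears once, not twice */
proc print data=wide_let noobs;
run;
```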
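You can either keep using proc transpose as shown below and follow it with a data step that puts each start/end pair on one row, or do everything in a single data step with some values hard-coded. There are other approaches too; a hash solution, for example, may suit this kind of problem.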
Edit: added a method that first creates event_id, which makes the subsequent proc transpose straightforward.
/* source data */
DATA WORK.transpose_csv;
LENGTH
idnumber 8
start_end $ 5
date 8 ;
FORMAT
idnumber BEST1.
start_end $CHAR5.
date YYMMDD10. ;
INFORMAT
idnumber BEST1.
start_end $CHAR5.
date YYMMDD10. ;
INPUT
idnumber : ?? BEST1.
start_end : $CHAR5.
date : ?? YYMMDD10. ;
DATALINES;
2 start 1994-05-01
2 end 1996-11-04
4 start 1979-07-18
5 start 2005-02-01
5 end 2009-09-17
5 start 2010-10-01
5 end 2012-10-06
;
run;
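/* method1: transpose each row separately, then pair start/end in a data step */
proc transpose data=transpose_csv
               out=wide1 (drop=_: start_end);
  by idnumber start_end notsorted;  /* notsorted: start/end alternate within an idnumber */
  id start_end;
run;
data wide2;
  set wide1;
  by idnumber;
  retain _start;
  /* remember the most recent start date */
  if not missing(start) then _start=start;
  /* write a row at each end date, or at the last row of an idnumber */
  if not missing(end) or last.idnumber then do;
    start=_start;
    output;
  end;
  drop _start;
run;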
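/* method2: single data step, with the values 'start' and 'end' hard-coded */
data wide3;
  set transpose_csv;
  by idnumber;
  retain start;                     /* carry the start date forward to its end row */
  format start end yymmdd10.;
  if start_end='start' then start=date;
  if start_end='end' then do;
    end=date;
    output;
  end;
  /* a start with no matching end: write it with a missing end date */
  else if last.idnumber then output;
  drop start_end date;
run;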
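/* method3: create event_id first, then transpose by idnumber and event_id */
data transpose_csv1;
  set transpose_csv;
  by idnumber;
  if first.idnumber then event_id=0;
  /* sum statement: event_id is retained and goes up by 1 at every 'start' row */
  event_id+(start_end='start');
run;
proc transpose data=transpose_csv1
               out=wide4 (drop=_: event_id);
  by idnumber event_id;
  id start_end;
run;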
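A quick way to check the result is to print one output and compare it with another. The following is a minimal sketch, assuming the steps above have already run in the same session so that wide3 and wide4 exist in WORK. With the sample data, each method should give four rows: one for idnumber 2, one for idnumber 4 with a missing end date, and two for idnumber 5.

```sas
/* Sketch: inspect the output of method3 */
proc print data=wide4 noobs;
run;

/* Sketch: check that method2 and method3 agree row by row.
   Both data sets hold idnumber, start and end in the same row order. */
proc compare base=wide3 compare=wide4;
run;
```

If the methods agree, proc compare should report no unequal values. It may still note differences in variable attributes such as formats or labels, and those do not affect the dates themselves.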