Transposing rows to columns

Time: 2017-04-07 07:51:22

Tags: sas

My input data:

[Image: Input data]

Preferred output data:

[Image: Preferred output]

My best try, which is wrong because the output only contains idnumber 2 and 4:

[Image: My best try]

Data:

    DATA WORK.transpose_csv;
    LENGTH
        idnumber           8
        start_end        $ 5
        date               8 ;
    FORMAT
        idnumber         BEST1.
        start_end        $CHAR5.
        date             YYMMDD10. ;
    INFORMAT
        idnumber         BEST1.
        start_end        $CHAR5.
        date             YYMMDD10. ;
    INPUT
        idnumber         : ?? BEST1.
        start_end        : $CHAR5.
        date             : ?? YYMMDD10. ;
    DATALINES;
    2 start 1994-05-01
    2 end 1996-11-04
    4 start 1979-07-18
    5 start 2005-02-01
    5 end 2009-09-17
    5 start 2010-10-01
    5 end 2012-10-06
    ;
    run;

My best try:

    proc transpose data=transpose_csv
                   out =wide;
                   by idnumber;
                   id start_end ;
    run;

As this post shows, it is easy to do in R, but I need to do it in SAS: Spread with duplicate identifiers (using tidyverse and %>%)

1 answer:

Answer 0 (score: 1)

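The problem with proc transpose here is that a single idnumber can have multiple events. Within a BY group, each value of the ID variable may occur only once, so the group for idnumber 5 (two start rows and two end rows) is rejected and only idnumber 2 and 4 reach the output. If you are able to change the source data to add an extra id variable, e.g. event_id, the task becomes much easier.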
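As a quick check before transposing, the sketch below lists the idnumber/start_end combinations that occur more than once, i.e. the combinations that proc transpose cannot map to a single column within a BY group. It assumes the transpose_csv data set from the question; the column alias n_rows is only an illustrative name.

```sas
/* List idnumber/start_end combinations that appear more than once. */
/* n_rows is a hypothetical alias chosen for this example.          */
proc sql;
    select idnumber, start_end, count(*) as n_rows
    from transpose_csv
    group by idnumber, start_end
    having count(*) > 1;
quit;
```

For the sample data this should flag only idnumber 5, which has two start rows and two end rows.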

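You can continue with proc transpose as shown below, followed by a data step that puts each start/end date pair on one row, or do everything in a single data step with some values hard-coded. There are other approaches as well; a hash solution, for example, may suit this kind of problem.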

Edit: added a method that creates an event_id first (method3 below), which makes the subsequent proc transpose straightforward.

    /* source data */
    DATA WORK.transpose_csv;
    LENGTH
        idnumber           8
        start_end        $ 5
        date               8 ;
    FORMAT
        idnumber         BEST1.
        start_end        $CHAR5.
        date             YYMMDD10. ;
    INFORMAT
        idnumber         BEST1.
        start_end        $CHAR5.
        date             YYMMDD10. ;
    INPUT
        idnumber         : ?? BEST1.
        start_end        : $CHAR5.
        date             : ?? YYMMDD10. ;
    DATALINES;
    2 start 1994-05-01
    2 end 1996-11-04
    4 start 1979-07-18
    5 start 2005-02-01
    5 end 2009-09-17
    5 start 2010-10-01
    5 end 2012-10-06
    ;
    run;

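    /* method1: transpose each row on its own, then collapse start/end pairs */
    proc transpose data=transpose_csv
                   out =wide1 (drop=_: start_end);
        /* notsorted makes every start/end row its own BY group, */
        /* so no ID value repeats within a group                 */
        by idnumber start_end notsorted;
        id start_end;
    run;

    data wide2;
        set wide1;
        by idnumber;
        retain _start;
        /* do not carry a start date over from the previous idnumber */
        if first.idnumber then _start=.;
        /* remember the most recent start date */
        if not missing(start) then _start=start;
        /* write a row when an end date arrives, or for a trailing start with no end */
        if not missing(end) or last.idnumber then do;
            start=_start;
            output;
        end;
        drop _start;
    run;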


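    /* method2: single data step; the values 'start' and 'end' are hard-coded */
    data wide3;
        set transpose_csv;
        by idnumber;
        retain start;
        format start end yymmdd10.;
        /* do not carry a start date over from the previous idnumber */
        if first.idnumber then start=.;
        if start_end='start' then start=date;
        /* an end row completes the event; a trailing start with no end */
        /* is written with a missing end date                           */
        if start_end='end' then do;
            end=date;
            output;
        end;
        else if last.idnumber then output;
        drop start_end date;
    run;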

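    /* method3: number the events within each idnumber, then transpose */
    data transpose_csv1;
        set transpose_csv;
        by idnumber;
        if first.idnumber then event_id=0;
        /* sum statement: event_id is retained and increases by 1 on every 'start' row */
        event_id+(start_end='start');
    run;

    proc transpose data=transpose_csv1
                   out =wide4 (drop=_: event_id);
        /* idnumber + event_id identifies one event, so each ID value is unique per BY group */
        by idnumber event_id;
        id start_end;
    run;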
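
One way to confirm that the three methods agree is to compare their outputs. This is a minimal sketch, assuming wide2, wide3 and wide4 were all created by the code above and each contains the variables idnumber, start and end.

```sas
/* Compare method2 and method3 against method1; variables are matched by name. */
proc compare base=wide2 compare=wide3;
run;

proc compare base=wide2 compare=wide4;
run;

/* Inspect one of the results; the format is applied here only for display. */
proc print data=wide4 noobs;
    format start end yymmdd10.;
run;
```

If the methods agree, the proc compare summaries should report no unequal values, although differences in variable attributes such as formats may still be listed.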