How to set data using a specific scheme

Date: 2019-05-23 13:36:00

Tags: sas datastep

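I have a small problem. Given the example tables HAVE1 and HAVE2, I want to create a table like WANT: each HAVE2 row should be placed directly below the HAVE1 row with the same ID, with its value copied into all columns (COL1 through COL19, leaving COL20 empty). How can I do that?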

data HAVE1;
infile DATALINES dsd missover;
input ID NAME $ COL1-COL20;
CARDS;
1, A1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 ,20
2, A2, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
3, B1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 12, 13, 14, 15, 16, 16, 20, 21 , 21, 22 
4, B2, 1, 20, 3, 20, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 23, 22, 23
5, C1, 20, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30, 12, 13, 14, 15, 16, 17, 17, 17, 17
6, C2, 1, 2, 3, 20, 5, 6, 7, 8, 02, 10, 11, 12, 30, 14, 15, 16, 17, 18, 19, 20
;run;

Data HAVE2;
infile DATALINES dsd missover;
input ID NAME $ WARTOSC;
CARDS;
1, SUM, 50000
2, SUM, 55000
3, SUM, 60000
;run;

DATA WANT;
infile DATALINES dsd missover;
input ID NAME $ COL1-COL20;
CARDS;
1, A1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 ,20
1, SUM_1    ,50000,50000,50000,50000,50000,50000,50000,50000,50000,50000,50000,50000,50000,50000,50000,50000,50000,50000,50000
2, A2, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
2, SUM_2, 55000,55000,55000,55000,55000,55000,55000,55000,55000,55000,55000,55000,55000,55000,55000,55000,55000,55000,55000
3, B1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 12, 13, 14, 15, 16, 16, 20, 21 , 21, 22 
3, SUM_3,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000,60000
4, B2, 1, 20, 3, 20, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 23, 22, 23
5, C1, 20, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30, 12, 13, 14, 15, 16, 17, 17, 17, 17
6, C2, 1, 2, 3, 20, 5, 6, 7, 8, 02, 10, 11, 12, 30, 14, 15, 16, 17, 18, 19, 20
;run;

2 answers:

Answer 0 (score: 1)

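Your want table is rather unusual. You might be better off producing a report instead of a data set that is only ever going to be PROC PRINTed.

Regardless, for the have2 rows the step needs to transform name and replicate wartosc across the columns.

For example: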

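data want (drop=wartosc id2);
  set have1;
  output;                          /* write the have1 row first */

  /* read the next have2 row, unless have2 is already exhausted */
  if not end2 then
    set have2(rename=(id=id2)) end=end2;

  if id = id2 then do;
    array col col1-col20;          /* fills all 20 columns, including col20 */
    do over col; col=wartosc; end;
    name = catx('_', name, id);    /* name now holds 'SUM' read from have2 */
    output;
  end;

run;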

You may need more logic if have2 can have more rows than have1.
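The step above depends on the have2 rows arriving in the same order as the matching have1 rows. One possible way to drop that dependency is a hash-object lookup keyed on id, sketched below. This is an illustration, not the answerer's method: the variable rc is a name introduced here, the step fills col1-col19 and blanks col20 to follow the WANT table in the question, and have2 rows whose id never appears in have1 are ignored, which may or may not be what you want.

```sas
data want (drop=wartosc rc);
  /* put wartosc in the PDV at compile time; no rows are read here */
  if 0 then set have2 (keep=wartosc);

  /* load have2 into a hash table once, keyed on id */
  if _n_ = 1 then do;
    declare hash h (dataset: 'have2');
    h.defineKey('id');
    h.defineData('name', 'wartosc');
    h.defineDone();
  end;

  set have1;
  array col col1-col19;            /* col20 is deliberately left out */
  output;                          /* write the have1 row first */

  rc = h.find();                   /* 0 when have2 has a row for this id */
  if rc = 0 then do;
    do over col; col = wartosc; end;
    call missing(col20);           /* col20 stays empty on the SUM row */
    name = catx('_', name, id);    /* name was overwritten with 'SUM' by find() */
    output;
  end;
run;
```

Because the lookup is by key, neither table has to be sorted, and ids missing from have2 produce no extra row.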

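Answer 1 (score: 1)

So it sounds like you just need to reformat the second dataset to match what you want and then interleave the two. Copy the value of WARTOSC into all of the columns and drop the original WARTOSC variable.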

data HAVE1;
  infile CARDS dsd truncover;
  input ID NAME $ COL1-COL5;
CARDS;
1, A1, 1, 2, 3, 4, 5
2, A2, 1, 2, 3, 4, 5
3, B1, 3, 4, 5, 6, 7
4, B2, 1, 20, 3, 20, 5
5, C1, 20, 2, 3, 4, 5
6, C2, 1, 2, 3, 20, 5
;

data HAVE2;
  infile CARDS dsd truncover;
  input ID NAME $ WARTOSC;
CARDS;
1, SUM, 50000
2, SUM, 55000
3, SUM, 60000
;

data have2_fixed;
  set have2;
  name=catx('_',name,id);
  array col col1-col5;
  do over col ; col=wartosc; end;
  drop wartosc;
run;

data want ;
  set have1 have2_fixed;
  by id;
run;

If the datasets are large, you can make the changes while interleaving them, which avoids the extra pass over HAVE2.

data want ;
  set have1 have2 (in=in2);
  by id;
  array col col1-col5;
  if in2 then do;
    name=catx('_',name,id);
    do over col ; col=wartosc; end;
  end;
  drop wartosc;
run;

Result:

Obs    ID    NAME      COL1     COL2     COL3     COL4     COL5

 1      1    A1           1        2        3        4        5
 2      1    SUM_1    50000    50000    50000    50000    50000
 3      2    A2           1        2        3        4        5
 4      2    SUM_2    55000    55000    55000    55000    55000
 5      3    B1           3        4        5        6        7
 6      3    SUM_3    60000    60000    60000    60000    60000
 7      4    B2           1       20        3       20        5
 8      5    C1          20        2        3        4        5
 9      6    C2           1        2        3       20        5
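For the 20-column tables from the question, where WANT leaves COL20 empty on the SUM rows, the one-step interleave might be adapted as follows. This is a sketch only. It assumes HAVE1 and HAVE2 are the question's versions and that both are sorted by ID. The CALL MISSING is there to make the empty COL20 explicit rather than rely on the automatic reset when SAS switches data sets.

```sas
data want ;
  set have1 have2 (in=in2);
  by id;
  array col col1-col19;            /* COL20 is deliberately excluded */
  if in2 then do;
    name=catx('_',name,id);
    do over col ; col=wartosc; end;
    call missing(col20);           /* keep COL20 empty on SUM rows */
  end;
  drop wartosc;
run;
```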