Question

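I am trying to group by a variable that is not unique and, for each of its values, collect the distinct values of a second discrete variable. For example: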

A B
1 a
1 b
2 a
2 a
3 a
4 b
4 d
5 c 
5 e

I want:

A Unique_combos
1      a, b
2      a
3      a
4      b, d
5      c, e

My current attempt is:

proc sql outobs=50;
    title 'Unique Combinations of b per a';
    select a, b
    from mylib.mydata
    group by distinct a;
run;

Answer 1

If you are happy to use a data step instead of proc sql, you can use the retain statement combined with first./last. by-group processing:

Sample data:

data have;
  attrib b length=$1 format=$1. informat=$1.;
  input a
        b $
        ;
  datalines;
1 a
1 b
2 a
2 a
3 a
4 b
4 d
5 c 
5 e
;
run;

Remove the duplicates and make sure the data is sorted for first./last. processing:

proc sql noprint;
  create table tmp as select distinct a,b from have order by a,b;
quit;
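If you would rather avoid proc sql altogether, the same de-duplicated, sorted table can probably be produced with proc sort and its nodupkey option. This is only a sketch of an alternative, assuming the same have and tmp table names as above:

```sas
/* Alternative to the proc sql step: sort by a and b, keeping one row per (a, b) pair. */
proc sort data=have out=tmp nodupkey;
  by a b;
run;
```

Either way, tmp ends up sorted by a, which is what the by statement in the next step relies on.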

Iterate over the distinct list and concatenate the values of b together:

data want;
  length combinations $200; * ADJUST TO BE BIG ENOUGH TO STORE ALL THE COMBINATIONS;

  set tmp;
  by a;

  retain combinations '';

  if first.a then do;
    combinations = '';
  end;

  combinations = catx(', ',combinations, b);

  if last.a then do;
    output;
  end;

  drop b;
run;

Result:

combinations    a

    a, b        1
    a           2
    a           3
    b, d        4
    c, e        5
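
In the result above, combinations is listed before a because the length statement is the first place the data step compiler sees a variable, ahead of the set statement. If you want a to come first, as in the layout the question asks for, one option is to declare the length after the set statement. A minimal sketch of that variation, assuming the same tmp table:

```sas
data want;
  set tmp;   /* a and b enter the variable list first */
  by a;

  /* Declared after SET so that combinations follows a in the output. */
  /* Adjust the length so it is big enough to hold every combination. */
  length combinations $200;
  retain combinations '';

  if first.a then combinations = '';          /* start a new list for each value of a */
  combinations = catx(', ', combinations, b); /* append b, separated by a comma */
  if last.a then output;                      /* write one row per value of a */

  drop b;
run;
```

The logic is unchanged; only the position of the length statement, and therefore the column order, differs.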

Answer 2

You just need to add the distinct keyword to the select clause, e.g.:

title 'Unique Combinations of b per a';
proc sql outobs=50;
select distinct a, b
  from mylib.mydata;

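The run statement is unnecessary. Proc sql is normally ended with quit, although personally I never use it, because each statement executes as soon as its semicolon is reached, and the procedure terminates when it hits the next step boundary.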
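
If you prefer to close the procedure explicitly rather than rely on the next step boundary, the same query can be written with quit. This is a sketch using the mylib.mydata table from the question; the final title statement is an optional addition that clears the title so it does not carry over to later output:

```sas
title 'Unique Combinations of b per a';
proc sql outobs=50;
  select distinct a, b
    from mylib.mydata;
quit;   /* ends proc sql explicitly */
title;  /* optional: clear the title for subsequent steps */
```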

Getting the unique combinations for each variable in SAS