Question

I have a SAS data set like this:

col1   col2   col3   col4    col5  col6
A1     B1     C1     D1      E1    $100
A1     B1     C1     D2      E2    $200
A2     B2     C2     D3      E3    $500

The first 3 columns are my key columns. I need to extract the rows with the highest value of col6.

So I can do this:

proc sql;
   create table temp as 
   select col1,col2,col3,max(col6) as col6 
   from dataset 
   group by 1,2,3;
   select * from dataset t1 
   inner join temp t2 
   on t1.col1 = t2.col1 and t1.col2 = t2.col2 
     and t1.col3 = t2.col3 and t1.col6 = t2.col6;
quit;

But how can I achieve the same result with a single pass over the data? Is there a way?

Answer 1

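For many purposes, your approach is perfectly fine. If a single pass is genuinely essential, you can use a data step and a hash object. This reads each record once and updates a single row in the hash object each time it finds a row whose col6 is higher than the highest value seen so far.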

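data _null_;
    if 0 then set have; /* Make sure all variables used in the hash object are set up with the correct types */
    retain highest_so_far;
    if _n_ = 1 then do;
        highest_so_far = col6; /* col6 is still missing here, so the first row read always qualifies */
        declare hash hi_row();
        /* Hash methods take variable names as quoted strings */
        hi_row.definekey('col1','col2','col3','col4','col5','col6');
        /* Declare the same columns as data so that output() writes them */
        hi_row.definedata('col1','col2','col3','col4','col5','col6');
        hi_row.definedone();
    end;

    set have end=eof;

    if col6 > highest_so_far then do;
        hi_row.clear();
        hi_row.add();
        highest_so_far = col6;
    end;

    if (eof) then hi_row.output(dataset: 'want');
run;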

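If there is a tie for the highest value, this program returns the first such row, but it can be modified to return any number of tied rows.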
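The program above tracks one overall maximum, whereas the question asks for the highest col6 within each col1/col2/col3 key. A minimal sketch of one way to adapt the same hash technique is shown here, keying the hash on the three key columns and replacing the stored row whenever a higher col6 arrives. The names best, new4, new5, new6 and want_by_key are illustrative assumptions, not part of the original answer, and ties keep the first row seen.

```sas
data _null_;
    if 0 then set have;              /* set up variable types for the hash object */
    if _n_ = 1 then do;
        declare hash best();         /* hypothetical name: one stored row per key */
        best.definekey('col1','col2','col3');
        best.definedata('col1','col2','col3','col4','col5','col6');
        best.definedone();
    end;

    set have end=eof;

    /* Save the incoming row: a successful find() overwrites col4-col6 */
    new4 = col4;
    new5 = col5;
    new6 = col6;

    if best.find() ne 0 then best.add();   /* first row seen for this key */
    else if new6 > col6 then do;           /* incoming row beats the stored one */
        col4 = new4;
        col5 = new5;
        col6 = new6;
        best.replace();
    end;

    if eof then best.output(dataset: 'want_by_key');
run;
```

This still reads have only once, at the cost of holding one row per distinct key in memory.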

Answer 2

data col;
   input (col1-col5)(:$2.) col6 :comma.;
   cards;
A1 B1 C1 D1 E1 $100
A1 B1 C1 D2 E2 $200
A2 B2 C2 D3 E3 $500
;;;;
   run;
proc print;
   run;
proc summary data=col;
   output out=maxrow idgroup(max(col6) out(_all_)=);
   run;
proc print;
   run;
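As written, the PROC SUMMARY step has no CLASS statement, so it returns the single row with the overall maximum of col6. A hedged sketch of a per-key variant follows, assuming col1 to col3 are the grouping columns from the question. The output data set name maxrow_by_key is an illustrative assumption, and the carried columns are listed explicitly rather than with _all_ so they do not overlap with the CLASS variables.

```sas
proc summary data=col nway;
   class col1 col2 col3;                        /* key columns from the question */
   output out=maxrow_by_key(drop=_type_ _freq_)
          idgroup(max(col6) out(col4 col5 col6)=);
   run;
proc print data=maxrow_by_key;
   run;
```

The NWAY option keeps only the rows for the full col1*col2*col3 combination, so the result has one row per key.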

How to extract the SAS record with the maximum value of a column
