Question

I have a table in SAS, and I'm trying to find the maximum and the second-largest value of a given column.

For example:

My code needs to find this:

Id Column
2   50000
6   7000

Is there a way to do this in PROC SQL (or even in base SAS)?

Answer 1

Assuming there are no ties, sort the table in descending order and then take the first 2 rows.

proc sort data=have out=temp;
by descending column;
run;

data want;
set temp(obs=2);
run;

If you have ties and only want distinct values, try the NODUPKEY option on PROC SORT:

proc sort data=have out=temp nodupkey;
by descending column;
run;

data want;
set temp(obs=2);
run;
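If you would rather keep every row that carries one of the two largest distinct values (instead of collapsing ties with NODUPKEY), a BY-group counter in the DATA step is one way to do it. This is a minimal sketch, assuming the table is called have and the column is called column as in the question; the sample rows and the counter name val_rank are hypothetical, for illustration only.

```sas
/* Hypothetical sample data with a tie at the top */
data have;
  input Id Column;
  datalines;
1 100
2 50000
3 50000
4 4000
5 97
6 7000
;
run;

proc sort data=have out=temp;
  by descending column;
run;

data want;
  set temp;
  by descending column;
  /* val_rank (hypothetical name) counts the distinct values seen so far */
  if first.column then val_rank + 1;
  /* A third distinct value ends the step; STOP prevents that row from being written */
  if val_rank > 2 then stop;
run;
```

With the sample rows above, want should contain Ids 2 and 3 (both 50000) and Id 6 (7000).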

Answer 2

PROC UNIVARIATE will do this for you, as described in the documentation for the proc:

proc univariate data=sashelp.class;
  var height;
  id name;
run;

Look at the Extreme Observations table.

If you want the result in a data set, just use the ExtremeObs output object:

ods output extremeobs=extremes;
proc univariate data=sashelp.class;
  var height;
  id name;
run;
ods output close;

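Then filter to the High variable and the 4th and 5th observations: by default the table lists five extreme observations, with the highest values in ascending order, so rows 4 and 5 hold the second-largest and the largest value. This has the same problem with ties as almost every other solution here.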
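Alternatively, the number of extreme observations can be set with the NEXTROBS= option on the PROC UNIVARIATE statement (the default is 5). A sketch that asks for only two, so the highest side of the ExtremeObs table holds just the two largest heights and no row filtering is needed; the data set name extremes is carried over from the answer above.

```sas
ods output extremeobs=extremes;
proc univariate data=sashelp.class nextrobs=2;
  var height;
  id name;
run;
ods output close;

/* Two rows: the 2 lowest and the 2 highest heights side by side */
proc print data=extremes;
run;
```

Ties are still not collapsed: the table lists observations, not distinct values.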

Answer 3

Another option: PROC MEANS with IDGROUP. This gives you a horizontal (wide) data set, but you can transpose it with PROC TRANSPOSE or a DATA step as needed. For details, see the paper Transposing Data Using PROC SUMMARY'S IDGROUP Option.

proc means data=sashelp.class;
  var height;
  output out=classout idgroup(max(height) out[2]  (height name)=height name) /autoname;
run;
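The IDGROUP output is a single wide row. One way to turn it back into one row per observation is a DATA step with arrays. This sketch assumes IDGROUP names the copies height_1 and height_2, and name_1 and name_2 (the _1/_2 suffixes it appends for out[2]); the data set name classlong and the array names are hypothetical. NOPRINT is added only to suppress the printed report.

```sas
proc means data=sashelp.class noprint;
  var height;
  output out=classout idgroup(max(height) out[2] (height name)=height name) / autoname;
run;

/* Wide to long: one output row per extreme observation */
data classlong;
  set classout;
  /* Assumed IDGROUP variable names: height_1-height_2, name_1-name_2 */
  array hts[2] height_1-height_2;
  array nms[2] $ name_1-name_2;
  length name $ 8;
  do i = 1 to 2;
    height = hts[i];
    name   = nms[i];
    output;
  end;
  keep name height;
run;
```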

Answer 4

For completeness, here is a PROC SQL solution. It is not necessarily better than any of the others, and it does not cope with ties either.

data have;
input Id Column;
datalines;
1  100
2  50000
3  50 
4  4000
5  97
6  7000
;
run;

proc sql outobs=2;
create table want as
select * from have
order by column desc;
quit;

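Finally, a solution using PROC RANK that does handle ties (TIES=DENSE gives tied values the same rank with no gaps, so rank <= 2 keeps the two largest distinct values):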

data have;
input Id Column;
datalines;
1  100
2  50000
3  50000 
4  4000
5  97
6  7000
;
run;

proc rank data=have out=want_ties (where=(column_rank<=2)) descending ties=dense;
var column;
ranks column_rank;
run;

Answer 5

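Here is a different PROC SQL option: this one selects the two highest distinct values, along with all IDs that have those values. It is a bit convoluted, and it only works because of the way SAS handles summary statistics (PROC SQL remerges max(age) back onto every detail row).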

proc sql;
  select age, name
    from sashelp.class
    where age =
      (select max(age) from sashelp.class)
      or age = 
      (select max(age) from 
       (select case when age=max(age) then . else age end as age from sashelp.class)
      )
    ;
quit;
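Because the second-largest distinct value is simply the maximum of everything below the overall maximum, the same rows can be selected with nested subqueries and no reliance on SAS remerging. A sketch in plain PROC SQL on the same sashelp.class example; the output table name top2_ages is hypothetical.

```sas
proc sql;
  create table top2_ages as
  select age, name
    from sashelp.class
    /* keep rows at or above the second-largest distinct age */
    where age >= (select max(age)
                    from sashelp.class
                    where age < (select max(age) from sashelp.class))
    order by age desc, name;
quit;
```

If every row shares a single value, the inner subquery returns missing, and since SAS treats missing as lower than any number, all rows are returned.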
