Question

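I am trying to select the top 10 of about 200 subjects. Each subject can have multiple rows and is uniquely identified by the ID variable. The data is sorted by amount_paid (descending) and selection_flag (ascending, 1 or 2). For example, the current data looks like this, with about 200 unique IDs: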

ID   amount_paid   selection_flag   group
191  $10               1            R3  
101  $5                2            R2   
101  $3                2            R1 
750  $2                2            R0
250  $1                2            R0

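I tried selecting the distinct IDs, assigning each a sequence number, merging the numbers back, and keeping only the IDs numbered 10 or lower. For example: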

ID   number
191  1
101  2
101  2
750  3
250  4

But using distinct changes the order of IDs:

    ID   number
    101  1
    101  1
    191  2
    250  3
    750  4

I also tried proc sql (outobs=10), but that only returns the top 10 rows rather than the top IDs with all of their associated rows.

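Ideally, I would like to select the top ten IDs together with all of their rows, keeping the existing sort order by amount_paid (descending) and selection_flag (ascending) intact. Any suggestions would be greatly appreciated!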

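Sample code I have tried:

Code to create the universe (the selection flag is created because all R3, R4 and R5 rows are needed if they are in the data):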

PROC SQL;
   CREATE TABLE WORK.univ1 AS 
   SELECT distinct t1.*,
          /* Selection Flag */
            (case when group in ('R3','R4','R5') then 1 else 2 end) As selection_flag
      FROM RawData t1 Left Join Value_Cde_Excl t2 ON (t1.ID = t2.ID)
                              Left Join Cond_Cde_Excl t3 ON (t1.ID= t3.CH_ICN)
                              Left Join Diag_Excl t4 ON (t1.ID = t4.CH_ICN)
      WHERE t1.y BETWEEN 'R0' AND 'RZ' AND   
                  t1.ID NE t2.ID AND
                  t1.ID NE t3.ID AND
                  t1.ID NE t4.ID
    ORDER BY Selection_Flag,
              Amt_Paid DESC,
                  ID;
QUIT;    


Code to pull distinct ids (loses order):
PROC SQL;
   CREATE TABLE WORK.sample AS 
   SELECT distinct ID
      FROM univ1;
QUIT;  

Code to put distinct ids in a macro (loses order):

/*Create a macro variable of all ICNs */
proc sql noprint;
select distinct ID
into :ID_LIST separated by ' ' 
from univ1;
quit;

Code to select the top 10 observations (keeps order, but has duplicated IDs):
PROC SQL outobs=10;
   CREATE TABLE WORK.sample AS 
   SELECT ID
      FROM univ1;
QUIT;

Answer 1

Try this:

Step 1: Save the current order of the dataset

Create an index variable that records the order the rows are currently in.

data ordered;
     set have;
     Order+1;
run;

You can now manipulate the data however you need without losing the original order.

Step 2: Remove duplicate IDs

You want to select the top 10 IDs, but without duplicates. Sort by ID and Order so that all rows for the same ID sit together.

proc sort data=ordered;   /* the PROC SORT statement needs its own semicolon */
    by ID Order;
run;

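The dataset is now grouped by ID, with the rows of each ID still in their original relative order.

We want to keep the first observation of each ID. There are two ways to remove the duplicates: with a data step, or with a second proc sort using the nodupkey and equals options. The equals option preserves the relative order of rows within each ID, so the row that is kept is the one with the lowest Order. We will use the second proc sort, since it also has the dupout option.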

proc sort data=ordered
          out=ordered_nodupes
          dupout=dupes
          nodupkey
          equals;
    by ID;
run;

You should now have two datasets:

Ordered_NoDupes
ID   Order
101  2
191  1
250  5
750  4
__________
  Dupes
ID   Order
101  3
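
For comparison, the data step route mentioned in this step could look like the sketch below. It assumes `ordered` has already been sorted by ID and Order as shown earlier; the output names `ordered_nodupes_ds` and `dupes_ds` are made up for illustration so they do not overwrite the proc sort results.

```sas
/* Keep the first row of each ID; route the remaining rows to a second dataset. */
data ordered_nodupes_ds dupes_ds;
    set ordered;
    by ID;                 /* requires ordered to be sorted by ID */
    if first.ID then output ordered_nodupes_ds;
    else output dupes_ds;
run;
```

Because the input is sorted by ID and then Order, `first.ID` is the row with the lowest Order for that ID, which matches what nodupkey with equals keeps.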

Step 3: Sort back to the original order

Finally, sort back to the original order and keep only the first 10 observations.

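proc sort data=ordered_nodupes
          out=sorted_ids;
     by Order;
run;

/* OBS= is valid on input datasets only, so take the first 10 in a data step */
data Top_10_IDs;
     set sorted_ids(obs=10);
run;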

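Top_10_IDs now holds the top ten IDs.

Step 4: Select only the rows for the top 10 IDs

We use the Top_10_IDs dataset as a key to find every row in the main dataset whose ID is in the top 10.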

proc sql noprint;
    create table want as
        select *
        from have
        where ID IN(select ID from Top_10_IDs);
quit;
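
One caveat: PROC SQL does not promise any particular row order unless you add ORDER BY. If the original order matters in the final table, a variant along these lines may be safer. It is only a sketch, assuming the `ordered` dataset from step 1 (with its Order variable) still exists; `want_ordered` is a made-up name.

```sas
proc sql;
    create table want_ordered as
        select *
        from ordered                               /* carries the Order variable */
        where ID in (select ID from Top_10_IDs)
        order by Order;                            /* restore the original row order */
quit;
```

You can drop the Order variable afterwards with a drop= dataset option if you do not need it in the output.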

Answer 2

If you only want the first 10 IDs and all of their associated rows (based on ID):

data want1 ;
  set univ1 ;
  by ID notsorted ;
  if first.ID then id_cnt + 1 ;
  if id_cnt <= 10 then output ;
run ;
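
Note that `by ID notsorted` treats each contiguous run of the same ID as one group, so an ID whose rows are not adjacent in univ1 would be counted more than once. If that can happen in your data, a hash object can keep track of the IDs already seen. This is a minimal sketch, not tested against your data; `want2` and `seen` are made-up names.

```sas
data want2;
  if _n_ = 1 then do;
    declare hash seen();        /* holds the IDs accepted so far */
    seen.defineKey('ID');
    seen.defineDone();
  end;
  set univ1;
  /* check() returns 0 when the current ID is already in the hash */
  if seen.check() ne 0 then do;
    if seen.num_items >= 10 then delete;   /* an 11th or later ID: drop the row */
    seen.add();                            /* otherwise remember this new ID */
  end;
run;
```

Rows are written in the order they are read, so the existing sort order of univ1 is kept.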

SAS: Select top subjects when each subject has multiple rows
