Question

Suppose the data set looks like this.

dossier_manager 
NameA   
NameA   
NameB   
NameC   
NameC   
NameC   
NameD   
NameD   
NameE   
NameF

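I want to know how many distinct names it contains. I tried proc freq, but the column has so many distinct names that the frequency list becomes very long. How can I get just a summary result, such as 6 (distinct names) for the example above?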

Answer 1

In proc sql, use count(distinct dossier_manager).

If you want a sample of the values, you can use select distinct dossier_manager to get them and the outobs= option to limit the number of rows in the output.
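
A minimal sketch of both suggestions, assuming the column lives in a data set named test (the name used in Answer 2); the alias n_names and the limit of 3 rows are illustrative choices, not part of the question:

```sas
/* rebuild the example column from the question */
data test;
  input dossier_manager $;
  datalines;
NameA
NameA
NameB
NameC
NameC
NameC
NameD
NameD
NameE
NameF
;
run;

/* a single row holding the number of distinct names (6 for this data) */
proc sql;
  select count(distinct dossier_manager) as n_names
  from test;
quit;

/* a sample of the distinct values, capped at 3 rows by outobs= */
proc sql outobs=3;
  select distinct dossier_manager
  from test;
quit;
```

When outobs= truncates the result, SAS writes a warning to the log; that is expected here.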

Answer 2

Use this code to get the distinct names along with the unique count.

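proc sql;
  /* one row per distinct name, each carrying the overall distinct count;
     SAS remerges the summary statistic with the detail rows and says so in the log */
  select distinct dossier_manager, count(distinct dossier_manager) as n_distinct
  from test;
quit;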

Answer 3

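Neal,

A common technique is to report only the top N most frequent values plus a summary of the remainder. Consider the following sample code: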

* fake data, 10,000 rows of a normally distributed variable. (ranuni would be quite flat);
data have;
  do row = 1 to 10000;
    my_char_var = cats('x',ceil (10 + 10*rannor(1234)));
    my_num_var = ceil (10 + 10*rannor(1234));
    output;
  end;
run;

* compute the raw frequency counts;
proc freq data=have noprint order=freq ;
  table my_num_var / out=counts ;
run;

* specify parameters for desired TOP N;
%let topN = 10;
%let REMAIN_N = 0;

* DOW loop over all data, outputting top N freqs and accumulating the remainder;
data want(keep=my_num_var count percent); 
  do _n_ = 1 by 1 while (not end_flag);

    set counts end=end_flag;

    if _n_ > &topN then do;
      accumN + 1;
      accumCNT + count;
      accumPCT + percent;
    end;
    else OUTPUT;
  end;

  * output remainder counts if there are any - special missing value .R used to indicate REMAINDER bin;
  * the sum statement initializes accumN to 0, so test for > 0 rather than for missing;
  if accumN > 0 then do;
    my_num_var = .R;
    * my_char_var = "*REMAIN*";
    count = accumCNT;
    percent = accumPCT; 
    output;
    call symputx ('REMAIN_N', accumN);
  end;

  stop;
run;

* create a custom format that will indicate how many distinct other values are not top N;
proc format;
  value REMAINING .R = "Remaining (&REMAIN_N.)";
  value $REMAINING "*REMAIN*" = "Remaining (&REMAIN_N.)";
run;

* apply custom format in PRINT procedure, this will override the format of the tabled variable;
proc print data=want;
  format my_num_var REMAINING.;
run;

Output

Obs      my_num_var      COUNT    PERCENT

  1                 9      398      3.98
  2                11      394      3.94
  3                 6      388      3.88
  4                 8      383      3.83
  5                 7      378      3.78
  6                10      374      3.74
  7                13      374      3.74
  8                14      366      3.66
  9                12      353      3.53
 10                17      350      3.50
 11    Remaining (64)     6242     62.42

Minor tweaks do the same for a character variable.
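
As a rough sketch of those tweaks, the steps below rerun the same logic on my_char_var from the have data set created above. The data set names char_counts and char_want are made up for this illustration; the "*REMAIN*" marker and the $REMAINING format name follow the commented-out line and format in the code above.

```sas
* frequency counts for the character variable, most frequent first;
proc freq data=have noprint order=freq;
  table my_char_var / out=char_counts;
run;

%let topN = 10;
%let REMAIN_N = 0;

* same DOW loop as before, tabling my_char_var instead of my_num_var;
data char_want(keep=my_char_var count percent);
  do _n_ = 1 by 1 while (not end_flag);
    set char_counts end=end_flag;
    if _n_ > &topN then do;
      accumN + 1;
      accumCNT + count;
      accumPCT + percent;
    end;
    else output;
  end;

  * a marker string takes the place of the special missing value .R;
  if accumN > 0 then do;
    my_char_var = "*REMAIN*";
    count = accumCNT;
    percent = accumPCT;
    output;
    call symputx('REMAIN_N', accumN);
  end;

  stop;
run;

* character formats need the $ prefix;
proc format;
  value $REMAINING "*REMAIN*" = "Remaining (&REMAIN_N.)";
run;

proc print data=char_want;
  format my_char_var $REMAINING.;
run;
```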

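Small changes to the logic can give you the top N with individual % > threshold, the top N with cumulative % < threshold, the top N with individual count > threshold, or the top N with cumulative count < threshold.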
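
For instance, one way the cumulative % variant might look, reading the counts data set that proc freq wrote above (already ordered by descending frequency). The macro variable pct_limit, its value of 25, and the names top_by_cum_pct and cum_pct are assumptions chosen for this sketch:

```sas
%let pct_limit = 25;

* keep the most frequent values while their running share stays within the limit;
data top_by_cum_pct;
  set counts;
  cum_pct + percent;                   * running total of PERCENT;
  if cum_pct > &pct_limit then stop;   * STOP ends the step before this row is written;
run;

proc print data=top_by_cum_pct;
run;
```

The individual % or count variants would replace the running total with a direct test on percent or count.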

Answer 4

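If you don't want to use Proc SQL:

If you only want the total number of ITEMS, do this:

- Apply Proc Freq to the data and write the result to a SAS data set D1 (using the OUT= option).

- Then apply Proc Contents to that data set (D1), extracting the number of observations through another OUT= option.

Let me know if you need code for this.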
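
The two steps could be sketched roughly as follows. The input data set name test and the output name d1_info are assumptions for this illustration; D1 is the name used in the steps above.

```sas
/* example column from the question */
data test;
  input dossier_manager $;
  datalines;
NameA
NameA
NameB
NameC
NameC
NameC
NameD
NameD
NameE
NameF
;
run;

/* step 1: one row per distinct value, written to D1 */
proc freq data=test noprint;
  tables dossier_manager / out=d1;
run;

/* step 2: PROC CONTENTS writes one row per variable of D1;
   NOBS on every row is the observation count of D1,
   i.e. the number of distinct values */
proc contents data=d1 noprint out=d1_info(keep=memname nobs);
run;

/* NOBS repeats on each row, so the first row is enough */
proc print data=d1_info(obs=1);
run;
```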

SAS - how to count the distinct values in a column
