Question

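I would like to know whether a DATA step, rather than a PROC, can be used to count the number of categorical variables in a row, like 'count' in the example above. That would let me use the data further, e.g. COUNT = 1 or COUNT > 1 to check morbidity.

Is it also possible to count how often each diagnosis occurs across the whole dataset, per patient, taking duplicates into account? For example, this dataset contains 3 CB and 2 AA, but CB should be 2 because patient 2 has it recorded twice.

Thank you for your time, and Happy New Year.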

Answer 1

Your question is not entirely clear, but you could use UNION ALL to stack the diagnosis columns and then count the distinct values:

-- UNION ALL keeps every row; count(distinct ...) removes the repeats
select patient, count(distinct diag)
from (
  select patient, diag1 as diag
  from my_table
  union all
  select patient, diag2
  from my_table
  union all
  select patient, diag3
  from my_table
  union all
  select patient, diag4
  from my_table
) t
group by patient

Or simply use UNION and count:

-- UNION already drops duplicate (patient, diag) rows, so a plain count suffices
select patient, count(diag)
from (
  select patient, diag1 as diag
  from my_table
  union
  select patient, diag2
  from my_table
  union
  select patient, diag3
  from my_table
  union
  select patient, diag4
  from my_table
) t
group by patient

Answer 2

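The image indicates that, for each row, you want to count the number of columns with non-missing values. You evidently can already do this with a PROC step, but want to know how to do it in a DATA step.

In a DATA step you can count the non-missing values indirectly with CMISS, or directly by applying COUNTC to a constructed value: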

data have;
  attrib pid length=8 diag1-diag4 length=$5;
  input pid & diag1-diag4;
  datalines;
1  AA J9 HH6 .
2  CB . . CB
3  J10 AA CB J10 
4  B B . F90 .
5  J10 . . .
6  . . . .
run;

data have_with_count;
  set have;
  count = 4 - cmiss (of diag1-diag4);
  count_way2 = countc(catx('~', of diag1-diag4, 'SENTINEL'), '~');
run;
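As a hedged illustration of the follow-on use mentioned in the question (COUNT = 1 versus COUNT > 1), the sketch below derives two 0/1 flags from the count variable. The dataset name flagged and the variables single_dx and multi_dx are made up for this example.

```sas
data flagged;
  set have_with_count;
  /* a comparison evaluates to 1 (true) or 0 (false) in SAS */
  single_dx = (count = 1);   /* exactly one diagnosis recorded */
  multi_dx  = (count > 1);   /* more than one diagnosis recorded */
run;
```

Rows with no diagnoses at all (count = 0) get 0 in both flags.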

To work against a MySQL data source, you will additionally need a libref that connects you to that remote data server.
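A minimal sketch of such a libref, assuming SAS/ACCESS Interface to MySQL is licensed and installed. The libref name mydb and every connection value (server, port, database, user, password) are placeholders, and my_table with its patient and diag1-diag4 columns is taken from Answer 1.

```sas
/* Placeholder connection values: replace with your own */
libname mydb mysql server="db.example.com" port=3306
        database=clinic user=sas_user password="XXXXXXXX";

/* The remote table can then be read like any SAS dataset */
data have_with_count;
  set mydb.my_table;
  count = 4 - cmiss(of diag1-diag4);
run;

libname mydb clear;
```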

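Added

Distinct values within a row can be counted with a hash or with SORTC. Consider this example, which sorts a copy of the row's data (held in an array) and counts the unique values in it: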

data want;
  set have;
  array diag diag1-diag4;
  array v(4) $5 _temporary_;
  do _n_ = 1 to dim(diag);
    v(_n_) = diag(_n_);
  end;
  call sortc(of v(*));
  uniq = 0;
  do _n_ = 1 to dim(v);
    if missing(v(_n_)) then continue;
    if uniq = 0 then 
      uniq + 1;
    else
      uniq + ( v(_n_) ne v(_n_-1) );
  end;
run;
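The hash alternative is mentioned but not shown. The following is one possible sketch, not the answerer's own code: each non-missing diagnosis is used as a hash key, and the number of keys is read back once the row has been scanned. The names want_hash, seen, dx, i and rc are made up for this example.

```sas
data want_hash;
  set have;
  array diag diag1-diag4;
  length dx $5;
  if _n_ = 1 then do;
    declare hash seen();        /* keys only; duplicate keys are not stored */
    seen.defineKey('dx');
    seen.defineDone();
  end;
  rc = seen.clear();            /* start every row with an empty hash */
  do i = 1 to dim(diag);
    if not missing(diag(i)) then do;
      dx = diag(i);
      rc = seen.ref();          /* adds the key only if it is not there yet */
    end;
  end;
  uniq = seen.num_items;        /* number of distinct diagnoses in this row */
  drop i dx rc;
run;
```

For four columns the SORTC version is probably simpler. A hash may be the more natural choice when the number of columns is large.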

Answer 3

Using Richard's dummy data, this counts the number of diagnoses and the number of unique diagnoses per row:

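data want;
   set have;
   array var diag:;
   length temp $30;
   call missing(diag_num);
   do over var;
      if not missing(var) then do;
         diag_num+1;
         /* append var only if it is not already a whole word in temp;
            WHICHC would compare var with the entire string, not its words */
         if not findw(temp, strip(var), ' ') then temp=catx(' ',temp,var);
      end;
   end;
   unique_diag=countw(temp, ' ');
   drop temp;
run;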
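The code so far works row by row. For the second part of the question (how often each diagnosis occurs across the whole dataset, counting it at most once per patient), one possible sketch is to reshape the data to one row per patient and diagnosis, drop the repeats, and then tabulate. It uses PROC steps for the last two stages and assumes the have dataset from Answer 2. The names long, long_nodup, diag_counts, dx and i are made up for this example.

```sas
/* One row per patient/diagnosis, skipping missing slots */
data long;
  set have;
  array dx diag1-diag4;
  length diag $5;
  do i = 1 to dim(dx);
    if not missing(dx(i)) then do;
      diag = dx(i);
      output;
    end;
  end;
  keep pid diag;
run;

/* Keep each diagnosis only once per patient */
proc sort data=long out=long_nodup nodupkey;
  by pid diag;
run;

/* Number of patients carrying each diagnosis */
proc freq data=long_nodup;
  tables diag / out=diag_counts(keep=diag count);
run;
```

With the sample data, CB appears three times in the raw columns but belongs to only two patients, so it is counted as 2 after the NODUPKEY sort.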
