Question

This question is an extension of this one: SAS: Create a frequency variable

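The code in the first answer there works well, but what if I want to add another categorical variable? I have a date variable and an ID, which is categorical. I tried many things, and the following seemed the most logical to me (but it does not work):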

data work.frequencycounts;
 do _n_ =1  by 1 until (last.Date);
   set work.dataset;
   by Date ID;
   if first.Date & first.ID then count=0;
   count+1;
 end;
 frequency= count;
 do _n_ = 1 by 1 until (last.Date);
   set work.dataset;
   by Date ID;
   output;
 end;
run;

Should I add another do loop? Thanks for your help.

Edit: my example data:

Date ID 
1 19736 H-3-10  
2 19736 H-3-12 
3 19737 E-2-10 
4 19737 E-2-10

The output I want:

Date ID Count
1 19736 H-3-10  1
2 19736 H-3-12  1
3 19737 E-2-10  2
4 19737 E-2-10  2

Answer 1

The code at the end of this answer produces the desired output.

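What is going on here is that you need to use the last variable in the BY statement for everything you do with first./last. processing. If you want to see why, add a few put _all_; statements to the data step and look at the values at different points. You should never be checking first.Date here: whenever first.Date is true, first.ID is also true (by definition, first. propagates to the right in the BY list), and you also want the count to reset when first.ID is true but first.Date is not.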

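Basically, take the original example as correct: the variable used in that example should be the last variable in the BY statement. You can add any number of additional variables to its left and nothing else changes. This does require the data to be sorted by the grouping variables.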
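As a rough illustration of the put _all_; suggestion, the step below writes the automatic first./last. flags to the log for every row. The dataset name flagdemo is made up for this sketch, and its rows simply mirror the example in the question.

```sas
* Hypothetical demo dataset mirroring the rows in the question;
data flagdemo;
  input date id $;
  datalines;
19736 H-3-10
19736 H-3-12
19737 E-2-10
19737 E-2-10
;
run;

* BY-group processing expects the data sorted by the BY variables;
proc sort data=flagdemo;
  by date id;
run;

* Dump every variable, including FIRST. and LAST. flags, to the log;
data _null_;
  set flagdemo;
  by date id;
  put _all_;
run;
```

On the second row (19736 H-3-12) the log should show FIRST.date=0 but FIRST.id=1, which is the case a test on first.Date would miss.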

data have;
input date id $;
datalines;
19736 H-3-10  
19736 H-3-12 
19737 E-2-10 
19737 E-2-10 
;;;;;
run;

data work.want;
 do _n_ =1  by 1 until (last.ID);  *last.<last variable in by group>;
   set work.have;
   by Date ID;
   if first.ID then count=0; *first.ID is what you want here.;
   count+1;
 end;
 frequency= count;   *this is not really needed - can use just the one variable consistently;
 do _n_ = 1 by 1 until (last.ID);  *again, last.<last var in by group>;
   set work.have;
   by Date ID;
   output;
 end;
run;
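If the double DOW loop feels heavy, the same per-row count can probably also be obtained with PROC SQL. This is only a sketch of an alternative, not the approach used above: it assumes the have dataset from this answer, and want_sql is a made-up output name. It counts rows per Date/ID combination in a subquery and joins the counts back onto every original row, so the input does not need to be sorted first.

```sas
proc sql;
  /* want_sql is a hypothetical output name */
  create table want_sql as
  select a.date, a.id, b.count
  from have as a
  inner join
    /* one row per date/id combination with its row count */
    (select date, id, count(*) as count
     from have
     group by date, id) as b
  on a.date = b.date and a.id = b.id
  order by a.date, a.id;
quit;
```

Adding more grouping variables here means listing them in both the GROUP BY clause and the join condition.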
