Question

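I have consumer panel data that records weekly spending at a retail store. The unique identifier is the household ID. I want to delete observations if spending contains more than five zeros, that is, if the household bought nothing in five weeks. Once such a household is identified, I want to delete all observations associated with its household ID. Does anyone know how to implement this in SAS? Thanks.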

Answer 1

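I think PROC SQL would work well here.

This could be done in one step with a more complex subquery, but it is probably best to break it into two steps:

1. Count how many zeros each household ID has.
2. Filter to only the household IDs with 5 or fewer zeros.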

proc sql;
create table zero_cnt as
select distinct household_id,
       sum(case when spending = 0 then 1 else 0 end) as num_zeroes
from original_data
group by household_id;

create table wanted as
select *
from original_data   
where household_id in (select distinct household_id from zero_cnt where num_zeroes <= 5);  
quit;
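
The one-step version mentioned above could look roughly like the sketch below, which moves the zero count into a HAVING clause inside the subquery. It uses the same `original_data`, `household_id`, and `spending` names as the two-step code; the output table name `wanted_one_step` is only an illustrative choice.

```sas
proc sql;
  create table wanted_one_step as
  select *
  from original_data
  where household_id in
    /* Keep only IDs whose count of zero-spending rows is 5 or fewer */
    (select household_id
     from original_data
     group by household_id
     having sum(case when spending = 0 then 1 else 0 end) <= 5);
quit;
```

This should produce the same rows as the two-step approach. The two-step form leaves the intermediate `zero_cnt` table available for inspection, which can make the result easier to check.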

Edit:

If the zeros have to be consecutive, the method for building the list of IDs to exclude is different.

* Sort by ID and date;
proc sort data = original_data out = sorted_data;  
by household_id date;
run;

Use the LAG function to check the previous spending amounts.

More information on LAG is here: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212547.htm

data exclude;
  set sorted_data;
  by household_id;
  array prev{*} _L1-_L4;

  * Spending in each of the previous four observations;
  _L1 = lag(spending);
  _L2 = lag2(spending);
  _L3 = lag3(spending);
  _L4 = lag4(spending);

  * Create running count for the number of observations for each ID;
  if first.household_id then spend_cnt = 0;
  spend_cnt + 1;

  * Once the current ID has at least 5 observations, add up current spending and the previous 4 and output the ID if the total is zero (missing counts as zero);
  if spend_cnt >= 5 then do;
    if sum(0, spending, of prev{*}) = 0 then output;
  end;
  keep household_id;
run;

Then just use a subquery or a match merge to remove the IDs that appear in the "exclude" data set.

proc sql;  
  create table wanted as  
  select *  
  from original_data
  where household_id not in (select distinct household_id from exclude);
quit;
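
The match-merge alternative could be sketched as follows. It assumes the `sorted_data` and `exclude` data sets created above; `exclude_ids`, `in_data`, and `in_excl` are names introduced here purely for illustration.

```sas
* Reduce the exclusion list to one row per ID, sorted for the merge;
proc sort data = exclude out = exclude_ids nodupkey;
  by household_id;
run;

data wanted;
  merge sorted_data (in = in_data)
        exclude_ids (in = in_excl);
  by household_id;
  * Keep rows from the panel whose ID is not on the exclusion list;
  if in_data and not in_excl;
run;
```

Both inputs have to be sorted by `household_id` for the BY statement to work. The subquery version has no such requirement, which is one reason to prefer it when the data are not already sorted.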
