Question

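I have a panel data set at hourly frequency. If any given one-hour interval has fewer than 200 observations, I want to delete all observations in that interval. So I first count the number of observations N per hour, and then delete the hours where N < 200:

data lib.data;
  set lib.data;
  retain I;
  by date hour;
  if first.date or first.hour then I=1;
  else I=I+1;
run;

proc sql;
  create table lib.data1 as
  select a.*, max(I) as N
  from lib.data as a
  group by date, hour
  order by date, hour;
quit;

data lib.data (drop= i n);
  set lib.data1;  /* read the table created in step 2, which carries N */
  if n < 200 then delete;
run;

However, the ordinary proc sql in step 2 used up all the free space on my C drive. Is there a better way to achieve my goal?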

Answer 1

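Use a double DOW loop. The first loop counts the records in each date/hour group. The second loop reads the same group again and uses that count to conditionally execute the OUTPUT statement.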

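data want ;
  /* first loop: count the records in the current date/hour group */
  do until (last.hour);
    set lib.data;
    by date hour;
    n=sum(n,1);
  end;
  /* second loop: re-read the same group and keep it only if it is large enough */
  do until (last.hour);
    set lib.data;
    by date hour;
    if n >= 200 then output;
  end;
run;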
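
To see the double DOW loop on something small, here is a minimal sketch on toy data. The data set names work.have and work.want, the variable value, and the macro variable min_obs are hypothetical and stand in for lib.data and the threshold of 200. The input must already be sorted by date and hour.

```sas
/* Hypothetical toy data: hour 1 has 3 rows, hour 2 has 1 row */
data work.have;
  input date :date9. hour value;
  format date date9.;
  datalines;
01JAN2020 1 10
01JAN2020 1 11
01JAN2020 1 12
01JAN2020 2 20
;
run;

%let min_obs = 3;   /* stand-in for the threshold of 200 */

data work.want (drop=n);
  /* pass 1: count the rows in the current date/hour group */
  do until (last.hour);
    set work.have;
    by date hour;
    n = sum(n, 1);
  end;
  /* pass 2: re-read the same group and keep it only if it is large enough */
  do until (last.hour);
    set work.have;
    by date hour;
    if n >= &min_obs then output;
  end;
run;
```

With this input, work.want keeps the three rows of hour 1 and drops the single row of hour 2. Because n is not retained, it is reset to missing at the start of each iteration of the data step, so the count starts over for every group.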

Answer 2

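PROC SQL itself is not the problem. Leaving non-summarized columns out of the GROUP BY clause has unintended consequences, such as remerging the summary statistics back with the data. Here is a SQL solution that will hopefully not blow up your drive.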

proc sql;
create table want as
select
    a.*
from
    lib.data  a
    join
    (select 
        date,
        hour,
        count(*)
    from
        lib.data
    group by date, hour
    having count(*) >= 200)  b
        on
        a.date = b.date and
        a.hour = b.hour
;
quit;
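
If you want SAS to refuse a query that would remerge instead of running it silently, one option is the NOREMERGE option of PROC SQL. The sketch below assumes your SAS release supports that option; the table name work.remerged is hypothetical. The query selects a.* alongside a summary function, so not every selected column is covered by GROUP BY.

```sas
proc sql noremerge;
  /* a.* is not covered by GROUP BY, so this query needs a remerge;
     with NOREMERGE it stops with an error instead of running */
  create table work.remerged as
  select a.*, count(*) as n
  from lib.data as a
  group by date, hour;
quit;
```

This does not solve the filtering task by itself. It only surfaces an accidental remerge as an error before it can consume disk space.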

Answer 3

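You can try using a hash table to buffer the first records of each hour. When the 200th record of an hour arrives, output it together with the buffered records, and from then on output the remaining observations of that hour directly. The code below shows how it works: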

data lib.data (drop= counter rc);
    set lib.data;
    by date hour;
    retain counter 0;

    if _N_ = 1 then do;
        /* dataset: plus definedata(all:'yes') stores every variable of a record;
           without definedata only the keys date and hour would be kept */
        declare hash hs(dataset:'lib.data(obs=0)', multidata:'yes');
        hs.definekey('date','hour');
        hs.definedata(all:'yes');
        hs.definedone();
    end;
    /* first record in the hour: reset the counter */
    if first.hour then do;
        counter=0;
    end;
    /* increment the counter */
    counter = counter+1;
    /* records 1 to 199 of the hour: buffer them in the hash table */
    if counter < 200 then do;
        hs.add();
    end;
    /* record 200: output it, then output the buffered records.
       The 200th row is written before the first 199, so sort afterwards if order matters */
    if counter = 200 then do;
        output;
        rc = hs.find();
        do while(rc=0);
            output;
            rc= hs.find_next();
        end;
    end;
    /* records after the 200th: output directly */
    if counter > 200 then output;
    /* last record in the hour: clear the hash table */
    if last.hour then do;
        hs.clear();
    end;
run;

How to avoid proc sql in SAS