Question

This question may be vague, but I could not come up with a decent, concise title.

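My data has id, date, amountA and amountB as variables. The task is to pick the dates that are within 10 days of each other, then check whether their amountA values are within 20% of each other, and if they are, pick the one with the highest amountB. I have used the code further below to try to achieve this. Sample data: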

id     date     amountA   amountB  
1    1/15/2014   1000     79  
1    1/16/2014   1100     81  
1    1/30/2014   700      50  
1    2/05/2014   710      80   
1    2/25/2014   720      50

This is what I need:

id     date     amountA   amountB  
1    1/16/2014   1100     81  
1    1/30/2014   700      50    
1    2/25/2014   720      50

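I wrote the code below, but the problem is that it is not automatic and has to be adapted case by case. I need a way to loop it so that it produces the result automatically. I am not good with loops, so I am stuck. Any help is much appreciated.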

data test2;
set test1;
diff_days=abs(intck('days',first_dt,date));
if diff_days<=10 then flag=1;
else if diff_days>10 then flag=0;
run; 

data test3 rem_test3;
set test2;
if flag=1 then output test3;
else output rem_test3;
run;

proc sort data=test3;
by id amountA;
run;

data all_within;
set test3;
by id amountA;
amtA_lag=lag1(amountA);
if first.id then
  do;
    counter=1;
    flag1=1;
  end;
if first.id=0 then
  do;
    counter+1;
    diff=abs(amountA-amtA_lag);
    if diff<(10/100*amountA) then flag1+1;
    else flag1=0;
  end;
if last.stay and flag1=counter then output all_within;
run;

Answer 1

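If I understand the question correctly, you want to group together all records that are (within 10 days) and (amountA within 20%)?

Looping isn't your problem. No explicitly coded loop is needed (at least, not the way I think of it). SAS runs the data step loop for you.

What you want to do is:

Identify groups. A group is the set of consecutive records that you want to collapse into one row. It's not clear to me how amountA has to behave here: does the whole group need a maximum difference of < 10%, or is it the record-to-next-record difference that must be < 10%, or the difference from the record that currently has the highest amountB in the group? Whichever it is, you can easily identify any of those rules. Use RETAINed variables to keep track of the previous amountA, the previous date, the highest amountB, the date associated with the highest amountB, and the amountA associated with the highest amountB.
When you find a record that doesn't fit in the current group, output a record with the values of the previous group.

You shouldn't need two steps for this, although you can do it in two if you want to see it more easily, which may help with debugging your rules. Set it up so that you have a GroupNum variable that you RETAIN and increment whenever you see a record that causes a new group to start.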
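The two-step variant described above could be sketched roughly as follows. This is only an illustration under one assumed reading of the rule (the "record to next record" version, with the 20% threshold from the question); the dataset names have_sorted, grouped, ranked and want, and the helper variables prev_date and prev_amountA, are made up for the example.

```sas
/* Sample rows copied from the question */
data have;
    input id date :mmddyy10. amountA amountB;
    format date mmddyy10.;
    datalines;
1 1/15/2014 1000 79
1 1/16/2014 1100 81
1 1/30/2014 700 50
1 2/05/2014 710 80
1 2/25/2014 720 50
;
run;

proc sort data=have out=have_sorted;
    by id date;
run;

/* Step 1: number the groups.
   ASSUMED rule: a row stays in the current group when it is at most
   10 days after the previous row and its amountA differs from the
   previous row's amountA by at most 20%. */
data grouped;
    set have_sorted;
    by id;
    retain prev_date prev_amountA;
    /* GroupNum + 1 is a sum statement, so GroupNum is retained and starts at 0 */
    if first.id then GroupNum + 1;
    else if date - prev_date > 10
         or abs(amountA - prev_amountA) > 0.2 * prev_amountA
        then GroupNum + 1;
    /* Remember this row for the comparison on the next iteration */
    prev_date = date;
    prev_amountA = amountA;
    drop prev_date prev_amountA;
run;

/* Step 2: keep the row with the highest amountB in each group
   (on ties, whichever row sorts first is kept) */
proc sort data=grouped out=ranked;
    by GroupNum descending amountB;
run;

data want;
    set ranked;
    by GroupNum;
    if first.GroupNum;
run;
```

Under this assumed rule the 1/30 and 2/05 rows land in the same group (they are 6 days apart and their amountA values are within 20%), so want keeps the 2/05 row rather than the 1/30 row shown in the question's desired output. If that is not what is wanted, the grouping condition in step 1 is the place to change.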

Answer 2

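Here is an approach that I think should work. The basic method is:

Find all pairs of observations that are close enough
Join the pairs to themselves to get all connected ids
Reduce the groups
Join back to the original data and get the required values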

data have;
    input
        id
        date :mmddyy10.
        amountA
        amountB;
    format date mmddyy10.;
    datalines;
1 1/15/2014 1000 79
2 1/16/2014 1100 81
3 1/30/2014 700  50
4 2/05/2014 710  80
5 2/25/2014 720  50
;
run;

/* Count the observations */
%let dsid = %sysfunc(open(have));
%let nobs = %sysfunc(attrn(&dsid., nobs));
%let rc = %sysfunc(close(&dsid.));

/* Output any connected pairs */
data map;
    array vals[3, &nobs.] _temporary_;
    set have;
    /* Put all the values in an array for comparison */
    vals[1, _N_] = id;
    vals[2, _N_] = date;
    vals[3, _N_] = amountA;
    /* Output all pairs of ids which form an acceptable pair */
    do i = 1 to _N_;
        if 
            abs(vals[2, i] - date) < 10 and
            abs((vals[3, i] - amountA) / amountA) < 0.2
        then do;
            id2 = vals[1, i];
            output;
        end;
    end;
    keep id id2;
run;
proc sql;
    /* Reduce the connections into groups */
    create table groups as
    select 
        a.id, 
        min(min(a.id, a.id2, b.id)) as group
    from map as a
    left join map as b
        on a.id = b.id2
    group by a.id;
    /* Get the final output */
    create table lookup (where = (amountB = maxB)) as
    select 
        have.*, 
        groups.group,
        max(have.amountB) as maxB
    from have 
    left join groups
        on have.id = groups.id
    group by groups.group;
quit;

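This code works for the sample data. However, for more complex data the group reduction is not sufficient. Fortunately, there are established methods for finding all the subgraphs given a set of edges, including one that uses SAS/OR.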
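As one possible illustration of a fuller group reduction (not one of the methods referred to above), a small union-find pass in a data step could look like the sketch below. It assumes it is run after the code above, so that the map dataset and the &nobs. macro variable exist, and it assumes the ids are the integers 1 to &nobs. as in the sample data. The dataset name groups_full and the variables r1, r2 and i are made up for the example.

```sas
/* Sketch: label connected components with union-find.
   ASSUMES ids are the integers 1..&nobs., as in the sample data. */
data groups_full;
    array parent[&nobs.] _temporary_;
    /* Start with every id as the root of its own group */
    if _N_ = 1 then do i = 1 to &nobs.;
        parent[i] = i;
    end;
    set map end=last;
    /* Follow parent links to find the root of each end of the pair */
    r1 = id;
    do while (parent[r1] ne r1);
        r1 = parent[r1];
    end;
    r2 = id2;
    do while (parent[r2] ne r2);
        r2 = parent[r2];
    end;
    /* Merge the two groups, keeping the smaller id as the root */
    if r1 ne r2 then parent[max(r1, r2)] = min(r1, r2);
    /* After the last pair, write one row per id with its final group */
    if last then do i = 1 to &nobs.;
        id = i;
        group = i;
        do while (parent[group] ne group);
            group = parent[group];
        end;
        output;
    end;
    keep id group;
run;
```

Because the smaller root always wins a merge, group ends up as the lowest id in each connected set, which matches how group is defined in the PROC SQL step. The resulting groups_full table could then stand in for groups in the final join.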

Answer 3

I can't quite work out the rules... but here is some code that checks each record against the criteria I think you want.

Data HAVE;
 input id     date :mmddyy10.    amountA   amountB  ;
 format date mmddyy10.;
 datalines;
1    1/15/2014   1000     79  
1    1/16/2014   1100     81  
1    1/30/2014   700      50  
1    2/05/2014   710      80   
1    2/25/2014   720      50   
;

Proc Sort data=HAVE;
 by id date;
Run;

Data WANT(drop=Prev_:);
 Set HAVE;

 Prev_Date=lag(date);
 Prev_amounta=lag(amounta);
 Prev_amountb=lag(amountb);

 If not missing(prev_date);

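 If date-prev_date<=10 then do;
  /* Only compare amountB when amountA is within 10% of the previous row. */
  /* The original "then;" was a null statement, so this check had no effect. */
  If abs(amounta-prev_amounta)/amounta<=.1 then do;
   /* Keep whichever of the two rows has the higher amountB */
   If amountb<prev_amountb then do;
    Date=prev_date;
    AmountA=prev_amounta;
    AmountB=prev_amountb;
   end;
  end;
 end;
 Else delete;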
Run;

Creating a loop in SAS while filtering on 2 variables
