Hash objects in SAS - can the two tables below be merged using a hash object?

Time: 2013-02-08 14:09:42

Tags: sas

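Is it possible to use a hash object to merge the two tables below, as in the SAS 9.1 example that follows? The main difficulty seems to be creating the value variable in the result data set. Each payment can cover more than one charge, and sometimes one charge needs more than one payment; both situations can occur at the same time. Is there a general name for this kind of problem? http://support.sas.com/rnd/base/datastep/dot/hash-getting-started.pdf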

data TABLE1;
input ID_client   ID_commodity    Charge;
datalines;
1             111111111      100
1             222222222      200
2             333333333      300    
2             444444444      400
2             555555555      500
;;;;
run;


data TABLE2;
input ID_client_hash     ID_ofpayment  paymentValue;
datalines;
1             11              50    
1             12              50    
1             13              100   
1             14              50    
1             15              50    
2             21              500   
2             22              200   
2             23              100   
2             24              200   
2             25              200
;;;;
run;

data OUT;
input ID_client     ID_commodity    ID_ofpayment    value;
datalines;
1               111111111             11    50
1               111111111             12    50
1               222222222             13    100
1               222222222             14    50
1               222222222             15    50
2               333333333             21    300
2               444444444             21    200
2               444444444             22    200
2               555555555             23    100
2               555555555             24    200
2               555555555             25    200
;;;;
run;

1 answer:

Answer 0: (score: 1)

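This might work for you. I have 9.2, and 9.2 has some significant hash improvements, but I think I behaved well and used only features that are in 9.1. You might try cross-posting this to SAS-L (the SAS listserv), since I believe Paul Dorfman (i.e., The Hash Guru) still reads it.

I assumed you want the leftover payments 'posted' as well. If that does not work the way you want, you may need to rework that part. This is not very well tested, but it works for your example data set. I set the commodity to missing for payments 24 and 25 because they were not used.

I am pretty sure there is a cleaner way to iterate than what I do, but since I use 9.2+, where the multidata option is available, I always use that rather than a hash iterator, so I do not know the cleaner approaches.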
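For readers on SAS 9.2 or later, the multidata lookup mentioned above might look roughly like the sketch below. It is not part of the answer's solution and does not perform the charge/payment allocation; it only shows how FIND and FIND_NEXT walk every payment stored under one client key. The output data set name payments_per_charge is made up for illustration, and the input data sets are assumed to be the have and for_hash data sets created below.

```sas
/* Sketch only: requires SAS 9.2 or later (MULTIDATA is not available in 9.1). */
data payments_per_charge;            /* hypothetical output name */
  if _n_ = 1 then do;
    /* Define the hash variables in the PDV without reading any rows */
    if 0 then set for_hash;
    declare hash h(dataset:'for_hash', multidata:'y');
    h.defineKey('ID_client_hash');   /* one key, many payments per key */
    h.defineData('ID_ofpayment', 'paymentValue');
    h.defineDone();
  end;

  set have;

  /* Walk every payment stored under this client's key */
  rc = h.find(key: ID_client);
  do while (rc = 0);
    output;                          /* one row per charge/payment pair */
    rc = h.find_next();
  end;

  drop rc ID_client_hash;
run;
```

Because each client key holds all of that client's payments, there is no need to scan the whole hash with an iterator to locate the first matching record.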

data have;
input ID_client   ID_commodity    Charge;
datalines;
1             111111111      100
1             222222222      200
2             333333333      300    
2             444444444      400
2             555555555      50
;;;;
run;


data for_hash;
input ID_client_hash     ID_ofpayment  paymentValue;
datalines;
1             11              50    
1             12              50    
1             13              100   
1             14              50    
1             15              50    
2             21              500   
2             22              200   
2             23              100   
2             24              200   
2             25              200
;;;;
run;

data want;
*Create hash and hash iterator - must use iterator since 9.1 does not allow multidata option;
if _n_ = 1 then do;
  format id_client_hash paymentValue id_ofpayment BEST12.;
  declare hash h(dataset:'for_hash' , ordered: 'a');
  h.defineKey('ID_client_hash','id_ofpayment'); *note I put id_client_hash, renaming the id - want to be able to compare them;
  h.defineData('id_client_hash','id_ofpayment','paymentValue');
  call missing(id_ofpayment,paymentValue, id_client_hash);
  h.defineDone();
  declare hiter hi('h');
end;

do _t = 1 by 1 until (last.id_client);
 set have;
 by id_client;

 *Iterate through the hash and find the first record with the same ID_client;
 do rc = hi.first() by 0 while (rc eq 0 and ID_client ne ID_client_hash);
   rc = hi.next();
 end;

 *For the current charge record, iterate through the payment (hash) until all paid up.;
 do while (charge gt 0 and rc eq 0 and ID_client=ID_client_hash);
   if charge ge paymentValue then do; *If charge >= paymentvalue, use up the payment value;
     value = paymentValue; *so whole paymentValue is value;
     charge = charge - paymentValue; *charge is decremented by paymentValue;
     output; *output row;
     _id=ID_client_hash; 
     _pay=id_ofpayment;
     rc = hi.next();
     h.remove(key:_id,key:_pay); *remove payment row from hash now that it has been used up;
   end;
   else do; *this is if (remaining) charge is less than payment - we will not use all of the payment;
     value = charge; *value is the remainder of the charge, ie, how much of payment was actually used;
     paymentValue = paymentValue - charge; *paymentValue is the remainder of paymentValue;
     charge= 0; *charge is zero now;
     output; *output a row;
     h.replace(); *replace paymentValue in the hash with the new value of paymentValue, minus charge;
   end;
 end; *end of iteration through hash - at this point, either charge = 0 or we have run out of payments with that ID;
 if charge gt 0 then do;
   value=-1*charge;
   call missing(id_ofpayment);
   output; *output a row for the charge, which is not paid; 
 end;
 if last.id_client then do;  *this is cleanup, checking to see if we have any leftover payments;
   do while (rc=0); *iterate through the remaining hash;
     do rc = hi.first() by 0 while (rc eq 0 and ID_client ne ID_client_hash);
       rc = hi.next();
     end;
     if rc=0 then do;
         call missing(id_commodity); *to make it clear this is a leftover payment;
         value=paymentValue; *update the value;
         output; *output the payment;
         _id=ID_client_hash;
         _pay=id_ofpayment;
         rc = hi.next();
         if rc= 0 then h.remove(key:_id,key:_pay); *remove the payment just output;
     end;    
   end;
 end;
end;
keep id_client id_ofpayment id_commodity value;
run;

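Beyond that, this is not very fast: I do a lot of iterating that is probably wasteful. It will be relatively faster if every ID_client in the payment records also appears in the charge records. Any payment records for clients without charges are skipped over on each scan, so it might end up very slow.

I am not confident that a hash is the best solution here, at least before 9.2; a keyed UPDATE might be superior. UPDATE is pretty much made for transactional database structures, which this seems close to.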
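As one possible non-hash alternative (this is not the keyed UPDATE mentioned above, but a different technique), the allocation can be sketched as an interval-overlap join: give every charge and every payment a running range within its client, then join the ranges that overlap. The data set names charge_ranges, payment_ranges and want_sql and the variables c_start, c_end, p_start and p_end are made up for illustration. The sketch assumes both inputs are sorted by client and already in the order in which charges and payments should be matched.

```sas
/* Sketch only: assumes HAVE and FOR_HASH are sorted by client, with charges
   and payments already in the order in which they should be matched. */

/* Running [c_start, c_end) range covered by each charge within a client */
data charge_ranges;
  set have;
  by ID_client;
  if first.ID_client then c_end = 0;
  c_start = c_end;
  c_end + Charge;          /* sum statement: c_end is retained across rows */
run;

/* Same idea for payments */
data payment_ranges;
  set for_hash;
  by ID_client_hash;
  if first.ID_client_hash then p_end = 0;
  p_start = p_end;
  p_end + paymentValue;
run;

/* A payment contributes to a charge exactly where their ranges overlap */
proc sql;
  create table want_sql as
  select c.ID_client,
         c.ID_commodity,
         p.ID_ofpayment,
         min(c.c_end, p.p_end) - max(c.c_start, p.p_start) as value
  from charge_ranges as c
       inner join payment_ranges as p
         on  c.ID_client = p.ID_client_hash
         and c.c_start < p.p_end
         and p.p_start < c.c_end
  order by c.ID_client, c.c_start, p.p_start;
quit;
```

Unlike the data step above, this sketch returns only the matched portions: unpaid charges and leftover payments do not appear in want_sql and would need separate handling. With the question's original figures (Charge 500 for commodity 555555555) it should reproduce the rows of OUT.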