是否可以在下面的SAS 9.1示例中使用哈希对象合并以下两个表?主要问题似乎是创建Value
变量w Result数据集。问题是每笔付款都可以支付一次以上的费用,有时一次付款需要支付一次以上的费用,这种情况可能同时出现。问题是否有一些通用名称?
http://support.sas.com/rnd/base/datastep/dot/hash-getting-started.pdf
data TABLE1;
input ID_client ID_commodity Charge;
datalines;
1 111111111 100
1 222222222 200
2 333333333 300
2 444444444 400
2 555555555 500
;;;;
run;
data TABLE2;
input ID_client_hash ID_ofpayment paymentValue;
datalines;
1 11 50
1 12 50
1 13 100
1 14 50
1 15 50
2 21 500
2 22 200
2 23 100
2 24 200
2 25 200
;;;;
run;
data OUT;
input ID_client ID_commodity ID_ofpayment value;
datalines;
1 111111111 11 50
1 111111111 12 50
1 222222222 13 100
1 222222222 14 50
1 222222222 15 50
2 333333333 21 300
2 444444444 21 200
2 444444444 22 200
2 555555555 23 100
2 555555555 24 200
2 555555555 25 200
答案 0 :(得分:1)
这可能对你有用 - 我有9.2和9.2有一些重要的哈希改进,但我认为我表现得很好,只使用9.1中的那些。您可以尝试将其转换为SAS-L [SAS listserv],因为Paul Dorfman(即The Hash Guru)读到的仍然是我相信的。
我以为你想要剩下的剩饭和#39;张贴了。如果它不按您想要的方式工作,您可能需要处理该部分。这不是非常好的测试,它适用于您的示例数据集。我打电话错过了24和25的商品,因为他们没有使用它。
我非常确定迭代的方式比我做的更干净,但是由于9.2+是我使用的,我们有多数据可用,我总是使用它而不是哈希迭代器,所以我不知道更清洁的方法。
data have;
input ID_client ID_commodity Charge;
datalines;
1 111111111 100
1 222222222 200
2 333333333 300
2 444444444 400
2 555555555 50
;;;;
run;
data for_hash;
input ID_client_hash ID_ofpayment paymentValue;
datalines;
1 11 50
1 12 50
1 13 100
1 14 50
1 15 50
2 21 500
2 22 200
2 23 100
2 24 200
2 25 200
;;;;
run;
data want;
*Create hash and hash iterator - must use iterator since 9.1 does not allow multidata option;
if _n_ = 1 then do;
format id_client_hash paymentValue id_ofpayment BEST12.;
declare hash h(dataset:'for_hash' , ordered: 'a');
h.defineKey('ID_client_hash','id_ofpayment'); *note I put id_client_hash, renaming the id - want to be able to compare them;
h.defineData('id_client_hash','id_ofpayment','paymentValue');
call missing(id_ofpayment,paymentValue, id_client_hash);
h.defineDone();
declare hiter hi('h');
end;
do _t = 1 by 1 until (last.id_client);
set have;
by id_client;
*Iterate through the hash and find the first record with the same ID_client;
do rc = hi.first() by 0 while (rc eq 0 and ID_client ne ID_client_hash);
rc = hi.next();
end;
*For the current charge record, iterate through the payment (hash) until all paid up.;
do while (charge gt 0 and rc eq 0 and ID_client=ID_client_hash);
if charge ge paymentValue then do; *If charge >= paymentvalue, use up the payment value;
value = paymentValue; *so whole paymentValue is value;
charge = charge - paymentValue; *charge is decremented by paymentValue;
output; *output row;
_id=ID_client_hash;
_pay=id_ofpayment;
rc = hi.next();
h.remove(key:_id,key:_pay); *remove payment row from hash now that it has been used up;
end;
else do; *this is if (remaining) charge is less than payment - we will not use all of the payment;
value = charge; *value is the remainder of the charge, ie, how much of payment was actually used;
paymentValue = paymentValue - charge; *paymentValue is the remainder of paymentValue;
charge= 0; *charge is zero now;
output; *output a row;
h.replace(); *replace paymentValue in the hash with the new value of paymentValue, minus charge;
end;
end; *end of iteration through hash - at this point, either charge = 0 or we have run out of payments with that ID;
if charge gt 0 then do;
value=-1*charge;
call missing(id_ofpayment);
output; *output a row for the charge, which is not paid;
end;
if last.id_client then do; *this is cleanup, checking to see if we have any leftover payments;
do while (rc=0); *iterate through the remaining hash;
do rc = hi.first() by 0 while (rc eq 0 and ID_client ne ID_client_hash);
rc = hi.next();
end;
if rc=0 then do;
call missing(id_commodity); *to make it clear this is a leftover payment;
value=paymentValue; *update the value;
output; *output the payment;
_id=ID_client_hash;
_pay=id_ofpayment;
rc = hi.next();
if rc= 0 then h.remove(key:_id,key:_pay); *remove the payment just output;
end;
end;
end;
end;
keep id_client id_ofpayment id_commodity value;
run;
除此之外,这不是非常快 - 我做了很多迭代,这可能是浪费。如果你没有任何付款ID_client记录在收费记录中没有表示,那么它会相对更快 - 你所做的任何事情都会被跳过,所以最终可能会超级慢。
我不自信哈希是最好的解决方案,至少在9.2之前; keyed UPDATE可能更优越。 UPDATE几乎是针对事务性数据库结构进行的,这似乎很接近。