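I want to achieve the following: for each customer (CustID) in table A, take the Trade_date and find the latest Create_date in table B that is less than or equal to it, matching on the customer. Then take the Rating from the record with that Create_date and join it back to table A (this is what table C should contain).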
Trade_Table A:
CustID | Trade_date | Trade_ID
12345 | 30/7/2018 | 4axd
12345 | 30/7/2018 |
12345 | 31/7/2018 | 5FETF
12345 | 05/9/2018 | fst43d
12366 | 01/8/2018 | g3fgg
12377 | 01/9/2018 | dfd45
Risk_Rating_Table B:
CustID | Create_date | Rating
12345 | 29/7/2018 | 2
12345 | 30/7/2018 | 3
12345 | 31/7/2018 | 4
12345 | 01/9/2018 | 1
12366 | 30/7/2018 | 1
12377 | 30/9/2018 | 5
Final_Table C:
CustID | Trade_date | Trade_ID | Rating
12345 | 30/7/2018 | 4axd | 3
12345 | 30/7/2018 | | 3
12345 | 31/7/2018 | 5FETF | 4
12345 | 05/9/2018 | fst43d | 1
12366 | 01/8/2018 | g3fgg | 1
12377 | 01/9/2018 | dfd45 |
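I have tried many approaches. This is one of them, but it fails with an error saying the subquery returns more than one row. This is more complicated than I expected.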
proc sql;
create Final_Table C as select
A.*,
(select max(B.Create_date) FROM Risk_Rating_Table B where A.trade_date >= B.Create_date
group by B.CustID
) as Rating
from
Trade_Table A
;
QUIT;
Answer 0 (score: 0)
Try the following:
SELECT Tmp.CustID, Tmp.Trade_date, Tmp.Trade_ID, Tmp.Create_date, B.Rating
FROM ( SELECT A.CustID, A.Trade_date, A.Trade_ID, MAX(B.Create_date) AS Create_date
       FROM TableA A
       JOIN TableB B ON A.CustID = B.CustID
       WHERE A.Trade_date >= B.Create_date
       GROUP BY A.CustID, A.Trade_date, A.Trade_ID
     ) Tmp
JOIN TableB B ON Tmp.CustID = B.CustID AND Tmp.Create_date = B.Create_date
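One thing to watch with the query above: both joins are inner joins, so a trade with no rating on or before its trade date (customer 12377 in the question) drops out instead of appearing with a blank Rating. A possible variant uses LEFT JOIN and moves the date test into the ON clause. The sketch below checks that idea with Python's sqlite3 module rather than SAS. The ISO date strings, the in-memory database, and the 30/9/2018 date for 12377 (taken from the test data in the next answer) are assumptions of this sketch, not part of the original post.

```python
import sqlite3

# Sample rows from the question. Dates are rewritten as ISO strings (an
# assumption of this sketch) so that plain text comparison orders them correctly.
trades = [
    (12345, "2018-07-30", "4axd"),
    (12345, "2018-07-30", None),
    (12345, "2018-07-31", "5FETF"),
    (12345, "2018-09-05", "fst43d"),
    (12366, "2018-08-01", "g3fgg"),
    (12377, "2018-09-01", "dfd45"),
]
ratings = [
    (12345, "2018-07-29", 2),
    (12345, "2018-07-30", 3),
    (12345, "2018-07-31", 4),
    (12345, "2018-09-01", 1),
    (12366, "2018-07-30", 1),
    (12377, "2018-09-30", 5),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TableA (CustID INTEGER, Trade_date TEXT, Trade_ID TEXT)")
con.execute("CREATE TABLE TableB (CustID INTEGER, Create_date TEXT, Rating INTEGER)")
con.executemany("INSERT INTO TableA VALUES (?, ?, ?)", trades)
con.executemany("INSERT INTO TableB VALUES (?, ?, ?)", ratings)

# Same shape as the answer's query, but with LEFT JOINs so that trades
# without a qualifying rating are kept with a NULL Rating.
query = """
SELECT Tmp.CustID, Tmp.Trade_date, Tmp.Trade_ID, B.Rating
FROM (
    SELECT A.CustID, A.Trade_date, A.Trade_ID, MAX(B.Create_date) AS Create_date
    FROM TableA A
    LEFT JOIN TableB B
        ON A.CustID = B.CustID AND B.Create_date <= A.Trade_date
    GROUP BY A.CustID, A.Trade_date, A.Trade_ID
) Tmp
LEFT JOIN TableB B
    ON Tmp.CustID = B.CustID AND Tmp.Create_date = B.Create_date
ORDER BY Tmp.CustID, Tmp.Trade_date
"""
result = con.execute(query).fetchall()
for row in result:
    print(row)
```

Because the inner query groups by CustID, Trade_date and Trade_ID, two trade rows that are identical in all three columns would collapse into one, so this form fits best when those columns identify a trade.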
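Answer 1 (score: 0)
As SQL guru @Gordon Linoff said, this is hard to do in PROC SQL; I think a DATA step or a hash technique is a better fit. I was able to reproduce your result, but I am not sure how efficient it is. I transposed the data and used an array to avoid a many-to-many join.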
/* create test data*/
data Trade_Table;
infile datalines missover;
input CustID Trade_date:ddmmyy10. Trade_ID $;
format Trade_date ddmmyy10.;
datalines;
12345 30/7/2018 4axd
12345 30/7/2018
12345 31/7/2018 5FETF
12345 05/9/2018 fst43d
12366 01/8/2018 g3fgg
12377 01/9/2018 dfd45
;
/* second test data */
data Risk_Rating_Table ;
input CustID Create_date:ddmmyy10. Rating;
format Create_date ddmmyy10.;
datalines;
12345 29/7/2018 2
12345 30/7/2018 3
12345 31/7/2018 4
12345 01/9/2018 1
12366 30/7/2018 1
12377 30/9/2018 5
;
/* transpose the data so each customer's create dates sit in one row (col1, col2, ...), which makes them easy to compare */
proc transpose data = Risk_Rating_Table out =one(drop =_name_);
by custid ;
var create_date;
run;
/* merge the master table with the transposed table; both must be sorted by custid */
data have1;
merge Trade_Table one;
by custid;
run;
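/* scan the create dates in order and keep the last one that is le Trade_date */
data have2;
set have1;
/* col: and dim() pick up however many date columns the transpose produced */
array col(*) col:;
do i=1 to dim(col) while(col{i} le Trade_date);
new=col{i};
end;
format new ddmmyy10.;
/* a missing col value also passes the le test, so new can end up missing;
   fall back to the latest create date (the final query still checks it against trade_date);
   worth testing this part on more examples */
if new = . then new= max(of col:);
drop col: i;
run;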
/* get back required columns */
proc sql;
create table want as
select a.CustID ,Trade_date , Trade_ID , case
when a.custid = b.custid and trade_date ge create_date
then b.rating
else .
end as rating
from have2 a
left join
Risk_Rating_Table b
on a.custid = b.custid
and create_date =new;
quit;
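The array scan above is, at heart, an "as of" lookup: for each trade, find the last create date that does not exceed the trade date. As a rough illustration of that logic outside SAS, here is a small Python sketch that sorts each customer's rating dates and uses a binary search in place of the loop. The function names build_index and rating_as_of are hypothetical helpers for this sketch, and the rating rows include the 01/9/2018 row from the question's table B.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date


def build_index(rating_rows):
    """Group rating rows by customer and sort each group by create date."""
    by_cust = defaultdict(list)
    for cust_id, create_date, rating in rating_rows:
        by_cust[cust_id].append((create_date, rating))
    index = {}
    for cust_id, rows in by_cust.items():
        rows.sort()
        # Keep dates and ratings in two parallel, date-ordered lists.
        index[cust_id] = ([d for d, _ in rows], [r for _, r in rows])
    return index


def rating_as_of(index, cust_id, trade_date):
    """Return the rating with the latest create date <= trade_date, or None."""
    dates, values = index.get(cust_id, ([], []))
    # bisect_right counts how many create dates are <= trade_date.
    pos = bisect_right(dates, trade_date)
    return values[pos - 1] if pos else None


rating_rows = [
    (12345, date(2018, 7, 29), 2),
    (12345, date(2018, 7, 30), 3),
    (12345, date(2018, 7, 31), 4),
    (12345, date(2018, 9, 1), 1),
    (12366, date(2018, 7, 30), 1),
    (12377, date(2018, 9, 30), 5),
]
index = build_index(rating_rows)
print(rating_as_of(index, 12345, date(2018, 7, 30)))
print(rating_as_of(index, 12377, date(2018, 9, 1)))
```

Returning None when no create date qualifies plays the role of the missing Rating for customer 12377 in the expected table C.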
Answer 2 (score: 0)
I managed to solve it as shown here. Comments are welcome.
proc sql;
create table tmp_dcpbase as SELECT
A.custid ,A.Trade_date,
CASE
/* flag rating rows created on or before the trade date */
WHEN A.Trade_date >= B.CREATE_DATE THEN 1
ELSE 0
END as CONSIDER ,
PUT(max(B.CREATE_DATE), datetime16. ) AS RPQCr8DTForm,
max(B.CREATE_DATE) as RPQCr8DT
FROM Trade_Table as a inner join RISK_RATING_Table as b
ON (a.custID = b.custID )
GROUP BY A.custID , A.trade_date,CONSIDER
having MAX(CONSIDER) = 1
;
quit;
proc sql;
create table target_C as select
D.custID , D.Trade_date,D.Trade_id, tmp2.RPQCr8DTForm , TMP2.Rating from
(
SELECT A.*, b.Rating
FROM
tmp_dcpbase A
INNER JOIN RISK_RATING_Table b ON (a.custID = b.custID AND a.RPQCr8DT = b.CREATE_DATE)
) TMP2 right join trade_table D ON (TMP2.custID = D.custID AND TMP2.trade_date = d.trade_date)
;
quit;
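For comparison with the two-step approach above, the same lookup can also be written as a single correlated subquery, which is close to what the question first attempted. The attempt in the question was not correlated on CustID and grouped by customer, so the subquery could return one row per customer; it also returned a date rather than a Rating. The sketch below runs a corrected form through Python's sqlite3 module to check the logic. The ISO date strings, the SQLite engine, and the 30/9/2018 date for 12377 are assumptions of this sketch, and the query has not been run in PROC SQL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Sample data from the question, with dates as ISO strings (an assumption of
# this sketch) so that text comparison matches date order.
con.executescript("""
CREATE TABLE Trade_Table (CustID INTEGER, Trade_date TEXT, Trade_ID TEXT);
CREATE TABLE Risk_Rating_Table (CustID INTEGER, Create_date TEXT, Rating INTEGER);
INSERT INTO Trade_Table VALUES
    (12345, '2018-07-30', '4axd'),
    (12345, '2018-07-30', NULL),
    (12345, '2018-07-31', '5FETF'),
    (12345, '2018-09-05', 'fst43d'),
    (12366, '2018-08-01', 'g3fgg'),
    (12377, '2018-09-01', 'dfd45');
INSERT INTO Risk_Rating_Table VALUES
    (12345, '2018-07-29', 2),
    (12345, '2018-07-30', 3),
    (12345, '2018-07-31', 4),
    (12345, '2018-09-01', 1),
    (12366, '2018-07-30', 1),
    (12377, '2018-09-30', 5);
""")

# The outer subquery returns the Rating; the inner one finds the latest
# Create_date for the same customer that is not after the trade date.
query = """
SELECT A.CustID, A.Trade_date, A.Trade_ID,
       (SELECT B.Rating
          FROM Risk_Rating_Table B
         WHERE B.CustID = A.CustID
           AND B.Create_date = (SELECT MAX(B2.Create_date)
                                  FROM Risk_Rating_Table B2
                                 WHERE B2.CustID = A.CustID
                                   AND B2.Create_date <= A.Trade_date)
       ) AS Rating
FROM Trade_Table A
ORDER BY A.CustID, A.Trade_date
"""
rows = con.execute(query).fetchall()
for row in rows:
    print(row)
```

This form returns exactly one output row per trade row, so duplicate trades are preserved. It does assume at most one rating per customer and create date; with two ratings on the same date the subquery would again match more than one row.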