I have a dataset like this:
CustomerID AccountManager TransactionID Transaction_Time
1111111111 FA001 TR2016001 08SEP16:11:19:25
1111111111 FA001 TR2016002 26OCT16:08:22:49
1111111111 FA002 TR2016003 04NOV16:08:05:36
1111111111 FA003 TR2016004 04NOV16:17:15:52
1111111111 FA004 TR2016005 25NOV16:13:04:16
1231231234 FA005 TR2016006 25AUG15:08:03:29
1231231234 FA005 TR2016007 16SEP15:08:24:24
1231231234 FA008 TR2016008 18SEP15:14:42:29
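CustomerID identifies a customer, and each customer can have multiple transactions. Each account manager can also handle multiple transactions. However, TransactionID is unique in this table.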
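For anyone who wants to reproduce this, a minimal sketch of a DATA step that builds the sample table might look like the following. The dataset name transaction matches the code further down. Storing CustomerID as character and the informat widths are assumptions, not something stated about the real table.

```sas
/* Sketch: build the sample table shown above.                        */
/* Assumptions: CustomerID is character; the widths are guesses.      */
/* Two-digit years are resolved by the YEARCUTOFF= system option.     */
data transaction;
    input CustomerID :$10. AccountManager :$5. TransactionID :$9.
          Transaction_Time :datetime18.;
    format Transaction_Time datetime18.;
    datalines;
1111111111 FA001 TR2016001 08SEP16:11:19:25
1111111111 FA001 TR2016002 26OCT16:08:22:49
1111111111 FA002 TR2016003 04NOV16:08:05:36
1111111111 FA003 TR2016004 04NOV16:17:15:52
1111111111 FA004 TR2016005 25NOV16:13:04:16
1231231234 FA005 TR2016006 25AUG15:08:03:29
1231231234 FA005 TR2016007 16SEP15:08:24:24
1231231234 FA008 TR2016008 18SEP15:14:42:29
;
run;
```

Transaction_Time is read as a SAS datetime value, which is what the DATEPART calls in the code below expect.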
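Now, for each customer and each transaction, I want to look back over the 90 days before that transaction and count how many distinct account managers were involved and how many transactions occurred. The result I am looking for is this: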
CustomerID Manager TransacID Transaction_Time CountTransac CountManager
1111111111 FA001 TR2016001 08SEP16:11:19:25 1 1
1111111111 FA001 TR2016002 26OCT16:08:22:49 2 1
1111111111 FA002 TR2016003 04NOV16:08:05:36 3 2
1111111111 FA003 TR2016004 04NOV16:17:15:52 4 3
1111111111 FA004 TR2016005 25NOV16:13:04:16 5 4
1231231234 FA005 TR2016006 25AUG15:08:03:29 1 1
1231231234 FA005 TR2016007 16SEP15:08:24:24 2 1
1231231234 FA008 TR2016008 18SEP15:14:42:29 3 2
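With the code below I worked out how to compute the transaction count, but I don't know how to compute the distinct manager count. Any help would be much appreciated. Thank you.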
DATA want;
    SET transaction;
    COUNT=1;
    DO point=_n_-1 TO 1 BY -1;
        SET want(KEEP=CustomerID Transaction_Time COUNT POINT=point
                 RENAME=(CustomerID =SAME_ID Transaction_Time =OTHER_TIME COUNT=OTHER_COUNT));
        IF CustomerID NE SAME_ID
            OR INTCK ("DAY", DATEPART(OTHER_TIME), DATEPART(Transaction_Time )) > 90
            THEN LEAVE;
        COUNT + OTHER_COUNT;
    END;
    DROP SAME_ID OTHER_TIME OTHER_COUNT;
    RENAME COUNT=COUNT_TRANSAC;
RUN;
Answer 0 (score: 3)
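Your code doesn't work at all, but I see what you are trying to do. The code below works. I commented out the WHERE statement so you can see that it produces the results you asked for. If you really only want the last 90 days (counted back from today), you will need the WHERE statement.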
* Always a good idea to sort first unless you are CERTAIN that
* your values are in the order you want.;
proc sort data=have;
    by customerid AccountManager transactionid;
run;
DATA want;
    SET have;
    * Uncomment the WHERE statement to activate the 90-day time frame.;
    * where today()-datepart(transaction_time)<=90;
    by customerid AccountManager transactionid;
    if first.customerid
        then do;
            counttransac=0;
            countmanager=0;
        end;
    if first.AccountManager
        then countmanager+1;
    counttransac+1;
RUN;
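By using the SAS BY statement together with the first. and last. automatic variables, you can reset the counters each time a new customer ID appears and increment the manager count each time a new manager ID appears.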
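To see what the first. and last. flags look like for each row, a small sketch along these lines writes them to the log. It assumes have is already sorted by customerid and AccountManager, as in the PROC SORT step of this answer. The names new_customer, new_manager and last_of_customer are made up for illustration.

```sas
/* Sketch: write the BY-group flags to the log for inspection.        */
/* Assumes HAVE is already sorted by customerid and AccountManager.   */
data _null_;
    set have;
    by customerid AccountManager;
    /* FIRST. and LAST. variables are temporary, so copy them first */
    new_customer     = first.customerid;
    new_manager      = first.AccountManager;
    last_of_customer = last.customerid;
    put customerid= AccountManager= new_customer= new_manager= last_of_customer=;
run;
```

Note that first.AccountManager is also set on the first row of each customer, because a change in an earlier BY variable starts a new group for every later BY variable.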
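[Edit] OK, that is much harder. Here is code that looks back through the history before each transaction. I see why you used two SET statements, since you have to join the dataset to itself. You could probably do this with PROC SQL, but I haven't had time to look into it. Let me know if this works for you.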
* Sort the transactions by customer and manager so that rows
* for the same manager are adjacent within each customer.;
proc sort data=transaction;
    by customerid accountmanager;
run;
DATA want;
    SET transaction nobs=pmax;
    by customerid;
    length lastmgr $ 100;
    retain pstart; * Starting row for each customer;
    * Save starting row for each customer;
    if first.customerid
        then pstart=_n_;
    * Initialize current account manager and counters for
    * managers and transactions. The current transaction always
    * counts as one transaction and one manager.
    * Save the beginning of the 90-day period to avoid
    * recalculating it each time.;
    lastmgr=accountmanager;
    mgrct=1;
    tranct=1;
    ninetyday=datepart(transaction_time)-90;
    * Set the starting row to search for each transaction;
    p=pstart;
    * Loop through all rows for the customer and only count
    * those that occur before the current transaction and
    * after the 90-day period before it.;
    * Note that the transactions are not necessarily sorted
    * in chronological order but rather in groups by customer
    * and manager, so we have to look through all transactions
    * of the customer each time.;
    * DO UNTIL(0) means loop forever, so be careful that
    * there is always a LEAVE statement executed.;
    do until(0);
        * p > pmax means the end of the transaction list, so stop.;
        if p > pmax
            then leave;
        set transaction (keep=customerid accountmanager transaction_time
                         rename=(customerid=cust2 accountmanager=mgr2 transaction_time=tt2))
            point=p;
        * When customer ID changes, we are done with the loop.;
        if cust2 ~= customerid
            then leave;
        else do;
            * To be counted, the transaction needs to be within the
            * 90-day period. Using "<" for the transaction time
            * prevents counting the current transaction twice.;
            if datepart(tt2) >= ninetyday and tt2 < transaction_time
                then do;
                    tranct=tranct+1;
                    * Rows are grouped by manager, so a change in mgr2
                    * marks a new manager group. Skip the count when the
                    * group belongs to the current manager, who is
                    * already counted.;
                    if mgr2 ~= lastmgr
                        then do;
                            if mgr2 ~= accountmanager
                                then mgrct=mgrct+1;
                            lastmgr=mgr2;
                        end;
                end;
        end;
        * Look at the next transaction.;
        p=p+1;
    end;
    keep CustomerID AccountManager TransactionID Transaction_Time tranct mgrct;
RUN;
[Edit] Here is a PROC SQL approach that works. It comes from Tom's answer to my question about how to write an elegant query for this task:
proc sql noprint ;
create table want as
select a.*
, count(distinct b.accountmanager) as mgrct
, count(*) as tranct
from transaction a
left join transaction b
on a.customerid = b.customerid
and b.transaction_time <= a.transaction_time
and datepart(a.transaction_time)-datepart(b.transaction_time)
between 0 and 90
group by 1,2,3,4
;
quit;
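To compare the query output with the table in the question, it may help to list the rows in chronological order within each customer. The sketch below assumes want was created by the PROC SQL step above. The dataset name want_sorted is made up for illustration.

```sas
/* Sketch: list the result in the same order as the expected table.   */
/* Assumes WANT was created by the PROC SQL step above.               */
proc sort data=want out=want_sorted;
    by CustomerID Transaction_Time;
run;

proc print data=want_sorted noobs;
    var CustomerID AccountManager TransactionID Transaction_Time tranct mgrct;
    format Transaction_Time datetime18.;
run;
```

For the sample data, every earlier transaction of a customer falls within 90 days of the later ones, so tranct and mgrct should line up with the CountTransac and CountManager columns the question asks for.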