I have some sample data as below, and I want to calculate the number of consecutive wins and losses.
data have;
input username $ betdate : datetime. stake winnings;
dateOnly = datepart(betdate) ;
format betdate DATETIME.;
format dateOnly ddmmyy8.;
datalines;
player1 12NOV2008:12:04:01 90 -90
player1 04NOV2008:09:03:44 100 40
player2 07NOV2008:14:03:33 120 -120
player1 05NOV2008:09:00:00 50 15
player1 05NOV2008:09:05:00 30 5
player1 05NOV2008:09:00:05 20 10
player2 09NOV2008:10:05:10 10 -10
player2 15NOV2008:15:05:33 35 -35
player1 15NOV2008:15:05:33 35 15
player1 15NOV2008:15:05:33 35 15
;
run;
PROC PRINT; RUN;
proc sort data=have;
by username betdate;
run;
DM "log; clear;";
data want;
set have;
by username dateOnly betdate;
retain calendarTime eventTime cumulativeDailyProfit profitableFlag;
if first.username then calendarTime = 0;
if first.dateOnly then calendarTime + 1;
if first.username then eventTime = 0;
if first.betdate then eventTime + 1;
if first.username then cumulativeDailyProfit = 0;
if first.dateOnly then cumulativeDailyProfit = 0;
if first.betdate then cumulativeDailyProfit + stake;
if winnings > 0 then winner = 1;
if winnings <= 0 then winner = 0;
run;
PROC PRINT; RUN;
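For example, player1's first four bets are winners, so the first four rows of this column should show 1, 2, 3, 4 (four wins in a row at that point). The fifth bet is a loser, so it should show -1, followed by 1, 2. The next three rows (for player2) should show -1, -2, -3, because that customer lost three bets in a row. How can I compute this column in a data step? And how could I also have a column with the largest number of consecutive winning bets so far, plus the number of bets the customer most recently lost in a row, on every row?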
Thanks for your help.
Answer 0 (score: 3)
To keep a running count like this, you can use BY with NOTSORTED and still use the first.<var> functionality. For example:
data have;
input winlose $;
datalines;
win
win
win
win
lose
lose
win
lose
win
win
lose
;;;;
run;
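data want;
set have;
by winlose notsorted;  /* groups consecutive runs of the same value; the data need not be sorted */
if first.winlose and winlose='win' then counter=1;  /* first win of a new run */
else if first.winlose then counter=-1;              /* first loss of a new run */
else if winlose='win' then counter+1;               /* winning run continues */
else counter+(-1);                                  /* losing run continues */
run;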
Each time the value changes from 'win' to 'lose' or vice versa, the first.winlose variable is set to 1.
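A minimal sketch of the same NOTSORTED idea applied to the question's own data, grouping by player as well. It assumes a win means winnings > 0, as in the question's code. Because a BY variable must already exist on the input, the win flag is derived in a view first. The dataset names BETS, BETS_FLAG and STREAKS and the variables streak, maxWinSoFar and lastLossRun are hypothetical; lastLossRun is one possible reading of "most recently lost": the length of the current or most recent losing run.

```sas
/* Hypothetical dataset BETS: the question's data, keeping only the columns needed here. */
data bets;
  input username $ betdate :datetime. winnings;
  format betdate datetime.;
  datalines;
player1 12NOV2008:12:04:01 -90
player1 04NOV2008:09:03:44 40
player2 07NOV2008:14:03:33 -120
player1 05NOV2008:09:00:00 15
player1 05NOV2008:09:05:00 5
player1 05NOV2008:09:00:05 10
player2 09NOV2008:10:05:10 -10
player2 15NOV2008:15:05:33 -35
player1 15NOV2008:15:05:33 15
player1 15NOV2008:15:05:33 15
;
run;

proc sort data=bets;
  by username betdate;
run;

/* The BY variable must exist on the input, so derive the win flag in a view first. */
data bets_flag / view=bets_flag;
  set bets;
  winner = (winnings > 0);          /* 1 = win, 0 = loss or break-even (assumption) */
run;

data streaks;
  set bets_flag;
  by username winner notsorted;     /* first.winner = 1 when the player changes or the flag flips */
  retain maxWinSoFar lastLossRun;
  if first.username then do;        /* reset the per-player summaries */
    maxWinSoFar = 0;
    lastLossRun = 0;
  end;
  if first.winner then streak = 0;  /* start a new run */
  if winner then streak + 1;        /* wins count up: 1, 2, 3, ... */
  else streak + (-1);               /* losses count down: -1, -2, -3, ... */
  if streak > 0 then maxWinSoFar = max(maxWinSoFar, streak);  /* longest winning run so far */
  else lastLossRun = -streak;       /* length of the current or latest losing run */
run;

proc print data=streaks;
  var username betdate winnings streak maxWinSoFar lastLossRun;
run;
```

For this data, streak should come out as 1, 2, 3, 4, -1, 1, 2 for player1 and -1, -2, -3 for player2, matching the column described in the question.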
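Once you have this, you can append the maximum with a double DoW loop, or, more simply, compute the maximum separately and then attach it with a second data step (or PROC SQL) to add the variable you need.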
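A sketch of both routes for attaching the overall maximum, using the example data from this answer. WANT is recreated here with the counter values that the NOTSORTED step above produces, so the code runs on its own; the dataset names WANT_DOW and WANT_MAX, the macro variable maxWin and the variable maxWinStreak are hypothetical.

```sas
/* WANT as produced by the NOTSORTED step, recreated so this sketch is self-contained. */
data want;
  input winlose $ counter;
  datalines;
win 1
win 2
win 3
win 4
lose -1
lose -2
win 1
lose -1
win 1
win 2
lose -1
;
run;

/* Option 1: double DoW loop. The first loop reads every row to find the maximum;
   the second loop re-reads the rows and outputs each one with that maximum attached. */
data want_dow;
  do until (eof1);
    set want end=eof1;
    maxWinStreak = max(maxWinStreak, counter);
  end;
  do until (eof2);
    set want end=eof2;
    output;
  end;
run;

/* Option 2: compute the maximum with PROC SQL, then add it in a second data step. */
proc sql noprint;
  select max(counter) into :maxWin from want;
quit;

data want_max;
  set want;
  maxWinStreak = &maxWin;
run;
```

Both versions attach maxWinStreak = 4 to every row for this data. The DoW version does the work in one step by reading WANT twice; the PROC SQL version is easier to follow but goes through a macro variable.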