I have the following sample data:
```sas
data have;
  input username $ amount betdate :datetime.;
  dateOnly = datepart(betdate);
  format betdate DATETIME.;
  format dateOnly ddmmyy8.;
  datalines;
player1 90 12NOV2008:12:04:01
player1 -100 04NOV2008:09:03:44
player2 120 07NOV2008:14:03:33
player1 -50 05NOV2008:09:00:00
player1 -30 05NOV2008:09:05:00
player1 20 05NOV2008:09:00:05
player2 10 09NOV2008:10:05:10
player2 -35 15NOV2008:15:05:33
;
run;
PROC PRINT data=have; RUN;
```
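```sas
proc sort data=have;
  by username betdate;
run;

data want;
  set have;
  by username dateOnly betdate;
  retain calendarTime eventTime cumulativeDailyProfit profitableFlag totalDailyProfit;
  /* calendarTime: day counter within each player */
  if first.username then calendarTime = 0;
  if first.dateOnly then calendarTime + 1;
  /* eventTime: bet counter within each player */
  if first.username then eventTime = 0;
  if first.betdate then eventTime + 1;
  /* cumulativeDailyProfit: running total that restarts every day */
  if first.username then cumulativeDailyProfit = 0;
  if first.dateOnly then cumulativeDailyProfit = 0;
  if first.betdate then cumulativeDailyProfit + amount;
  /* totalDailyProfit: meant to be the day's final total, but this is
     just a second running total, so it repeats cumulativeDailyProfit */
  if first.dateOnly then totalDailyProfit = 0;
  if first.betdate then totalDailyProfit + amount;
run;

PROC PRINT data=want; RUN;
```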
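The 'cumulativeDailyProfit' column in the output is exactly what I want: a running total that increases by the value of 'amount' on each row. However, I don't want the same thing to happen for 'totalDailyProfit', because I want that field to show the profit at the end of the day, i.e. the last value of cumulativeDailyProfit for each customer on that day.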
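For example, for the eight rows above (in username/betdate order), totalDailyProfit would ideally show: -100, -60, -60, -60, 90, 120, 10, -35. Then, if that value is greater than 0, I would set the boolean 'profitableFlag' on every row for that day and that customer.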
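One common way to put a day-level total on every row within a single DATA step is the "double DoW loop": the first DO UNTIL loop reads all bets of one player-day and sums them, and the second loop re-reads the same rows and outputs each one with the total and flag attached. This is a minimal sketch, assuming the HAVE data set from the question already exists; it uses the question's own variable names and also rebuilds calendarTime, eventTime and cumulativeDailyProfit so the summary query has every column it needs.

```sas
proc sort data=have;
  by username betdate;
run;

data want;
  /* Pass 1: read every bet of one player-day and total the amounts */
  totalDailyProfit = 0;
  do until (last.dateOnly);
    set have;
    by username dateOnly;
    totalDailyProfit + amount;
  end;
  /* 1 if the player finished the day in profit, else 0 */
  profitableFlag = (totalDailyProfit > 0);

  /* Pass 2: re-read the same rows and output each one with the day-level values */
  cumulativeDailyProfit = 0;
  do until (last.dateOnly);
    set have;
    by username dateOnly;
    if first.username then do;
      calendarTime = 0;
      eventTime = 0;
    end;
    if first.dateOnly then calendarTime + 1;  /* day counter per player */
    eventTime + 1;                            /* bet counter per player */
    cumulativeDailyProfit + amount;           /* running total within the day */
    output;
  end;
run;
```

The two SET statements keep separate read positions on HAVE, which is what lets the second loop replay exactly the rows the first loop has just totalled. On the sample data, totalDailyProfit comes out as -100, -60, -60, -60, 90, 120, 10, -35, and profitableFlag is 1 only on player1's 12NOV2008 row and player2's 07NOV2008 and 09NOV2008 rows.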
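Can this actually be done in a DATA step? I'd like to be able to run the following query (with the appropriate flag value in each CASE WHEN clause) to get the mean stake, the mean stake on winning days and the mean stake on losing days.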
```sas
proc sql;
  select calendarTime,
         mean(amount) as meanStake,
         /* MEAN skips missing values, so each CASE averages only one kind of day */
         mean(case when profitableFlag = 0 then amount else . end) as meanLosingDayStake,
         mean(case when profitableFlag = 1 then amount else . end) as meanWinningDayStake
  from want
  group by 1;
quit;
```
Answer (score: 1)
Try this query:
```sas
proc sql;
  select calendarTime,
         avg(amount) as meanStake,
         /* return missing (.) rather than 0 for rows outside the group:
            AVG skips missing values but would count zeros */
         avg(case when profitableFlag = 0
                  then amount else . end) as meanLosingDayStake,
         avg(case when profitableFlag = 1
                  then amount else . end) as meanWinningDayStake
  from want
  group by calendarTime;
quit;
```
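If you would rather stay in PROC SQL, SAS can also remerge a group-level summary back onto the detail rows: grouping by username and dateOnly while selecting all columns makes SAS attach SUM(amount) for each player-day to every row (the log shows a NOTE about remerging summary statistics). A minimal sketch, assuming WANT was created by the question's DATA step; it keeps calendarTime and the other columns from WANT, drops the incorrect totalDailyProfit and the empty profitableFlag, and rebuilds both. WANT_SQL is a hypothetical output name.

```sas
proc sql;
  create table want_sql as
  select w.*,
         sum(w.amount)       as totalDailyProfit,  /* day total per player */
         (sum(w.amount) > 0) as profitableFlag     /* 1 = profitable day   */
  from want(drop=totalDailyProfit profitableFlag) as w
  group by w.username, w.dateOnly
  order by w.username, w.betdate;
quit;
```

The summary query can then be run FROM want_sql instead of FROM want.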