I am trying to calculate cumulative revenue since January 1, 2020. I have user-level revenue data with the following schema:
create table revenue
(
game_id varchar(255),
user_id varchar(255),
amount int,
activity_date varchar(255)
);
insert into revenue
(game_id, user_id, amount, activity_date)
values
('Racing', 'ABC123', 5, '2020-01-01'),
('Racing', 'ABC123', 1, '2020-01-04'),
('Racing', 'CDE123', 1, '2020-01-04'),
('DH', 'CDE123', 100, '2020-01-03'),
('DH', 'CDE456', 10, '2020-01-02'),
('DH', 'CDE789', 5, '2020-01-02'),
('DH', 'CDE456', 1, '2020-01-03'),
('DH', 'CDE456', 1, '2020-01-03');
Expected output:
Game Age Cum_rev Total_unique_payers_per_game
Racing 0 5 2
Racing 1 5 2
Racing 2 5 2
Racing 3 7 2
DH 0 0 3
DH 1 15 3
DH 2 117 3
DH 3 117 3
Age is calculated as the difference between the transaction date and 2020-01-01. I am using the following logic:
SELECT game_id, DATEDIFF(activity_date ,'2020-01-01') as Age,count(user_id) as Total_unique_payers
from REVENUE
SQL fiddle
How can I calculate the cumulative revenue?
Answer 0 (score: 1)
For the following you need a version of MySQL that supports the over() clause (MySQL 8+). I have used MariaDB 10.4 below (MySQL 8 wasn't working at that site when I tried).
SELECT game_id
     , user_id
     , activity_date
     , amount
     , sum(amount) over(order by activity_date, user_id) as running_sum
     , (select count(distinct user_id) from revenue) as Total_unique_payers
from revenue
order by activity_date, user_id

game_id | user_id | activity_date | amount | running_sum | Total_unique_payers
:------ | :------ | :------------ | -----: | ----------: | ------------------:
Racing  | ABC123  | 2020-01-01    |      5 |           5 |                   4
DH      | CDE456  | 2020-01-02    |     10 |          15 |                   4
DH      | CDE789  | 2020-01-02    |      5 |          20 |                   4
DH      | CDE123  | 2020-01-03    |    100 |         120 |                   4
DH      | CDE456  | 2020-01-03    |      1 |         122 |                   4
DH      | CDE456  | 2020-01-03    |      1 |         122 |                   4
Racing  | ABC123  | 2020-01-04    |      1 |         123 |                   4
Racing  | CDE123  | 2020-01-04    |      1 |         124 |                   4

db<>fiddle here
Changing the ordering inside the over() clause changes how the running sum is calculated. For example, ordering by game_id desc, activity_date, user_id (in both the over() clause and the final order by) gives:

game_id | user_id | activity_date | amount | running_sum | Total_unique_payers
:------ | :------ | :------------ | -----: | ----------: | ------------------:
Racing  | ABC123  | 2020-01-01    |      5 |           5 |                   4
Racing  | ABC123  | 2020-01-04    |      1 |           6 |                   4
Racing  | CDE123  | 2020-01-04    |      1 |           7 |                   4
DH      | CDE456  | 2020-01-02    |     10 |          17 |                   4
DH      | CDE789  | 2020-01-02    |      5 |          22 |                   4
DH      | CDE123  | 2020-01-03    |    100 |         122 |                   4
DH      | CDE456  | 2020-01-03    |      1 |         124 |                   4
DH      | CDE456  | 2020-01-03    |      1 |         124 |                   4

db<>fiddle here
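Since the question asks for a total per game, one further variation is to restart the running sum for each game with partition by. This is only a sketch against the sample revenue table from the question; it is not one of the fiddles above, and the alias running_sum_per_game is my own name.

```sql
-- Sketch (assumes MySQL 8+ / MariaDB 10.2+ and the sample table "revenue").
-- partition by game_id restarts the running sum for every game;
-- rows of the same game that share an activity_date are peers under the
-- default window frame, so they all show the same running total.
select game_id
     , user_id
     , activity_date
     , amount
     , sum(amount) over(partition by game_id
                        order by activity_date) as running_sum_per_game
from revenue
order by game_id desc, activity_date, user_id;
```

With the sample data, Racing ends at 7 and DH at 117, which are the final Cum_rev values in the expected output. It still returns one row per transaction and does not fill the days without revenue.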
Answer 1 (score: 1)
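The only way to do this in MySQL 5.7 is with its user variables, and it does work. It emulates the window function that @Used_By_Already uses in their answer.
Since you mentioned that you care about the gaps, you first need to create a dates table, which is easy to do: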
create table dates_view (
  date_day date
);
insert into dates_view
select date_add( '2019-12-31', INTERVAL @rownum:=@rownum+1 day ) as date_day
from (
  select 0 union select 1 union select 2 union select 3
  union select 4 union select 5 union select 6
  union select 7 union select 8 union select 9
) a, (
  select 0 union select 1 union select 2 union select 3
  union select 4 union select 5 union select 6
  union select 7 union select 8 union select 9
) b, (select @rownum:=0) r;
-- Note: each set of unioned selects above multiplies the number of
-- days by 10, so if you need more days in your table, just add more
-- sets like "a" or "b"
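If you are on MySQL 8+, the same 100 rows could be generated with a recursive CTE instead of the union sets. This is a sketch to use in place of the insert above, not in addition to it (running both would duplicate the dates). The CTE name seq and its column n are names I made up for illustration.

```sql
-- Sketch for MySQL 8+ only: fill dates_view with 100 consecutive days
-- starting at 2020-01-01, using a recursive CTE as the row generator.
insert into dates_view (date_day)
with recursive seq (n) as (
    select 1
    union all
    select n + 1 from seq where n < 100   -- stop after 100 rows
)
select date_add('2019-12-31', interval n day)
from seq;
```

To get more days, raise the limit in the where clause. The default recursion limit (cte_max_recursion_depth) is 1000, so it has to be raised as well once you go past that.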
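Once you have the dates table, you have to cross join it with your current revenue table. Because you want the number of players to be independent of the accumulated amount, you need to compute it separately in a subquery.
You also need to compute max(activity_date) of the revenue table, so that the result is limited to the dates present in that table.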
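Run on their own, the two independent pieces described above look roughly like this. It is a sketch against the sample data; the aliases noPlayers and maxDate are the ones used in the full query.

```sql
-- Number of distinct paying users per game, independent of any date.
select game_id, count(distinct user_id) as noPlayers
from revenue
group by game_id;

-- Last date that has any revenue; the dates table is cut off here.
select max(activity_date) as maxDate
from revenue;
```

With the sample data the first query gives 2 for Racing and 3 for DH, and the second gives 2020-01-04. Note that activity_date is a varchar in the question's schema, so max() compares strings; that only gives the right answer because the values are in YYYY-MM-DD format.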
So the query below does this, based only on your current sample data:
set @_sum:=0;        -- Note: whether these two lines are needed depends on the
set @_currGame:='';  -- client you are using. Some clients keep user variables for
                     -- the whole session, some don't; the site below, for instance, does.
select a.game_id,
       a.age,
       case when @_currGame = game_id
            then @_sum:=coalesce(samount,0) + @_sum
            else @_sum:=coalesce(samount,0) end as Cum_rev,
       a.Total_unique_payers_per_game,
       @_currGame := game_id varComputeCurrGame
from
(
  select players.game_id,
         rev.samount,
         datediff(dv.date_day, '2020-01-01') age,
         players.noPlayers Total_unique_payers_per_game
  from (select @_sum:=0) am,
       dates_view dv
       cross join (select max(activity_date) maxDate from revenue) md
               on dv.date_day <= md.maxDate
       cross join (select game_id, count(distinct user_id) noPlayers
                     from revenue group by game_id) players
       left join (select game_id, activity_date, sum(amount) samount
                    from revenue group by game_id, activity_date) rev
              on players.game_id = rev.game_id
             and dv.date_day = rev.activity_date
) a,
(select @_sum:=0) s,
(select @_currGame:='') x   -- assignment (:=), not a comparison (=)
order by a.game_id desc, a.age;
This results in:
game_id age Cum_rev Total_unique_payers_per_game varComputeCurrGame
Racing 0 5 2 Racing
Racing 1 5 2 Racing
Racing 2 5 2 Racing
Racing 3 7 2 Racing
DH 0 0 3 DH
DH 1 15 3 DH
DH 2 117 3 DH
DH 3 117 3 DH
See it working here (you need to run it): https://www.db-fiddle.com/f/qifZ6hmpvcSZYwhLDv613d/2
Here is the version for MySQL 8.x, which supports window functions:
select distinct agetable.game_id,
       agetable.age,
       sum(coalesce(r1.amount,0))
           over (partition by agetable.game_id
                 order by agetable.game_id, agetable.age) as sm,
       agetable.ttplayers
from
(
  select r.game_id, dv.date_day, datediff(dv.date_day, '2020-01-01') age, p.ttplayers
  from dates_view dv
       cross join (select distinct game_id, activity_date from revenue) r
               on dv.date_day <= (select max(activity_date) from revenue)
       left join (select game_id, count(distinct user_id) ttplayers from revenue group by game_id) p
              on r.game_id = p.game_id
  -- "group by ... desc" was removed in MySQL 8.0.13; the outer order by handles sorting
  group by r.game_id, dv.date_day, age, p.ttplayers
) agetable
left join revenue r1
       on agetable.date_day = r1.activity_date
      and r1.game_id = agetable.game_id
order by agetable.game_id desc, agetable.age
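A possible simplification of the MySQL 8.x query, offered only as a sketch: if the revenue is summed per game and day before the window function runs, there is one row per game and day, so select distinct and the inner group by are no longer needed. It assumes the dates_view table and the sample revenue table from above; the aliases g, d and cum_rev are my own.

```sql
-- Sketch (MySQL 8+): one row per game and day, running sum per game.
select g.game_id,
       datediff(dv.date_day, '2020-01-01') as age,
       sum(coalesce(d.samount, 0))
           over (partition by g.game_id order by dv.date_day) as cum_rev,
       g.ttplayers
from dates_view dv
     -- every game on every day up to the last day that has revenue
     join (select game_id, count(distinct user_id) as ttplayers
             from revenue group by game_id) g
       on dv.date_day <= (select max(activity_date) from revenue)
     -- daily revenue per game; days without revenue stay NULL and count as 0
     left join (select game_id, activity_date, sum(amount) as samount
                  from revenue group by game_id, activity_date) d
            on d.game_id = g.game_id
           and d.activity_date = dv.date_day
order by g.game_id desc, age;
```

On the sample data this should give the same eight rows as the expected output in the question (Racing 5, 5, 5, 7 and DH 0, 15, 117, 117).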