I have a dataset that looks like this:
Enc_ID | date | P1 | P3 |
--------------------------------
1 | 11/1/17 | 1 | NULL |
2 | 11/1/17 | NULL | 1 |
3 | 11/1/17 | 1 | NULL |
4 | 11/2/17 | 1 | NULL |
5 | 11/2/17 | NULL | 1 |
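That is, each row is an encounter, and any given day can (and always does) have multiple encounters.
I need to compute a running total of P1 and P3 for each day. So: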
date | sum_p1 | sum_p3 |
---------------------------
11/1/17 | 2 | 1 |
11/2/17 | 3 | 2 |
Then I need to apply this calculation to the running totals for each date:
(sum_p1 - sum_p3) / sum_p1
So ultimately I need a table that shows:
date | dropout rate
----------------------
11/1/17 | 50%
11/2/17 | 33%
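I'm trying to do this in Superset, so I can't use any JOINs. I tried some kind of nested GROUP BY, but MySQL (5.7.20) doesn't like it.
This is my current query, but it only returns the SUM of p1 and p3 for each individual date, not the running total as of each date.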
SELECT encounter_date AS __timestamp,
(SUM(p1) - SUM(p3)) / SUM(p1) AS pd
FROM encounter
WHERE encounter_date >= '2016-11-06 00:00:00.000000'
AND encounter_date <= '2017-11-06 17:00:29.000000'
GROUP BY encounter_date
ORDER BY encounter_date ASC
LIMIT 50000
OFFSET 0
Answer 0: (score: 1)
In MySQL, use variables or a subquery. In this case, variables are easier:
select encounter_date, p1, p3,
(@p1 := @p1 + p1) as running_p1,
(@p3 := @p3 + p3) as running_p3
from (select encounter_date, count(p1) as p1, count(p3) as p3
from encounter e
where encounter_date >= '2016-11-06 00:00:00.000000' and encounter_date <= '2017-11-06 17:00:29.000000'
group by encounter_date
order by encounter_date
) e cross join
(select @p1 := 0, @p3 := 0) params;
For the final calculation, use this query as a subquery.
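A minimal sketch of that wrapping, assuming the variable-based query behaves as intended on your server. The alias `running` and the column name `dropout_rate` are placeholders, not names from the question. Note that MySQL does not guarantee the evaluation order of user-variable assignments within a statement, so the result should be checked against known data.

```sql
-- Outer query: apply (sum_p1 - sum_p3) / sum_p1 to the running totals.
SELECT encounter_date,
       -- NULLIF avoids dividing by zero when no P1 has occurred yet.
       (running_p1 - running_p3) / NULLIF(running_p1, 0) AS dropout_rate
FROM (
    -- Middle query: accumulate the per-day counts with user variables.
    SELECT encounter_date,
           (@p1 := @p1 + p1) AS running_p1,
           (@p3 := @p3 + p3) AS running_p3
    FROM (
        -- Inner query: one row per day with that day's counts.
        SELECT encounter_date, COUNT(p1) AS p1, COUNT(p3) AS p3
        FROM encounter
        WHERE encounter_date >= '2016-11-06 00:00:00.000000'
          AND encounter_date <= '2017-11-06 17:00:29.000000'
        GROUP BY encounter_date
        ORDER BY encounter_date
    ) e
    CROSS JOIN (SELECT @p1 := 0, @p3 := 0) params
) running
ORDER BY encounter_date;
```

With the sample rows from the question, this should give 0.5 for 11/1/17 and roughly 0.33 for 11/2/17; multiply by 100 if a percentage is wanted.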
Answer 1: (score: 0)
I would do this as 2 chained queries, one nested inside the other:
SELECT encounter_date as `date`, (1-`sum_p3`/`sum_p1`) as `dropout rate` FROM (
SELECT encounter_date, SUM(p1) as `sum_p1`, SUM(p3) as `sum_p3`
FROM encounter
WHERE encounter_date >= '2016-11-06 00:00:00.000000'
AND encounter_date <= '2017-11-06 17:00:29.000000'
GROUP BY encounter_date
ORDER BY encounter_date ASC
) as `grouped`
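As written, the query above aggregates each date on its own, so it yields a per-day rate rather than a rate over running totals. One possible variant, sketched here under the assumption that correlated subqueries are acceptable in your Superset setup, keeps the same nested shape but sums everything up to and including each date. The aliases `d`, `e`, and `running` are placeholders.

```sql
SELECT `date`,
       -- COALESCE treats "no P3 yet" as 0 instead of NULL.
       (1 - COALESCE(`sum_p3`, 0) / `sum_p1`) AS `dropout rate`
FROM (
    SELECT d.encounter_date AS `date`,
           -- Running total of p1 from the start of the window to this date.
           (SELECT SUM(e.p1)
              FROM encounter e
             WHERE e.encounter_date >= '2016-11-06 00:00:00.000000'
               AND e.encounter_date <= d.encounter_date) AS `sum_p1`,
           -- Running total of p3 over the same range.
           (SELECT SUM(e.p3)
              FROM encounter e
             WHERE e.encounter_date >= '2016-11-06 00:00:00.000000'
               AND e.encounter_date <= d.encounter_date) AS `sum_p3`
    FROM (
        -- One row per distinct date inside the reporting window.
        SELECT DISTINCT encounter_date
        FROM encounter
        WHERE encounter_date >= '2016-11-06 00:00:00.000000'
          AND encounter_date <= '2017-11-06 17:00:29.000000'
    ) d
) AS `running`
ORDER BY `date` ASC;
```

This avoids both explicit JOINs and user variables, at the cost of rescanning `encounter` for every date, so it may be slow on large tables without an index on `encounter_date`.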