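I came across a task where I have to return the sum of premiums and the count of issued policies for every day of a month, and compare them with the previous year.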
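The PolicyOrder table has the following fields:
- PolicyOrderId - primary key
- CreatedAt (DATETIME)
- CalculatedPremium - the cost of the policy, or "premium"
- PolicyOrderStatusId - the policy status (not directly relevant to the question, but listed anyway)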
To solve this, I came up with a query that inner joins the table with itself and sums/counts by grouping on the DAY of the creation date.
SELECT
DATE(po1.CreatedAt) AS dayDate_2017,
SUM(po1.CalculatedPremium) AS premiumSum_2017,
COUNT(po1.PolicyOrderId) AS policyCount_2017,
po2.*
FROM
PolicyOrder po1
INNER JOIN (
SELECT
DATE(CreatedAt) AS dayDate_2018,
SUM(CalculatedPremium) AS premiumSum_2018,
COUNT(PolicyOrderId) AS policyCount_2018
FROM
PolicyOrder po2
WHERE
YEAR(CreatedAt) = 2018 AND
MONTH(CreatedAt) = 10 AND
PolicyOrderStatusId = 6
GROUP BY
DAY(CreatedAt)
) po2 ON (
DAY(po2.dayDate_2018) = DAY(po1.CreatedAt)
)
WHERE
YEAR(po1.CreatedAt) = 2017 AND
MONTH(po1.CreatedAt) = 10 AND
PolicyOrderStatusId = 6
GROUP BY
DAY(po1.CreatedAt)
The query above returns the following results:
dayDate_2017 | premiumSum_2017 | policyCount_2017 | dayDate_2018 | premiumSum_2018 | policyCount_2018
2017-10-01 | 4699.36 | 98 | 2018-10-01 | 8524.21 | 144
2017-10-02 | 9114.55 | 168 | 2018-10-02 | 7942.25 | 140
2017-10-03 | 9512.43 | 178 | 2018-10-03 | 9399.61 | 161
2017-10-04 | 9291.77 | 155 | 2018-10-04 | 6922.83 | 137
2017-10-05 | 8063.27 | 155 | 2018-10-05 | 9278.58 | 178
2017-10-06 | 9743.40 | 184 | 2018-10-06 | 6139.38 | 136
...
2017-10-31 | ...
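The problem is that I now have to add two more columns. They must count the policies and sum the premiums from the beginning of the year UP UNTIL each returned row.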
Desired results:
dayDate_2017 | premiumSum_2017 | policyCount_2017 | sumFromYearBegining | countFromYearBegining
2017-10-01 | 4699.36 | 98 | 150000.34 | 5332
2017-10-02 | 9114.55 | 168 | 156230.55 | 5443
2017-10-03 | 9512.43 | 178 | 160232.44 | 5663
...
2017-10-31 | ...
WHERE:
sumFromYearBegining (150000.34) - SUM of premiumSum from 2017-01-01 until 2017-10-01 (excluding)
countFromYearBegining (5332) - COUNT of policies from 2017-01-01 until 2017-10-01 (excluding)
sumFromYearBegining (156230.55) - SUM of premiumSum from 2017-01-01 until 2017-10-02 (excluding)
countFromYearBegining (5443) - COUNT of policies from 2017-01-01 until 2017-10-02 (excluding)
sumFromYearBegining (160232.44) - SUM of premiumSum from 2017-01-01 until 2017-10-03 (excluding)
countFromYearBegining (5663) - COUNT of policies from 2017-01-01 until 2017-10-03 (excluding)
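As a cross-check of these definitions, here is a minimal Python sketch. It assumes the per-day totals for the month are already available in day order, together with the year-to-date totals from before the month starts. The function name add_from_year_beginning and its parameters are hypothetical. Because each new column excludes the current day, every row receives the running total of all earlier rows only.

```python
from itertools import accumulate


def add_from_year_beginning(daily_rows, prior_sum=0.0, prior_count=0):
    """Append 'from year beginning' totals that EXCLUDE the current day.

    daily_rows: list of (day, premium_sum, policy_count), sorted by day.
    prior_sum / prior_count: hypothetical year-to-date totals before the first row.
    """
    premiums = [premium for _, premium, _ in daily_rows]
    counts = [count for _, _, count in daily_rows]
    # accumulate(..., initial=x) yields x, x+a0, x+a0+a1, ...; dropping the last
    # value leaves, for each row, the total of all earlier days only.
    sums_before = list(accumulate(premiums, initial=prior_sum))[:-1]
    counts_before = list(accumulate(counts, initial=prior_count))[:-1]
    return [
        (day, premium, count, sum_before, count_before)
        for (day, premium, count), sum_before, count_before
        in zip(daily_rows, sums_before, counts_before)
    ]


rows = [("2017-10-01", 10.0, 1), ("2017-10-02", 20.0, 2), ("2017-10-03", 30.0, 3)]
for row in add_from_year_beginning(rows, prior_sum=100.0, prior_count=10):
    print(row)
```

The starting values are what make this hard in SQL. They are the totals from 2017-01-01 up to the first day of the month, so the query has to look at data outside the month it returns.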
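I tried INNER JOINing the same table with COUNT and SUM. That failed because I could not specify the range over which to count and sum. I also tried a LEFT JOIN followed by counting, which failed because the count did not run up to each result row but up to the last result, etc.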
DB Fiddle: https://www.db-fiddle.com/f/ckM8HyTD6NjLbK41Mq1gct/5
Any help from you SQL ninjas is much appreciated.
Answer 0 (score: 2)
Since window functions are not available, we can use User-defined variables to calculate the running sum/count.
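We first need to determine the sum and count for every day of 2017, even though you only need the rows for a specific month. To calculate the rolling sum for the days of March, for example, we also need the sum/count values for January and February. One possible optimization is to limit the calculation to the range from the first month up to the month you actually need.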
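Note that ORDER BY daydate_2017 is required for the rolling sum to be calculated correctly. By default, data is unordered; without a defined order, we cannot guarantee that the sum is correct.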
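The following small Python emulation of the @totsum/@totcount pattern used in the query shows concretely why the ordering matters. It assumes each row is a hypothetical (day, premium_sum, policy_count) tuple. Like the user variables, the accumulator simply adds each row to whatever has been seen before, so the value attached to a given day depends on the order in which rows are processed. The running value is updated before it is emitted, so each row's total includes that day itself, unlike the "excluding" definition in the question. MySQL's own documentation also cautions that the evaluation order of expressions that assign user variables is not guaranteed.

```python
def running_totals(rows):
    """Emulate '@totsum := @totsum + premium' over rows in the order given."""
    totsum, totcount = 0.0, 0  # like CROSS JOIN (SELECT @totsum := 0, @totcount := 0)
    out = {}
    for day, premium, count in rows:
        totsum += premium      # updated first, so the current day is included
        totcount += count
        out[day] = (totsum, totcount)
    return out


rows = [("2017-02-01", 1.0, 1), ("2017-02-02", 2.0, 1), ("2017-02-03", 4.0, 1)]
ordered = running_totals(sorted(rows))             # like ORDER BY day
unordered = running_totals(list(reversed(rows)))   # some other arrival order
print(ordered["2017-02-01"], unordered["2017-02-01"])  # prints (1.0, 1) (7.0, 3)
```

Only the grand total is independent of the order. Every intermediate per-day value is wrong unless the rows arrive sorted by day.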
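We also need two levels of subqueries. The first level calculates the rolling sum values. The second level only restricts the results to February. Since WHERE is evaluated before SELECT, we cannot restrict the results to February in the first level itself.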
If you need similar sums for 2018 as well, the same query logic can be implemented in another set of subqueries.
SELECT dt2_2017.*, dt_2018.*
FROM
(
SELECT dt_2017.*,
@totsum := @totsum + dt_2017.premiumsum_2017 AS sumFromYearBegining_2017,
@totcount := @totcount + dt_2017.policycount_2017 AS countFromYearBeginning_2017
FROM (SELECT Date(po1.createdat) AS dayDate_2017,
Sum(po1.calculatedpremium) AS premiumSum_2017,
Count(po1.policyorderid) AS policyCount_2017
FROM PolicyOrder AS po1
WHERE po1.policyorderstatusid = 6 AND
YEAR(po1.createdat) = 2017 AND
MONTH(po1.createdat) <= 2 -- calculate up to February for 2017
GROUP BY daydate_2017
ORDER BY daydate_2017) AS dt_2017
CROSS JOIN (SELECT @totsum := 0, @totcount := 0) AS user_init_vars
) AS dt2_2017
INNER JOIN (
SELECT
DATE(po2.CreatedAt) AS dayDate_2018,
SUM(po2.CalculatedPremium) AS premiumSum_2018,
COUNT(po2.PolicyOrderId) AS policyCount_2018
FROM
PolicyOrder po2
WHERE
YEAR(po2.CreatedAt) = 2018 AND
MONTH(po2.CreatedAt) = 2 AND
po2.PolicyOrderStatusId = 6
GROUP BY
dayDate_2018
) dt_2018 ON DAY(dt_2018.dayDate_2018) = DAY(dt2_2017.dayDate_2017)
WHERE YEAR(dt2_2017.daydate_2017) = 2017 AND
MONTH(dt2_2017.daydate_2017) = 2;
| dayDate_2017 | premiumSum_2017 | policyCount_2017 | sumFromYearBegining_2017 | countFromYearBeginning_2017 | dayDate_2018 | premiumSum_2018 | policyCount_2018 |
| ------------ | --------------- | ---------------- | ------------------------ | --------------------------- | ------------ | --------------- | ---------------- |
| 2017-02-01 | 4131.16 | 131 | 118346.77 | 3627 | 2018-02-01 | 8323.91 | 149 |
| 2017-02-02 | 2712.74 | 85 | 121059.51000000001 | 3712 | 2018-02-02 | 9469.33 | 153 |
| 2017-02-03 | 3888.59 | 111 | 124948.1 | 3823 | 2018-02-03 | 6409.21 | 97 |
| 2017-02-04 | 2447.99 | 74 | 127396.09000000001 | 3897 | 2018-02-04 | 5693.69 | 120 |
| 2017-02-05 | 1437.5 | 45 | 128833.59000000001 | 3942 | 2018-02-05 | 8574.97 | 129 |
| 2017-02-06 | 4254.48 | 127 | 133088.07 | 4069 | 2018-02-06 | 8277.51 | 133 |
| 2017-02-07 | 4746.49 | 136 | 137834.56 | 4205 | 2018-02-07 | 9853.75 | 173 |
| 2017-02-08 | 3898.05 | 125 | 141732.61 | 4330 | 2018-02-08 | 9116.33 | 144 |
| 2017-02-09 | 8306.86 | 286 | 150039.46999999997 | 4616 | 2018-02-09 | 8818.32 | 166 |
| 2017-02-10 | 6740.99 | 204 | 156780.45999999996 | 4820 | 2018-02-10 | 7880.17 | 134 |
| 2017-02-11 | 4290.38 | 133 | 161070.83999999997 | 4953 | 2018-02-11 | 8394.15 | 180 |
| 2017-02-12 | 3687.58 | 122 | 164758.41999999995 | 5075 | 2018-02-12 | 10378.29 | 171 |
| 2017-02-13 | 4939.31 | 159 | 169697.72999999995 | 5234 | 2018-02-13 | 9383.15 | 160 |
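For comparison, on MySQL 8 (or any engine with window functions) the same two-level shape can be written with SUM() OVER (ORDER BY ...) instead of user variables. This is a swapped-in technique, not the method used in this answer. The sketch uses Python's built-in sqlite3 module as a stand-in for MySQL (window functions need SQLite 3.25 or newer), a cut-down PolicyOrder table with made-up rows, and SQLite's date() plus text comparisons in place of MySQL's DATE()/YEAR()/MONTH(). As in the answer, the running totals are first computed over January and February. Only the outer level then filters down to February, because a WHERE at the inner level would discard the January rows before they could be summed.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; CreatedAt is stored as
# 'YYYY-MM-DD HH:MM:SS' text, which sorts and compares like a DATETIME.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PolicyOrder (
    PolicyOrderId       INTEGER PRIMARY KEY,
    CreatedAt           TEXT,
    CalculatedPremium   REAL,
    PolicyOrderStatusId INTEGER
);
-- hypothetical sample rows
INSERT INTO PolicyOrder (CreatedAt, CalculatedPremium, PolicyOrderStatusId) VALUES
    ('2017-01-15 09:00:00', 100.0, 6),
    ('2017-01-20 12:30:00',  50.0, 6),
    ('2017-02-01 08:00:00',  10.0, 6),
    ('2017-02-01 17:45:00',  20.0, 6),
    ('2017-02-02 11:00:00',   5.0, 6),
    ('2017-02-02 13:00:00', 999.0, 3);  -- different status, filtered out
""")

rows = conn.execute("""
WITH daily AS (              -- level 1: one row per day, January to February
    SELECT date(CreatedAt)        AS dayDate,
           SUM(CalculatedPremium) AS premiumSum,
           COUNT(PolicyOrderId)   AS policyCount
    FROM PolicyOrder
    WHERE PolicyOrderStatusId = 6
      AND CreatedAt >= '2017-01-01' AND CreatedAt < '2017-03-01'
    GROUP BY date(CreatedAt)
),
running AS (                 -- running totals, current day included
    SELECT dayDate, premiumSum, policyCount,
           SUM(premiumSum)  OVER (ORDER BY dayDate) AS sumFromYearBegining,
           SUM(policyCount) OVER (ORDER BY dayDate) AS countFromYearBegining
    FROM daily
)
SELECT * FROM running        -- level 2: only now restrict to February
WHERE dayDate >= '2017-02-01'
ORDER BY dayDate
""").fetchall()
print(rows)
# [('2017-02-01', 30.0, 2, 180.0, 4), ('2017-02-02', 5.0, 1, 185.0, 5)]
```

Like the user-variable version, these totals include the current day. A frame of ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING would give the question's "excluding" variant, with NULL on the first row.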
Answer 1 (score: 1)
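If you want an approach that avoids @variables in the select list and also avoids analytic (window) functions, which only MySQL 8 supports, you can use a semi-Cartesian product. Note that the WITH clause (CTEs) used below also requires MySQL 8; on older versions the two CTEs can be inlined as derived tables: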
WITH prevYr AS(
SELECT
YEAR(CreatedAt) AS year_prev,
MONTH(CreatedAt) AS month_prev,
DAY(CreatedAt) AS day_prev,
SUM(CalculatedPremium) AS premiumSum_prev,
COUNT(PolicyOrderId) AS policyCount_prev
FROM
PolicyOrder
WHERE
CreatedAt BETWEEN '2017-02-01' AND '2017-02-28 23:59:59' AND
PolicyOrderStatusId = 6
GROUP BY
YEAR(CreatedAt), MONTH(CreatedAt), DAY(CreatedAt)
),
currYr AS (
SELECT
YEAR(CreatedAt) AS year_curr,
MONTH(CreatedAt) AS month_curr,
DAY(CreatedAt) AS day_curr,
SUM(CalculatedPremium) AS premiumSum_curr,
COUNT(PolicyOrderId) AS policyCount_curr
FROM
PolicyOrder
WHERE
CreatedAt BETWEEN '2018-02-01' AND '2018-02-28 23:59:59' AND
PolicyOrderStatusId = 6
GROUP BY
YEAR(CreatedAt), MONTH(CreatedAt), DAY(CreatedAt)
)
SELECT
*
FROM
prevYr
INNER JOIN
currYr
ON
currYr.day_curr = prevYr.day_prev
INNER JOIN
(
SELECT
main.day_prev AS dayRolling_prev,
COALESCE(SUM(pre.premiumSum_prev), 0) AS premiumSumRolling_prev,
COALESCE(SUM(pre.policyCount_prev), 0) AS policyCountRolling_prev
FROM
prevYr main LEFT OUTER JOIN prevYr pre ON pre.day_prev < main.day_prev
GROUP BY
main.day_prev
) rollingPrev
ON
currYr.day_curr = rollingPrev.dayRolling_prev
ORDER BY 1,2,3
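We aggregate the 2017 and 2018 data into two CTEs, because that makes the later steps much tidier, especially this kind of rolling count. You can probably follow the CTE logic easily, as it is lifted more or less directly from your query. I only removed the DATE column in favour of a year/month/day triple, because it makes other things clearer (the joins), and the date can be reassembled if needed. I also swapped the WHERE clause to use date BETWEEN x AND y, because that can make use of an index on the column, whereas YEAR(date) = x AND MONTH(date) = y probably cannot.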
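Comparing query plans is a quick way to see the index argument. This sketch uses SQLite through Python's sqlite3 module as a stand-in for MySQL (on MySQL you would run EXPLAIN instead). strftime() plays the role of YEAR()/MONTH(), and the index name idx_policyorder_createdat is hypothetical. The exact plan wording varies between SQLite versions, so the check only looks at whether an index is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE PolicyOrder (
    PolicyOrderId       INTEGER PRIMARY KEY,
    CreatedAt           TEXT,
    CalculatedPremium   REAL,
    PolicyOrderStatusId INTEGER)""")
# hypothetical index on the date column
conn.execute("CREATE INDEX idx_policyorder_createdat ON PolicyOrder (CreatedAt)")


def plan(sql):
    """Return the detail text of each EXPLAIN QUERY PLAN row (always the last column)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Range on the raw column: the planner can search the index.
range_plan = plan("""SELECT SUM(CalculatedPremium) FROM PolicyOrder
    WHERE CreatedAt BETWEEN '2017-02-01' AND '2017-02-28 23:59:59'""")

# Function wrapped around the column: every row has to be read and converted.
func_plan = plan("""SELECT SUM(CalculatedPremium) FROM PolicyOrder
    WHERE strftime('%Y', CreatedAt) = '2017' AND strftime('%m', CreatedAt) = '02'""")

print(range_plan)
print(func_plan)
```

The upper bound is written as the last second of the month because CreatedAt holds times. A bare '2017-02-28' would compare as midnight and drop almost all of that day.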
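The rolling count works through something I call a semi-Cartesian product. It really is a Cartesian product: any database join that causes rows of one table to be multiplied and represented repeatedly in the output is a Cartesian product. In this case, however, it is not a full product, where every row is crossed with every other row. The join uses less-than, so each row is crossed with only a subset of the rows. As the date increases, more rows match the predicate, because the 30th has 29 rows that are less than it.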
This results in the following data pattern:
maindate predate maincount precount
2017-02-01 NULL 10 NULL
2017-02-02 2017-02-01 20 10
2017-02-03 2017-02-01 30 10
2017-02-03 2017-02-02 30 20
2017-02-04 2017-02-01 40 10
2017-02-04 2017-02-02 40 20
2017-02-04 2017-02-03 40 30
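You can see that the Nth maindate is repeated N-1 times, because N-1 earlier dates satisfy the join condition predate < maindate. The first date still appears once, with NULLs, because of the LEFT OUTER JOIN.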
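The pattern in the table can be reproduced with a small Python sketch. It assumes four hypothetical days with counts 10, 20, 30 and 40, the same numbers as the table. The function pairs each main row with every earlier row, mirroring prevYr main LEFT OUTER JOIN prevYr pre ON pre.day_prev < main.day_prev. When nothing earlier exists, it keeps a single None-padded row, as the outer join does.

```python
def semi_cartesian_pairs(days):
    """Rows of: main LEFT OUTER JOIN pre ON pre.date < main.date.

    days: list of (date, count) tuples. Returns
    (maindate, predate, maincount, precount) tuples; None stands in for NULL.
    """
    pairs = []
    for main_date, main_count in days:
        earlier = [(d, c) for d, c in days if d < main_date]
        if not earlier:
            # the outer join keeps an unmatched main row, padded with NULLs
            pairs.append((main_date, None, main_count, None))
        for pre_date, pre_count in earlier:
            pairs.append((main_date, pre_date, main_count, pre_count))
    return pairs


days = [("2017-02-01", 10), ("2017-02-02", 20), ("2017-02-03", 30), ("2017-02-04", 40)]
for row in semi_cartesian_pairs(days):
    print(*row)
```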
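If we group by maindate and sum the precount associated with each predate, we get the rolling sum of all precounts before that maindate. On the 4th of the month, for example, it is SUM(precount for the 1st to the 3rd), i.e. 10 + 20 + 30 = 60. On the 5th we add up the counts for the 1st to the 4th, on the 6th the 1st to the 5th, and so on. Because prevYr only covers February, these rolling totals start from 1 February rather than from the beginning of the year. For year-to-date figures, the rolling subquery would need a CTE that starts on 2017-01-01, and the join would have to be on full dates rather than on the day of the month.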
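Running the grouping step for real confirms the arithmetic. This minimal sketch uses Python's sqlite3 module as a stand-in for MySQL. prevYr is created here as a plain table of hypothetical day/count values rather than as a CTE. COALESCE makes the first day report 0 instead of the NULL that SUM returns when the LEFT OUTER JOIN finds no earlier rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prevYr (day_prev INTEGER PRIMARY KEY, policyCount_prev INTEGER);
INSERT INTO prevYr VALUES (1, 10), (2, 20), (3, 30), (4, 40);  -- hypothetical data
""")

rows = conn.execute("""
SELECT main.day_prev                              AS dayRolling_prev,
       COALESCE(SUM(pre.policyCount_prev), 0)     AS policyCountRolling_prev
FROM prevYr AS main
LEFT OUTER JOIN prevYr AS pre ON pre.day_prev < main.day_prev  -- semi-Cartesian join
GROUP BY main.day_prev                                         -- collapse each main day
ORDER BY main.day_prev
""").fetchall()
print(rows)  # [(1, 0), (2, 10), (3, 30), (4, 60)]
```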