SQL query to join the same table with a SUM up until each returned row

Time: 2018-11-13 06:54:03

Tags: mysql sql

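I have a task where I must return the sum and the count of issued policies for each day of a month, and compare them with the previous year.

The table PolicyOrder has the following fields:

PolicyOrderId - primary key
CreatedAt (DATETIME)
CalculatedPremium - the cost of the policy, or "premium"
PolicyOrderStatusId - irrelevant to the question, but still: the status of the policy

To solve this I came up with a query that inner joins the table to itself and sums/counts by grouping on the DAY of the creation date.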

SELECT 
      DATE(po1.CreatedAt) AS dayDate_2017, 
      SUM(po1.CalculatedPremium) AS premiumSum_2017,
      COUNT(po1.PolicyOrderId) AS policyCount_2017,
      po2.*
FROM 
      PolicyOrder po1
INNER JOIN (
           SELECT 
                DATE(CreatedAt) AS dayDate_2018, 
                SUM(CalculatedPremium) AS premiumSum_2018, 
                COUNT(PolicyOrderId) AS policyCount_2018
           FROM 
                PolicyOrder po2
           WHERE
                YEAR(CreatedAt) = 2018 AND 
                MONTH(CreatedAt) = 10 AND
                PolicyOrderStatusId = 6 
           GROUP BY
                DAY(CreatedAt)
       ) po2 ON ( 
           DAY(po2.dayDate_2018) = DAY(po1.CreatedAt) 
       )
WHERE   
       YEAR(po1.CreatedAt) = 2017 AND 
       MONTH(po1.CreatedAt) = 10 AND 
       PolicyOrderStatusId = 6 
GROUP BY 
       DAY(po1.CreatedAt)

The query above returns the following results:

dayDate_2017 | premiumSum_2017 | policyCount_2017 | dayDate_2018 | premiumSum_2018 | policyCount_2018
2017-10-01   | 4699.36         | 98               | 2018-10-01   | 8524.21         | 144
2017-10-02   | 9114.55         | 168              | 2018-10-02   | 7942.25         | 140
2017-10-03   | 9512.43         | 178              | 2018-10-03   | 9399.61         | 161
2017-10-04   | 9291.77         | 155              | 2018-10-04   | 6922.83         | 137
2017-10-05   | 8063.27         | 155              | 2018-10-05   | 9278.58         | 178
2017-10-06   | 9743.40         | 184              | 2018-10-06   | 6139.38         | 136
...
2017-10-31   | ...

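The problem is that I now have to add two more columns, in which the policies must be counted and the premiums summed from the beginning of the year UP UNTIL each returned row.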

Desired results:
dayDate_2017 | premiumSum_2017 | policyCount_2017 | sumFromYearBegining | countFromYearBegining 
2017-10-01   | 4699.36         | 98               | 150000.34           | 5332   
2017-10-02   | 9114.55         | 168              | 156230.55           | 5443
2017-10-03   | 9512.43         | 178              | 160232.44           | 5663
    ...
2017-10-31   | ...


WHERE:

sumFromYearBegining (150000.34) - SUM of premiumSum from 2017-01-01 until 2017-10-01 (excluding)
countFromYearBegining (5332) - COUNT of policies from 2017-01-01 until 2017-10-01 (excluding)

sumFromYearBegining (156230.55) - SUM of premiumSum from 2017-01-01 until 2017-10-02 (excluding)
countFromYearBegining (5443) - COUNT of policies from 2017-01-01 until 2017-10-02 (excluding)

sumFromYearBegining (160232.44) - SUM of premiumSum from 2017-01-01 until 2017-10-03 (excluding)
countFromYearBegining (5663) - COUNT of policies from 2017-01-01 until 2017-10-03 (excluding)

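I tried inner joining the same table with COUNT and SUM applied, which failed because I cannot specify the range up to which I need to count and sum. I also tried a LEFT join followed by counting, which failed because the results were counted not up to each result row but up to the last result, and so on.

DB Fiddle: https://www.db-fiddle.com/f/ckM8HyTD6NjLbK41Mq1gct/5

Any help from you SQL ninjas is highly appreciated.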

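2 answers:

Answer 0 (score: 2)

In the absence of window functions, we can use User-defined variables to compute the rolling sum/count.

We first need to determine the sum and count for every day of 2017 (even if you only need rows for a particular month), because to compute the rolling sum for the days of March we also need the sum/count values of January and February. One possible optimization is to restrict the calculation to the months from the first month up to the month that is required.

Note that ORDER BY daydate_2017 is necessary in order to compute the rolling sum correctly. By default, data is unordered. Without defining an order, we cannot guarantee that the sum is correct.

We also need two levels of sub-select queries. The first level computes the rolling sum values. The second level is only used to restrict the results to February. Since WHERE is executed before SELECT, we cannot restrict the results to February in the first level itself.

If you need a similar rolling sum for 2018 as well, the same query logic can be implemented in another set of sub-select queries.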

SELECT dt2_2017.*, dt_2018.*
FROM 
(
SELECT dt_2017.*,
       @totsum := @totsum + dt_2017.premiumsum_2017 AS sumFromYearBegining_2017,
       @totcount := @totcount + dt_2017.policycount_2017 AS countFromYearBeginning_2017
FROM   (SELECT Date(po1.createdat)        AS dayDate_2017,
               Sum(po1.calculatedpremium) AS premiumSum_2017,
               Count(po1.policyorderid)   AS policyCount_2017
        FROM   PolicyOrder AS po1
        WHERE  po1.policyorderstatusid = 6 AND 
               YEAR(po1.createdat) = 2017 AND 
               MONTH(po1.createdat) <= 2 -- calculate upto February for 2017
        GROUP  BY daydate_2017
        ORDER  BY daydate_2017) AS dt_2017
CROSS JOIN (SELECT @totsum := 0, @totcount := 0) AS user_init_vars 
) AS dt2_2017 
INNER JOIN (
             SELECT 
               DATE(po2.CreatedAt) AS dayDate_2018, 
               SUM(po2.CalculatedPremium) AS premiumSum_2018, 
               COUNT(po2.PolicyOrderId) AS policyCount_2018
             FROM 
               PolicyOrder po2
             WHERE
                YEAR(po2.CreatedAt) = 2018 AND 
                MONTH(po2.CreatedAt) = 2 AND
                po2.PolicyOrderStatusId = 6 
             GROUP BY
                dayDate_2018
           ) dt_2018 ON DAY(dt_2018.dayDate_2018) = DAY(dt2_2017.dayDate_2017)   
WHERE YEAR(dt2_2017.daydate_2017) = 2017 AND 
      MONTH(dt2_2017.daydate_2017) = 2;

Result (View on DB Fiddle):

| dayDate_2017 | premiumSum_2017 | policyCount_2017 | sumFromYearBegining_2017 | countFromYearBeginning_2017 | dayDate_2018 | premiumSum_2018 | policyCount_2018 |
| ------------ | --------------- | ---------------- | ------------------------ | --------------------------- | ------------ | --------------- | ---------------- |
| 2017-02-01   | 4131.16         | 131              | 118346.77                | 3627                        | 2018-02-01   | 8323.91         | 149              |
| 2017-02-02   | 2712.74         | 85               | 121059.51000000001       | 3712                        | 2018-02-02   | 9469.33         | 153              |
| 2017-02-03   | 3888.59         | 111              | 124948.1                 | 3823                        | 2018-02-03   | 6409.21         | 97               |
| 2017-02-04   | 2447.99         | 74               | 127396.09000000001       | 3897                        | 2018-02-04   | 5693.69         | 120              |
| 2017-02-05   | 1437.5          | 45               | 128833.59000000001       | 3942                        | 2018-02-05   | 8574.97         | 129              |
| 2017-02-06   | 4254.48         | 127              | 133088.07                | 4069                        | 2018-02-06   | 8277.51         | 133              |
| 2017-02-07   | 4746.49         | 136              | 137834.56                | 4205                        | 2018-02-07   | 9853.75         | 173              |
| 2017-02-08   | 3898.05         | 125              | 141732.61                | 4330                        | 2018-02-08   | 9116.33         | 144              |
| 2017-02-09   | 8306.86         | 286              | 150039.46999999997       | 4616                        | 2018-02-09   | 8818.32         | 166              |
| 2017-02-10   | 6740.99         | 204              | 156780.45999999996       | 4820                        | 2018-02-10   | 7880.17         | 134              |
| 2017-02-11   | 4290.38         | 133              | 161070.83999999997       | 4953                        | 2018-02-11   | 8394.15         | 180              |
| 2017-02-12   | 3687.58         | 122              | 164758.41999999995       | 5075                        | 2018-02-12   | 10378.29        | 171              |
| 2017-02-13   | 4939.31         | 159              | 169697.72999999995       | 5234                        | 2018-02-13   | 9383.15         | 160              |
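
For comparison, here is a minimal sketch of the same running total on MySQL 8.0 or later, where window functions are available. It is not the approach used above, and it assumes the October example from the question. The window name w and the derived-table aliases d and r are made up for illustration. The frame ends at 1 PRECEDING so that each day's total excludes the day itself, as the question's "(excluding)" notes ask, which means the first day of the year gets NULL.

```sql
-- Sketch only: requires MySQL 8.0+ (window functions and named windows).
SELECT r.*
FROM (
    SELECT d.*,
           -- running totals over all earlier days of the year, current day excluded
           SUM(d.premiumSum_2017)  OVER w AS sumFromYearBegining,
           SUM(d.policyCount_2017) OVER w AS countFromYearBegining
    FROM (
        -- one row per day, from the start of the year up to the end of the target month
        SELECT DATE(CreatedAt)        AS dayDate_2017,
               SUM(CalculatedPremium) AS premiumSum_2017,
               COUNT(PolicyOrderId)   AS policyCount_2017
        FROM   PolicyOrder
        WHERE  PolicyOrderStatusId = 6
          AND  CreatedAt >= '2017-01-01'
          AND  CreatedAt <  '2017-11-01'
        GROUP  BY DATE(CreatedAt)
    ) AS d
    WINDOW w AS (ORDER BY d.dayDate_2017
                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
) AS r
-- filter to the target month only after the running totals have been computed
WHERE r.dayDate_2017 >= '2017-10-01'
ORDER BY r.dayDate_2017;
```

As with the variable-based query, the month filter has to sit one level outside the running-total calculation, otherwise the earlier months would be dropped before they are summed.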

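Answer 1 (score: 1)

If you want an approach that avoids using @variables in the select list and also avoids analytics (only MySQL 8 supports them), you can do it with a semi-cartesian product: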

WITH prevYr AS(
    SELECT 
        YEAR(CreatedAt) AS year_prev,
        MONTH(CreatedAt) AS month_prev,
        DAY(CreatedAt) AS day_prev,
        SUM(CalculatedPremium) AS premiumSum_prev, 
        COUNT(PolicyOrderId) AS policyCount_prev
    FROM 
        PolicyOrder
    WHERE
        CreatedAt BETWEEN '2017-02-01' AND '2017-02-28 23:59:59' AND  -- CreatedAt is DATETIME: include all of the last day
        PolicyOrderStatusId = 6 
    GROUP BY
        YEAR(CreatedAt), MONTH(CreatedAt), DAY(CreatedAt)
),
currYr AS (
    SELECT 
        YEAR(CreatedAt) AS year_curr,
        MONTH(CreatedAt) AS month_curr,
        DAY(CreatedAt) AS day_curr,
        SUM(CalculatedPremium) AS premiumSum_curr, 
        COUNT(PolicyOrderId) AS policyCount_curr
    FROM 
        PolicyOrder
    WHERE
        CreatedAt BETWEEN '2018-02-01' AND '2018-02-28 23:59:59' AND  -- CreatedAt is DATETIME: include all of the last day
        PolicyOrderStatusId = 6 
    GROUP BY
        YEAR(CreatedAt), MONTH(CreatedAt), DAY(CreatedAt)
) 


SELECT 
      *
FROM
       prevYr
       INNER JOIN 
       currYr
       ON  
           currYr.day_curr = prevYr.day_prev

       INNER JOIN
       (
           SELECT 
                main.day_prev AS dayRolling_prev, 
                SUM(pre.premiumSum_prev) AS premiumSumRolling_prev, 
                SUM(pre.policyCount_prev) AS policyCountRolling_prev
           FROM 
                prevYr main LEFT OUTER JOIN prevYr pre ON pre.day_prev < main.day_prev
           GROUP BY
                main.day_prev
        ) rollingPrev
        ON  
           currYr.day_curr = rollingPrev.dayRolling_prev

ORDER BY 1,2,3

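We summarize the 2017 and 2018 data into two CTEs because it makes things a lot cleaner and tidier later on, particularly for this rolling count. You can probably follow the logic of the CTEs easily, as it is lifted more or less straight from your query. I only dropped the DATE column in favour of a year/month/day triplet, because it makes the other things (the joins) cleaner and it can be recombined into a date if needed. I also swapped the WHERE clauses to use date BETWEEN x AND y, because this will leverage an index on the column, whereas using YEAR(date) = x AND MONTH(date) = y might not.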
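
A minimal sketch of the index point, assuming an index exists on CreatedAt. The index name below is hypothetical and is not part of the schema in the question. The sketch uses a half-open range (>= first day, < first day of the next month) instead of BETWEEN. That is a swapped-in variant which keeps the predicate on the bare column and also covers every time of day on the last date of a DATETIME column.

```sql
-- Hypothetical index; the name is an assumption for illustration only.
CREATE INDEX idx_policyorder_createdat ON PolicyOrder (CreatedAt);

-- Range predicate on the bare column: the optimizer can consider a range scan on the index.
SELECT COUNT(PolicyOrderId) AS policyCount
FROM   PolicyOrder
WHERE  CreatedAt >= '2017-02-01'
  AND  CreatedAt <  '2017-03-01'
  AND  PolicyOrderStatusId = 6;

-- Same month expressed through functions of the column: the column is wrapped in
-- YEAR()/MONTH(), so a plain index on CreatedAt generally cannot be used for a range scan.
SELECT COUNT(PolicyOrderId) AS policyCount
FROM   PolicyOrder
WHERE  YEAR(CreatedAt) = 2017
  AND  MONTH(CreatedAt) = 2
  AND  PolicyOrderStatusId = 6;
```

Running EXPLAIN on both statements against your own data is the way to confirm which access path the optimizer actually picks.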

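The rolling count works via something I refer to as a semi-cartesian. It is actually a cartesian product: any database join that results in rows from one table multiplying and being represented repeatedly in the output is a cartesian product. In this case it is not a full product (where every row is crossed with every other row). It uses a less-than, so each row is only crossed with a subset of the other rows. As the date increases, more rows match the predicate, because a date of the 30th has 29 rows that are less than it.

This results in the following pattern of data: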

maindate   predate    maincount precount
2017-02-01 NULL       10        NULL

2017-02-02 2017-02-01 20        10

2017-02-03 2017-02-01 30        10
2017-02-03 2017-02-02 30        20

2017-02-04 2017-02-01 40        10
2017-02-04 2017-02-02 40        20
2017-02-04 2017-02-03 40        30

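You can see that any given maindate repeats N-1 times, because there are N-1 dates lower than it that satisfy the join condition predate < maindate.

If we group by the maindate and sum the counts associated with each predate, we get the rolling sum of all the precounts for that maindate. So on the 4th day of the month it is SUM(precount of dates 1st-3rd), i.e. 10 + 20 + 30 = 60. On the 5th day we sum the counts of days 1 to 4, on the 6th day we sum days 1 to 5, and so on.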
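
The grouping step can be tried in isolation with a small sketch that uses the four illustrative rows from the pattern above. The CTE name daily and the column names d and cnt are made up for this example and do not come from PolicyOrder. Like the main query, it needs a MySQL version that supports CTEs.

```sql
-- Toy data: one row per day with the counts 10, 20, 30, 40 from the pattern above.
WITH daily AS (
    SELECT DATE '2017-02-01' AS d, 10 AS cnt UNION ALL
    SELECT DATE '2017-02-02',      20        UNION ALL
    SELECT DATE '2017-02-03',      30        UNION ALL
    SELECT DATE '2017-02-04',      40
)
SELECT main.d       AS maindate,
       SUM(pre.cnt) AS rollingPrecount   -- sum of all strictly earlier days
FROM   daily main
       LEFT OUTER JOIN daily pre ON pre.d < main.d   -- the "semi-cartesian" join
GROUP  BY main.d
ORDER  BY main.d;

-- maindate   | rollingPrecount
-- 2017-02-01 | NULL
-- 2017-02-02 | 10
-- 2017-02-03 | 30
-- 2017-02-04 | 60
```

The first day has no earlier rows, so the LEFT JOIN leaves its sum as NULL. Wrapping the sum in COALESCE(SUM(pre.cnt), 0) would turn that into 0 if a number is preferred.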