SQL

Time: 2017-11-21 18:23:54

Tags: sql group-by sum grouping mariadb

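While working on a small home project, I noticed that two queries which should produce exactly the same answer return different values.

The project calculates how much electricity has been used. I translated the queries below into English (the originals are in Dutch).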

select month(measured.Date) as Month, sum(measured.used_kwh), sum(measured.used_E) from
(select DATE_FORMAT(highRate.time,'%Y-%m-%d') as Date,
max(highRate.Value)-min(highRate.Value) + max(LowRate.Value)-min(LowRate.Value) as used_kwh,
(max(highRate.Value)-min(highRate.Value))*0.2096 + (max(LowRate.Value)-min(LowRate.Value))*0.1943 as used_E
from Item8 as highRate
left join Item7 as LowRate
on highRate.Time = LowRate.Time
group by Date) as measured
group by Month;

Yields: Screenshot of result

select MONTH(highRate.time) as Month, 
max(highRate.Value)-min(highRate.Value) + max(LowRate.Value)-min(LowRate.Value) as used_kwh,
(max(highRate.Value)-min(highRate.Value))*0.2096 + (max(LowRate.Value)-min(LowRate.Value))*0.1943 as used_E
from Item8 as highRate
left join Item7 as LowRate
on highRate.Time = LowRate.Time
group by Month;

Yields: Screenshot of result

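I spent most of the afternoon trying to figure out what is going on here, but so far without result. In the second column (sum(gemeten.verbruik_kwh) / verbruik_kwh) there is always a difference of about 0.15.

The queries above are not as tidy and organized as the final version, because I copied these specific parts out of the larger queries they belong to and changed them to run standalone.

The screenshot below shows the difference again, this time in the form I want to present to the user. "Gemeten verbruik(€)" and "Gemeten verbruik(kWh)" should both have the same values across the tables. Screenshot of result of larger queries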

3 answers:

Answer 0: (score: 0)

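The queries are not identical. The first runs two layered aggregations, which you can see with proper indentation: first at the Date level, then a second aggregation at the Month level. The second query runs only one aggregation, at the Month level.

Possibly, in your data, the max/min grouped by Date differ slightly from the max/min grouped by Month, since there are multiple Date records within the same Month.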
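To illustrate how the two levels can disagree, here is a minimal Python sketch with made-up cumulative meter readings (the data and the helper name `span` are hypothetical, not taken from the question). Summing the per-day max - min skips whatever is consumed between the last reading of one day and the first reading of the next, while a single per-month max - min includes it.

```python
from collections import defaultdict

# Hypothetical cumulative meter readings: (day, value in kWh).
readings = [
    ("2017-11-01", 100.0),
    ("2017-11-01", 104.0),
    ("2017-11-02", 105.0),  # 1 kWh was used overnight, between the two days
    ("2017-11-02", 109.0),
]

def span(values):
    """Return max - min of a sequence of meter values."""
    return max(values) - min(values)

# Two-level aggregation: max - min per day, then summed over the month.
by_day = defaultdict(list)
for day, value in readings:
    by_day[day].append(value)
daily_total = sum(span(values) for values in by_day.values())

# Single-level aggregation: max - min over the whole month.
monthly_total = span([value for _, value in readings])

print(daily_total)    # 8.0
print(monthly_total)  # 9.0
```

With readings like these, the layered query would report slightly less than the single-level one, by exactly the amount consumed across the day boundaries.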

First query

SELECT month(measured.Date) AS MONTH,
       sum(measured.used_kwh),
       sum(measured.used_E)
FROM
  (SELECT DATE_FORMAT(highRate.time,'%Y-%m-%d') AS Date,

          max(highRate.Value)-min(highRate.Value) +
          max(LowRate.Value)-min(LowRate.Value) AS used_kwh,

          (max(highRate.Value)-min(highRate.Value))*0.2096 + 
          (max(LowRate.Value)-min(LowRate.Value))*0.1943 AS used_E

   FROM Item8 AS highRate
   LEFT JOIN Item7 AS LowRate ON highRate.Time = LowRate.Time
   GROUP BY Date) AS measured

GROUP BY MONTH;

Second query

SELECT MONTH(highRate.time) AS MONTH,

       max(highRate.Value)-min(highRate.Value) + 
       max(LowRate.Value)-min(LowRate.Value) AS used_kwh,

       (max(highRate.Value)-min(highRate.Value))*0.2096 + 
       (max(LowRate.Value)-min(LowRate.Value))*0.1943 AS used_E

FROM Item8 AS highRate
LEFT JOIN Item7 AS LowRate ON highRate.Time = LowRate.Time

GROUP BY MONTH;

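A truly equivalent nested query would aggregate at the Month level in the inner query as well, in which case the outer aggregation is redundant and its aggregate function could even be replaced with Avg(), Min(), or Max().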

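Answer 1: (score: 0)

Depending on your data, there are a few possible answers.

With large amounts of data, this can happen when floating-point data types are used. It is a long topic, but floating-point types cannot represent every decimal number exactly and are prone to noticeable rounding errors (often referred to as bruising).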

https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
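A minimal Python illustration of that rounding behaviour (Python is used here only for demonstration; binary floating point behaves the same way in FLOAT and DOUBLE columns, whereas an exact decimal type such as DECIMAL avoids it):

```python
from decimal import Decimal

# Binary floating point cannot store 0.1 or 0.2 exactly,
# so their sum is not exactly 0.3.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)  # False

# Decimal arithmetic on string literals stays exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))  # True
```

Errors of this kind are tiny per operation, so they only become visible when many values are summed.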

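What are the results of the inner query in the first example? Can they show you where any rounding error comes from?


But more likely, there are inconsistencies in your data. Look for rows where value goes DOWN compared with the previous record.

It looks like you're using MySQL, which makes this messy, but it is still possible to check for such rows...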

SELECT
    SUM(CASE WHEN this.value < next.value THEN next.value - this.value END)  AS increases,
    SUM(CASE WHEN this.value > next.value THEN this.value - next.value END)  AS decreases
FROM
    Item8    AS this
INNER JOIN
    Item8    AS next
       -- the next reading is the earliest one strictly later than this one
       ON next.time = (SELECT MIN(Item8.time) FROM Item8 WHERE Item8.time > this.time)

Or try this and just eyeball it...

SELECT
    DATE_FORMAT(highRate.time,'%Y-%m-%d')   AS Date,
    MIN(highRate.Value)                     AS HighRateMinValue,
    MAX(highRate.Value)                     AS HighRateMaxValue,
    MIN(LowRate.Value)                      AS LowRateMinValue,
    MAX(LowRate.Value)                      AS LowRateMaxValue
FROM
    Item8   AS highRate
LEFT JOIN
    Item7   AS LowRate
        ON  highRate.Time = LowRate.Time
GROUP BY
    Date
ORDER BY
    Date

If you see a LowRate minimum value that is lower than the previous day's LowRate maximum value, that is your "problem".
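The same eyeball check can be automated. The following is a minimal sketch, assuming the per-day minimum and maximum have already been fetched into a list ordered by date; the rows and the function name `find_drops` are hypothetical.

```python
# Hypothetical per-day (date, min, max) values, ordered by date,
# in the shape the query above would return for one of the rates.
daily = [
    ("2017-11-01", 100.0, 104.0),
    ("2017-11-02", 105.0, 109.0),
    ("2017-11-03", 20.0, 24.0),   # the meter value dropped: suspicious
]

def find_drops(rows):
    """Return the days whose minimum is below the previous day's maximum."""
    drops = []
    for (_, _, prev_max), (day, cur_min, _) in zip(rows, rows[1:]):
        if cur_min < prev_max:
            drops.append(day)
    return drops

print(find_drops(daily))  # ['2017-11-03']
```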


This matters when you have data like this...

   MAX( {1, 2, 3, 2, 3, 4} ) - MIN( {1, 2, 3, 2, 3, 4} )
=>
   4 - 1
=> 
   3

Compared to...

   [ MAX( {1, 2, 3} ) - MIN( {1, 2, 3} ) ]   +    [ MAX( {2, 3, 4} ) - MIN( {2, 3, 4} ) ]
=>
   [3 - 1] + [4 - 2]
=> 
   4
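The arithmetic above can be reproduced directly. This short Python sketch uses the same six values, split into the same two groups (the variable names are only for illustration):

```python
whole = [1, 2, 3, 2, 3, 4]
first_part, second_part = whole[:3], whole[3:]

# One aggregation over everything.
single = max(whole) - min(whole)

# One aggregation per group, then summed.
split = (max(first_part) - min(first_part)) + (max(second_part) - min(second_part))

print(single)  # 3
print(split)   # 4
```

Because the value drops from 3 back to 2 between the groups, the grouped total counts that stretch twice.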


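On an unrelated note, for performance reasons you may be better off combining the results from each table after aggregating, rather than JOINing before aggregating...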

SELECT
    COALESCE(highRate.month, lowRate.month)                                             AS month,
    COALESCE(highRate.used_kwh, 0)          + COALESCE(lowRate.used_kwh, 0)             AS used_kwh,
    COALESCE(highRate.used_kwh, 0) * 0.2096 + COALESCE(lowRate.used_kwh, 0) * 0.1943    AS used_E
FROM
(
    SELECT
        DATE_FORMAT(Item8.time,'%Y-%m-01')    AS month,
        MAX(Item8.value) - MIN(Item8.value)   AS used_kwh
    FROM
        Item8
    GROUP BY
        month
)
    AS highRate
FULL OUTER JOIN   -- note: MySQL does not support FULL OUTER JOIN
(
    SELECT
        DATE_FORMAT(Item7.time,'%Y-%m-01')    AS month,
        MAX(Item7.value) - MIN(Item7.value)   AS used_kwh
    FROM
        Item7
    GROUP BY
        month
)
    AS lowRate
        ON lowRate.month = highRate.month

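This lets the query planner identify the MIN and MAX values of each table (or of a row range within a table) more quickly, and significantly reduces the number of rows that need to be joined.

It also protects you if there are rows in LowRate that are not in HighRate, and in cases where multiple entries share the same time.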
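The effect of unmatched rows can be reproduced on a small scale. The sketch below uses Python's built-in sqlite3 module as a stand-in for MariaDB; the table names follow the question, but the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Item8 (time TEXT, value REAL);  -- high rate
    CREATE TABLE Item7 (time TEXT, value REAL);  -- low rate
    INSERT INTO Item8 VALUES ('2017-11-01 08:00', 10), ('2017-11-01 20:00', 14);
    INSERT INTO Item7 VALUES ('2017-11-01 08:00', 50), ('2017-11-01 20:00', 52),
                             ('2017-11-01 23:00', 55);
""")

# Join first, aggregate afterwards: the 23:00 low-rate reading has no
# matching high-rate row, so the LEFT JOIN never sees it.
joined = conn.execute("""
    SELECT MAX(l.value) - MIN(l.value)
    FROM Item8 AS h LEFT JOIN Item7 AS l ON h.time = l.time
""").fetchone()[0]

# Aggregate the low-rate table on its own: every reading is counted.
separate = conn.execute(
    "SELECT MAX(value) - MIN(value) FROM Item7"
).fetchone()[0]

print(joined)    # 2.0
print(separate)  # 5.0
```

Here the join-first query under-reports low-rate usage by 3 kWh, which is the kind of gap that aggregating each table separately avoids.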

EDIT:

The version that aggregates to Day first and then to Month, using the aggregate-then-join approach.

SELECT
    MONTH(COALESCE(highRate.day, lowRate.day))                                                    AS month,
    COALESCE(SUM(highRate.used_kwh), 0)          + COALESCE(SUM(lowRate.used_kwh), 0)             AS used_kwh,
    COALESCE(SUM(highRate.used_kwh), 0) * 0.2096 + COALESCE(SUM(lowRate.used_kwh), 0) * 0.1943    AS used_E
FROM
(
    SELECT
        DATE_FORMAT(Item8.time,'%Y-%m-%d')    AS day,
        MAX(Item8.value) - MIN(Item8.value)   AS used_kwh
    FROM
        Item8
    GROUP BY
        day
)
    AS highRate
FULL OUTER JOIN
(
    SELECT
        DATE_FORMAT(Item7.time,'%Y-%m-%d')    AS day,
        MAX(Item7.value) - MIN(Item7.value)   AS used_kwh
    FROM
        Item7
    GROUP BY
        day
)
    AS lowRate
        ON lowRate.day = highRate.day 
GROUP BY
   month

EDIT:

A shorter approach (avoiding JOIN and COALESCE altogether)...

SELECT
    month,

    SUM(high)                                 AS used_kwh_high,
                         SUM(low)             AS used_kwh_low,
    SUM(high)          + SUM(low)             AS used_kwh,

    SUM(high) * 0.2096                        AS used_E_high,
                         SUM(low) * 0.1943    AS used_E_low,
    SUM(high) * 0.2096 + SUM(low) * 0.1943    AS used_E
FROM
(
    SELECT DATE_FORMAT(time,'%Y-%m-01') AS month, MAX(value) - MIN(value) AS high, 0 AS low FROM Item8 GROUP BY month
    UNION ALL
    SELECT DATE_FORMAT(time,'%Y-%m-01') AS month, 0 AS high, MAX(value) - MIN(value) AS low FROM Item7 GROUP BY month
)
    combined_rates
GROUP BY
    month

The Day-then-Month aggregation version...

SELECT
    DATE_FORMAT(day,'%Y-%m-01') AS month,

    SUM(high)                                 AS used_kwh_high,
                         SUM(low)             AS used_kwh_low,
    SUM(high)          + SUM(low)             AS used_kwh,

    SUM(high) * 0.2096                        AS used_E_high,
                         SUM(low) * 0.1943    AS used_E_low,
    SUM(high) * 0.2096 + SUM(low) * 0.1943    AS used_E
FROM
(
    SELECT DATE_FORMAT(time,'%Y-%m-%d') AS day, MAX(value) - MIN(value) AS high, 0 AS low FROM Item8 GROUP BY day
    UNION ALL
    SELECT DATE_FORMAT(time,'%Y-%m-%d') AS day, 0 AS high, MAX(value) - MIN(value) AS low FROM Item7 GROUP BY day
)
    combined_rates
GROUP BY
    month

Answer 2: (score: 0)

@MatBailie: The first approach gets stuck on the CASE:

precompile

The second, "eyeball" approach results in:

increases: NULL; decreases: 18323.261840820312

This all looks fine to me, or am I missing the point?