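While working on a small home project, I found that two queries output different values, although they should produce exactly the same answer.
It is a project for calculating consumed electrical energy. I translated the queries below into English (the originals are in Dutch).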
select month(measured.Date) as Month, sum(measured.used_kwh), sum(measured.used_E) from
(select DATE_FORMAT(highRate.time,'%Y-%m-%d') as Date,
max(highRate.Value)-min(highRate.Value) + max(LowRate.Value)-min(LowRate.Value) as used_kwh,
(max(highRate.Value)-min(highRate.Value))*0.2096 + (max(LowRate.Value)-min(LowRate.Value))*0.1943 as used_E
from Item8 as highRate
left join Item7 as LowRate
on highRate.Time = LowRate.Time
group by Date) as measured
group by Month;
select MONTH(highRate.time) as Month,
max(highRate.Value)-min(highRate.Value) + max(LowRate.Value)-min(LowRate.Value) as used_kwh,
(max(highRate.Value)-min(highRate.Value))*0.2096 + (max(LowRate.Value)-min(LowRate.Value))*0.1943 as used_E
from Item8 as highRate
left join Item7 as LowRate
on highRate.Time = LowRate.Time
group by Month;
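I spent most of the afternoon trying to figure out what is going on here, so far without result. In the second column (sum(measured.used_kwh) versus used_kwh; sum(gemeten.verbruik_kwh) / verbruik_kwh in the Dutch original) there is always a difference of about 0.15.
The queries above are not as tidy and organized as the final versions, because I copied these specific parts out of the larger queries they belong to and changed them to work standalone.
The screenshot below shows the difference again, but this time the way I would like it presented to the user. Both "Gemeten verbruik (€)" and "Gemeten verbruik (kWh)" should have the same values in the table.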
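Answer 0 (score: 0)
The queries are not exactly the same. The first runs two layered aggregations, which you can see with proper indentation: first at the Date level, then a second aggregation at the Month level. The second query runs only one aggregation, at the Month level.
Possibly, in your data, the max/min values grouped by date differ slightly from the max/min values grouped by month. There are probably multiple Date records within the same month.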
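To make the difference concrete, here is a minimal Python sketch with made-up cumulative meter readings (the days and values are hypothetical, not taken from the question's tables). It assumes the meter value only ever rises. Under that assumption, summing per-day max-min ranges leaves out whatever was consumed between the last reading of one day and the first reading of the next, so the two-level result can come out lower than the single month-level range.

```python
from collections import defaultdict

# Hypothetical cumulative meter readings: (day, value in kWh).
readings = [
    ("2017-01-01", 100.0),
    ("2017-01-01", 101.5),
    ("2017-01-02", 102.0),  # 0.5 kWh was used between the two days
    ("2017-01-02", 103.0),
]


def value_range(values):
    """Return max(values) - min(values)."""
    return max(values) - min(values)


# Two-level aggregation: range per day, then summed (like the first query).
by_day = defaultdict(list)
for day, value in readings:
    by_day[day].append(value)
sum_of_daily_ranges = sum(value_range(v) for v in by_day.values())

# One-level aggregation: a single range over the month (like the second query).
monthly_range = value_range([value for _, value in readings])

print(sum_of_daily_ranges)  # 2.5
print(monthly_range)        # 3.0
```

The 0.5 kWh gap here is the consumption that falls between the two days, which neither daily range contains.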
First query
SELECT month(measured.Date) AS MONTH,
sum(measured.used_kwh),
sum(measured.used_E)
FROM
(SELECT DATE_FORMAT(highRate.time,'%Y-%m-%d') AS Date,
max(highRate.Value)-min(highRate.Value) +
max(LowRate.Value)-min(LowRate.Value) AS used_kwh,
(max(highRate.Value)-min(highRate.Value))*0.2096 +
(max(LowRate.Value)-min(LowRate.Value))*0.1943 AS used_E
FROM Item8 AS highRate
LEFT JOIN Item7 AS LowRate ON highRate.Time = LowRate.Time
GROUP BY Date) AS measured
GROUP BY MONTH;
Second query
SELECT MONTH(highRate.time) AS MONTH,
max(highRate.Value)-min(highRate.Value) +
max(LowRate.Value)-min(LowRate.Value) AS used_kwh,
(max(highRate.Value)-min(highRate.Value))*0.2096 +
(max(LowRate.Value)-min(LowRate.Value))*0.1943 AS used_E
FROM Item8 AS highRate
LEFT JOIN Item7 AS LowRate ON highRate.Time = LowRate.Time
GROUP BY MONTH;
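A truly equivalent nested query would aggregate at the Month level in the inner query. The outer aggregation is then redundant, because each month produces a single row, and the outer Sum() could even be replaced with Avg(), Min(), or Max() without changing the result.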
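A minimal sketch of that equivalence, using Python's built-in sqlite3 module as a stand-in for MySQL. The table contents are invented, only one table (Item8) is used for brevity, and SQLite's strftime('%m', ...) replaces MySQL's MONTH(), so this illustrates the idea rather than reproducing the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Item8 (time TEXT, value REAL)")
conn.executemany(
    "INSERT INTO Item8 VALUES (?, ?)",
    [
        # Hypothetical readings, all within one month.
        ("2017-01-01 08:00:00", 100.0),
        ("2017-01-01 20:00:00", 101.5),
        ("2017-01-02 08:00:00", 102.0),
        ("2017-01-02 20:00:00", 103.0),
    ],
)

# Single aggregation at month level (the shape of the second query).
single = conn.execute(
    "SELECT strftime('%m', time) AS month, MAX(value) - MIN(value) "
    "FROM Item8 GROUP BY month"
).fetchall()

# Nested query whose inner level already groups by month, so the
# outer SUM sees exactly one row per month and changes nothing.
nested = conn.execute(
    "SELECT month, SUM(used_kwh) FROM ("
    "  SELECT strftime('%m', time) AS month,"
    "         MAX(value) - MIN(value) AS used_kwh"
    "  FROM Item8 GROUP BY month"
    ") AS measured GROUP BY month"
).fetchall()

print(single)  # [('01', 3.0)]
print(nested)  # [('01', 3.0)]
```

Swapping SUM for AVG, MIN, or MAX in the outer query gives the same rows, since each group holds one value.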
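Answer 1 (score: 0)
Depending on your data, there are several possible answers.
When working with a lot of data, this can happen if you use floating-point data types. It is a long topic, but floating-point data types cannot represent every decimal number exactly, and can easily produce noticeable rounding errors.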
https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
What are the results of the inner query in the first example? Can that show you the source of any rounding error?
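As a small illustration of the floating-point point (generic numbers, not the question's data): Python's float is a binary double, so it shows the same kind of representation error, while an exact decimal type does not. How the Value columns are actually typed in the question is unknown, so treat this as a sketch only.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_total = 0.1 + 0.2

# An exact decimal type keeps the digits as written.
decimal_total = Decimal("0.1") + Decimal("0.2")

print(float_total == 0.3)               # False
print(decimal_total == Decimal("0.3"))  # True
```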
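But more likely, your data contains inconsistencies. Look for rows where value goes DOWN compared with the previous record.
It looks like you are using MySQL, which makes this messy, but it is still possible to check for such rows...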
SELECT
SUM(CASE WHEN this.value < next.value THEN next.value - this.value END) AS increases,
SUM(CASE WHEN this.value > next.value THEN this.value - next.value END) AS decreases
FROM
Item8 AS this
INNER JOIN
Item8 AS next
ON next.time = (SELECT MIN(Item8.time) FROM Item8 WHERE Item8.time > this.time)
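The same check can be sketched outside the database. The helper below is hypothetical (its name and the sample rows are made up for illustration): it sorts readings by time and returns each consecutive pair where the value drops.

```python
def find_decreases(readings):
    """Return (previous, current) pairs where the value drops.

    `readings` is a list of (time, value) tuples with sortable times.
    """
    ordered = sorted(readings, key=lambda row: row[0])
    return [
        (prev, cur)
        for prev, cur in zip(ordered, ordered[1:])
        if cur[1] < prev[1]
    ]


# Hypothetical data: the meter value drops at 03:00.
sample = [
    ("2017-01-01 01:00", 10.0),
    ("2017-01-01 02:00", 11.0),
    ("2017-01-01 03:00", 9.5),
    ("2017-01-01 04:00", 12.0),
]

drops = find_decreases(sample)
print(drops)
```

An empty list would mean the values never decrease, which rules out this particular kind of inconsistency.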
Or try this, and just eyeball it...
SELECT
DATE_FORMAT(highRate.time,'%Y-%m-%d') AS Date,
MIN(highRate.Value) AS HighRateMinValue,
MAX(highRate.Value) AS HighRateMaxValue,
MIN(LowRate.Value) AS LowRateMinValue,
MAX(LowRate.Value) AS LowRateMaxValue
FROM
Item8 AS highRate
LEFT JOIN
Item7 AS LowRate
ON highRate.Time = LowRate.Time
GROUP BY
Date
ORDER BY
Date
If you see a LowRate minimum that is lower than the previous day's LowRate maximum, that is your "problem".
This matters when you have data like this...
MAX( {1, 2, 3, 2, 3, 4} ) - MIN( {1, 2, 3, 2, 3, 4} )
=>
4 - 1
=>
3
compared with...
[ MAX( {1, 2, 3} ) - MIN( {1, 2, 3} ) ] + [ MAX( {2, 3, 4} ) - MIN( {2, 3, 4} ) ]
=>
[3 - 1] + [4 - 2]
=>
4
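The arithmetic above can be checked with a few lines of Python. The `spread` helper is a name introduced here for illustration, and the split into two halves stands in for two days.

```python
def spread(values):
    """Return max(values) - min(values)."""
    return max(values) - min(values)


whole_period = [1, 2, 3, 2, 3, 4]
first_half, second_half = whole_period[:3], whole_period[3:]

# One aggregation over everything versus one per half, then summed.
single_level = spread(whole_period)
two_level = spread(first_half) + spread(second_half)

print(single_level, two_level)  # 3 4
```

The extra 1 comes from the drop from 3 back to 2 at the boundary, which the two-level version counts twice.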
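On an unrelated note, for performance reasons you may be better off combining the results from each table after aggregating, rather than JOINing before aggregating...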
SELECT
COALESCE(highRate.month, lowRate.month) AS month,
COALESCE(highRate.used_kwh, 0) + COALESCE(lowRate.used_kwh, 0) AS used_kwh,
COALESCE(highRate.used_kwh, 0) * 0.2096 + COALESCE(lowRate.used_kwh, 0) * 0.1943 AS used_E
FROM
(
SELECT
DATE_FORMAT(Item8.time,'%Y-%m-01') AS month,
MAX(Item8.value) - MIN(Item8.value) AS used_kwh
FROM
Item8
GROUP BY
month
)
AS highRate
FULL OUTER JOIN  -- note: MySQL has no FULL OUTER JOIN; the UNION ALL version further down avoids the join
(
SELECT
DATE_FORMAT(Item7.time,'%Y-%m-01') AS month,
MAX(Item7.value) - MIN(Item7.value) AS used_kwh
FROM
Item7
GROUP BY
month
)
AS lowRate
ON lowRate.month = highRate.month
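This would allow the query planner to identify the MIN and MAX values for each table (or for a range of rows in a table) much faster, and significantly reduce the number of rows that need to be joined.
It also protects you if there are rows in LowRate that are not in HighRate, and against cases where multiple entries exist for the same time.
EDIT:
Aggregate to day first, then to month, then join version.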
SELECT
MONTH(COALESCE(highRate.day, lowRate.day)) AS month,
COALESCE(SUM(highRate.used_kwh), 0) + COALESCE(SUM(lowRate.used_kwh), 0) AS used_kwh,
COALESCE(SUM(highRate.used_kwh), 0) * 0.2096 + COALESCE(SUM(lowRate.used_kwh), 0) * 0.1943 AS used_E
FROM
(
SELECT
DATE_FORMAT(Item8.time,'%Y-%m-%d') AS day,
MAX(Item8.value) - MIN(Item8.value) AS used_kwh
FROM
Item8
GROUP BY
day
)
AS highRate
FULL OUTER JOIN  -- note: not supported by MySQL
(
SELECT
DATE_FORMAT(Item7.time,'%Y-%m-%d') AS day,
MAX(Item7.value) - MIN(Item7.value) AS used_kwh
FROM
Item7
GROUP BY
day
)
AS lowRate
ON lowRate.day = highRate.day
GROUP BY
month
EDIT:
A shorter approach (avoiding JOIN and COALESCE altogether)...
SELECT
month,
SUM(high) AS used_kwh_high,
SUM(low) AS used_kwh_low,
SUM(high) + SUM(low) AS used_kwh,
SUM(high) * 0.2096 AS used_E_high,
SUM(low) * 0.1943 AS used_E_low,
SUM(high) * 0.2096 + SUM(low) * 0.1943 AS used_E
FROM
(
SELECT DATE_FORMAT(time,'%Y-%m-01') AS month, MAX(value) - MIN(value) AS high, 0 AS low FROM Item8 GROUP BY month
UNION ALL
SELECT DATE_FORMAT(time,'%Y-%m-01') AS month, 0 AS high, MAX(value) - MIN(value) AS low FROM Item7 GROUP BY month
)
combined_rates
GROUP BY
month
Day then Month aggregation version...
SELECT
DATE_FORMAT(day,'%Y-%m-01') AS month,
SUM(high) AS used_kwh_high,
SUM(low) AS used_kwh_low,
SUM(high) + SUM(low) AS used_kwh,
SUM(high) * 0.2096 AS used_E_high,
SUM(low) * 0.1943 AS used_E_low,
SUM(high) * 0.2096 + SUM(low) * 0.1943 AS used_E
FROM
(
SELECT DATE_FORMAT(time,'%Y-%m-%d') AS day, MAX(value) - MIN(value) AS high, 0 AS low FROM Item8 GROUP BY day
UNION ALL
SELECT DATE_FORMAT(time,'%Y-%m-%d') AS day, 0 AS high, MAX(value) - MIN(value) AS low FROM Item7 GROUP BY day
)
combined_rates
GROUP BY
month
Answer 2 (score: 0)
@MatBailie: The first approach, using the CASE expression, gives:
precompile
The second, "eyeball" approach gives:
increases: NULL; decreases: 18323.261840820312
That all looks fine to me, or am I missing the point?