Query with a GROUP BY clause returns rows in the wrong order

Date: 2016-04-22 14:24:13

Tags: mysql sql

So I have the following query:

SELECT sensor.id as `sensor_id`,
       sensor_reading.id as `reading_id`,
       sensor_reading.reading as `reading`,
       from_unixtime(sensor_reading.reading_timestamp) as `reading_timestamp`,
       sensor_reading.lower_threshold as `lower_threshold`,
       sensor_reading.upper_threshold as `upper_threshold`,
       sensor_type.units as `unit`
FROM sensor
LEFT JOIN sensor_reading ON sensor_reading.sensor_id = sensor.id
LEFT JOIN sensor_type ON sensor.sensor_type_id = sensor_type.id
WHERE sensor.company_id = 1
GROUP BY sensor_reading.sensor_id
ORDER BY sensor_reading.reading_timestamp DESC

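There are three tables here: sensor_type, which is used only for a single display field (units); sensor, which holds information about each sensor; and sensor_reading, which holds the individual readings. A single sensor has many readings, so each row in sensor_reading has a sensor_id that references the id column of the sensor table through a foreign key constraint.

In theory, this query should return the latest sensor_reading for EACH unique sensor. Instead, it returns the first reading for each sensor. I have found a few posts here about similar problems, but I haven't been able to solve this using any of their answers. Ideally the query needs to be as efficient as possible, because this table has several thousand readings (and keeps growing).

Does anyone know how to change this query to return the most recent reading? If I remove the GROUP BY clause it returns the rows in the correct order, but then I have to sift through the data to get the latest reading for each sensor.

Ideally I'd rather not run a subquery, since that would slow things down a lot, and speed is an important factor here.

Thanks!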

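1 Answer:

Answer 0: (score: 3)

In theory, this query should return the latest sensor_reading for EACH unique sensor.

This is a fairly common misconception about the MySQL GROUP BY extension, which lets you select non-aggregated columns that are not named in the GROUP BY clause. What the documentation states is: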

  

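The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause.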

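So, because you group by sensor_reading.sensor_id, MySQL picks an arbitrary row from sensor_reading for each sensor_id and, only after it has picked one row per sensor_id, applies the ordering to the rows it chose.

Since you only want the latest row for each sensor, the usual approach would be: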
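
One way to make this behaviour fail loudly instead of silently is the ONLY_FULL_GROUP_BY SQL mode, under which MySQL rejects such a query rather than picking arbitrary values. A minimal sketch, assuming you are in a scratch session where overwriting the session's sql_mode is acceptable:

```sql
-- Assumption: scratch session. This statement replaces any other modes
-- currently set for the session; it does not touch the global setting.
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- With the mode enabled, selecting non-aggregated columns that are neither
-- in the GROUP BY list nor functionally dependent on it is rejected with an
-- error instead of returning an arbitrary row per group.
SELECT sensor_id, id, reading
FROM   sensor_reading
GROUP BY sensor_id;
```

If the statement errors out, that is the mode doing its job: the query never had a well-defined answer in the first place.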

SELECT  *
FROM    sensor_reading AS sr
WHERE   NOT EXISTS
        (   SELECT  1
            FROM    sensor_reading AS sr2
            WHERE   sr2.sensor_id = sr.sensor_id
            AND     sr2.reading_timestamp > sr.reading_timestamp
        );

However, MySQL will optimise LEFT JOIN/IS NULL better than NOT EXISTS, so a MySQL-specific solution is:

SELECT  sr.*
FROM    sensor_reading AS sr
        LEFT JOIN sensor_reading AS sr2
            ON sr2.sensor_id = sr.sensor_id
            AND sr2.reading_timestamp > sr.reading_timestamp
WHERE   sr2.id IS NULL;
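
To see what this anti-join returns, here is a small sketch on a throwaway table. The table name sensor_reading_demo and its rows are made up for illustration; only the column names come from the question, and the column types are guesses.

```sql
-- Hypothetical demo table. A regular table is used because MySQL cannot
-- refer to the same TEMPORARY table twice in one query.
CREATE TABLE sensor_reading_demo (
    id                INT PRIMARY KEY,
    sensor_id         INT NOT NULL,
    reading           DECIMAL(10,2) NOT NULL,
    reading_timestamp INT NOT NULL   -- Unix timestamp, as in the question
);

INSERT INTO sensor_reading_demo (id, sensor_id, reading, reading_timestamp) VALUES
    (1, 1, 10.00, 100),
    (2, 1, 11.00, 200),
    (3, 2, 20.00, 150),
    (4, 2, 21.00, 120);

-- Keep a row only when no later reading exists for the same sensor.
SELECT  sr.*
FROM    sensor_reading_demo AS sr
        LEFT JOIN sensor_reading_demo AS sr2
            ON sr2.sensor_id = sr.sensor_id
            AND sr2.reading_timestamp > sr.reading_timestamp
WHERE   sr2.id IS NULL
ORDER BY sr.sensor_id;
-- Returns id 2 (sensor 1, timestamp 200) and id 3 (sensor 2, timestamp 150).

DROP TABLE sensor_reading_demo;
```

Note that if two readings for one sensor share the same latest timestamp, both rows come back, so you may need a tie-breaker if that can happen in your data.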

So, incorporating this into your query, you end up with:

SELECT sensor.id as `sensor_id`,
       sensor_reading.id as `reading_id`,
       sensor_reading.reading as `reading`,
       from_unixtime(sensor_reading.reading_timestamp) as `reading_timestamp`,
       sensor_reading.lower_threshold as `lower_threshold`,
       sensor_reading.upper_threshold as `upper_threshold`,
       sensor_type.units as `unit`
FROM    sensor
        LEFT JOIN sensor_reading 
            ON sensor_reading.sensor_id = sensor.id
        LEFT JOIN sensor_type 
            ON sensor.sensor_type_id = sensor_type.id
        LEFT JOIN sensor_reading AS sr2
            ON sr2.sensor_id = sensor_reading.sensor_id
            AND sr2.reading_timestamp > sensor_reading.reading_timestamp
WHERE sensor.company_id = 1
AND sr2.id IS NULL
ORDER BY sensor_reading.reading_timestamp DESC;

Another way to get the maximum per group is to inner join back to the latest row, like so:

SELECT  sr.*
FROM    sensor_reading AS sr
        INNER JOIN
        (   SELECT  sensor_id, MAX(reading_timestamp) AS reading_timestamp
            FROM    sensor_reading
            GROUP BY sensor_id
        ) AS sr2
            ON sr2.sensor_id = sr.sensor_id
            AND sr2.reading_timestamp = sr.reading_timestamp;

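You may find this is more efficient than the other method, or you may not, YMMV. It basically depends on your data and indexes. As you have said, subqueries can be an issue in MySQL because the full result is initially materialised.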
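
Both approaches look up readings by sensor_id and compare reading_timestamp, so a composite index on those two columns is usually the first thing worth trying. A sketch, assuming no such index exists yet; the index name is made up:

```sql
-- Hypothetical index name; the columns come from the question's schema.
CREATE INDEX idx_sensor_reading_sensor_ts
    ON sensor_reading (sensor_id, reading_timestamp);

-- EXPLAIN shows whether MySQL actually uses the index for the self-join.
EXPLAIN
SELECT  sr.*
FROM    sensor_reading AS sr
        LEFT JOIN sensor_reading AS sr2
            ON sr2.sensor_id = sr.sensor_id
            AND sr2.reading_timestamp > sr.reading_timestamp
WHERE   sr2.id IS NULL;
```

Running EXPLAIN on this and on the MAX() version against your own data is the most direct way to tell which one your server handles better.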