Check whether each group has more than one distinct value

Time: 2015-10-20 02:04:20

Tags: mysql sql group-by query-optimization distinct

Edit:

Suppose I have the following table in MySQL:

CREATE TABLE `events` (
`pv_name` varchar(60) COLLATE utf8mb4_unicode_ci NOT NULL,
`time_stamp` bigint(20) UNSIGNED NOT NULL,
`value` text CHARACTER SET utf8mb4 COLLATE utf8mb4_bin,
PRIMARY KEY (`pv_name`, `time_stamp`)
) ENGINE=InnoDB;

I can find every pv_name in this table that has more than one distinct value with the following query:

SELECT events.pv_name
FROM events
GROUP BY events.pv_name
HAVING COUNT(DISTINCT events.value) > 1;

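The problem is that this query is inefficient. It counts all the distinct values in each group instead of stopping as soon as it has found more than one.

One suggestion was the following: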

SELECT events.pv_name
FROM events
GROUP BY events.pv_name
HAVING MIN(events.value) < MAX(events.value);

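This works well if an index includes value. However, value is a TEXT column, so an index cannot include it in full (MySQL can index only a prefix of a TEXT column).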
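One way around that limitation, sketched here under assumptions rather than taken from the thread, is to index a fixed-width hash of value and run the MIN/MAX comparison on the hash. The column name value_hash and the index name idx_pv_value_hash are hypothetical, and generated columns require MySQL 5.7 or later. Two caveats apply: a hash collision would hide a genuine difference (very unlikely with MD5, but possible), and the hash compares raw bytes, so it may treat as different some values that the column's collation would consider equal, such as values differing only in trailing spaces.

```sql
-- Assumption: MySQL 5.7+ (generated columns). value_hash and
-- idx_pv_value_hash are illustrative names, not part of the original schema.
ALTER TABLE events
  ADD COLUMN value_hash BINARY(16)
    GENERATED ALWAYS AS (UNHEX(MD5(`value`))) STORED,
  ADD INDEX idx_pv_value_hash (pv_name, value_hash);

-- Same shape as the MIN/MAX query, but on the indexed hash.
-- MD5(NULL) is NULL, and MIN/MAX skip NULLs, so NULL values are ignored
-- just as COUNT(DISTINCT ...) ignores them.
SELECT events.pv_name
FROM events
GROUP BY events.pv_name
HAVING MIN(events.value_hash) < MAX(events.value_hash);
```

Whether the optimizer actually answers this from the index alone depends on the server version and the data, so it is worth checking the plan with EXPLAIN before relying on it.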

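Is there another way to make this search more efficient, perhaps some form of correlated subquery? I would like to stay with MySQL, but if another database server has a feature that would help, I could consider moving to it.

3 answers:

Answer 0: (score: 0)

To answer your question, it is best to avoid group by and distinct. First, I suggest adding an auto-incrementing event_id to the table. This makes it possible to tell whether two rows are the same row.

So I suggest the following query: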

select e.*
from events e
where e.time_stamp between $ts1 and $ts2 and
      exists (select 1
              from events e2
              where e2.pv_name = e.pv_name and
                    e2.time_stamp between $ts1 and $ts2 and
                    e2.event_id < e.event_id
             );

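You will also want indexes on events(time_stamp, pv_name, event_id) and events(pv_name, time_stamp, event_id).

This finds pairs of events. You could use select distinct pv_name instead, but that adds extra processing to remove the duplicates.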
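A minimal sketch of that select distinct form follows. It is an adaptation, not the answerer's exact query: it compares value instead of event_id, so it targets the edited question (more than one distinct value per pv_name) and needs no extra column. The user variables @ts1 and @ts2, and the values assigned to them, are hypothetical stand-ins for the $ts1 and $ts2 placeholders.

```sql
-- Hypothetical time window; replace with real bounds.
SET @ts1 = 1445299200000;
SET @ts2 = 1445302800000;

-- Returns each pv_name that has at least two rows with different
-- non-NULL values inside the window.
SELECT DISTINCT e.pv_name
FROM events e
WHERE e.time_stamp BETWEEN @ts1 AND @ts2
  AND EXISTS (SELECT 1
              FROM events e2
              WHERE e2.pv_name = e.pv_name
                AND e2.time_stamp BETWEEN @ts1 AND @ts2
                AND e2.value <> e.value
             );
```

The inner lookup can stop at the first differing row, but the outer DISTINCT still has to collapse one result row per matching event, which is the extra processing mentioned above.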

Answer 1: (score: 0)

SELECT * FROM events WHERE pv_name IN
(SELECT pv_name FROM events GROUP BY pv_name HAVING COUNT(*) > 1) AND
 time_stamp BETWEEN 'start_time' AND 'end_time'

or

SELECT pv_name FROM events GROUP BY pv_name HAVING MIN(time_stamp) < MAX(time_stamp);

This might work.

Answer 2: (score: 0)

I believe the following might work. Can it be improved?

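-- Chooses a single non-NULL `value` from the `events` table for each `pv_name`.
-- MIN() makes the choice deterministic (a bare `value` is rejected under ONLY_FULL_GROUP_BY).
-- InnoDB is used because MEMORY tables cannot hold TEXT columns.
CREATE TEMPORARY TABLE single_values ( PRIMARY KEY (pv_name) ) ENGINE=InnoDB AS (
SELECT events.pv_name, MIN(events.value) AS value
FROM events
WHERE events.value IS NOT NULL
GROUP BY events.pv_name );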

-- Finds each `pv_name` that has a `value` different than the one for it in `single_values`.
-- This is a correlated subquery.
SELECT single_values.pv_name
FROM single_values
WHERE 1 = (
SELECT 1
FROM events
WHERE events.pv_name = single_values.pv_name
AND events.value <> single_values.value
AND events.value IS NOT NULL
LIMIT 1 );
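
One possible refinement, offered as a sketch rather than a measured improvement: the same idea can be written as a single statement without the temporary table. The aliases p, e, and e0 are illustrative, and picking the reference value by earliest time_stamp is an assumption made here; any non-NULL value of the group would do.

```sql
-- For each pv_name, take one non-NULL reference value (the earliest one)
-- and look for any row whose value differs from it.
SELECT p.pv_name
FROM (SELECT DISTINCT pv_name FROM events) AS p
WHERE EXISTS (
    SELECT 1
    FROM events e
    WHERE e.pv_name = p.pv_name
      AND e.value <> (SELECT e0.value
                      FROM events e0
                      WHERE e0.pv_name = p.pv_name
                        AND e0.value IS NOT NULL
                      ORDER BY e0.time_stamp
                      LIMIT 1)
);
```

Rows with a NULL value never satisfy the <> comparison, so they are ignored, which matches the behaviour of COUNT(DISTINCT ...). Whether this runs faster than the two-step version depends on the optimizer and the data, so compare both with EXPLAIN.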