Edit:
Suppose I have the following table in MySQL:
CREATE TABLE `events` (
`pv_name` varchar(60) COLLATE utf8mb4_unicode_ci NOT NULL,
`time_stamp` bigint(20) UNSIGNED NOT NULL,
`value` text CHARACTER SET utf8mb4 COLLATE utf8mb4_bin,
PRIMARY KEY (`pv_name`, `time_stamp`)
) ENGINE=InnoDB;
I can find every pv_name in this table that has more than one distinct value with the following query:
SELECT events.pv_name
FROM events
GROUP BY events.pv_name
HAVING COUNT(DISTINCT events.value) > 1;
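The problem is that this query is inefficient. It counts all of the distinct values instead of stopping as soon as it has found more than one.
One suggestion was the following: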
SELECT events.pv_name
FROM events
GROUP BY events.pv_name
HAVING MIN(events.value) < MAX(events.value);
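This works well if an index includes value. However, value is a TEXT column, so it cannot be indexed in full (MySQL accepts only a prefix index on TEXT columns).
Is there another way to make this search more efficient? Perhaps some form of correlated subquery? I would like to stay with MySQL, but if another database server has a feature that would help, I could consider moving to it.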
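Answer 0: (score: 0)
To answer your question, it is best to avoid group by or distinct. First, I suggest adding an auto-incremented event_id to the table. This makes it possible to tell whether two rows are the same.
So, I suggest the following query: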
select e.*
from events e
where e.time_stamp between $ts1 and $ts2 and
exists (select 1
from events e2
where e2.pv_name = e.pv_name and
e2.time_stamp between $ts1 and $ts2 and
e2.event_id < e.event_id
);
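You will also want indexes on events(time_stamp, pv_name, event_id) and events(pv_name, time_stamp, event_id).
This finds pairs of events. You can use select distinct pv_name to return each name once. However, that adds extra processing to remove the duplicates.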
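The schema changes this answer relies on could look roughly like the following. This is only a sketch: the key and index names (uk_event_id, idx_ts_pv_id, idx_pv_ts_id) are made up for illustration, and BIGINT UNSIGNED is just one plausible type for event_id.

```sql
-- Hypothetical names: uk_event_id, idx_ts_pv_id, idx_pv_ts_id.
-- An AUTO_INCREMENT column has to be indexed, so the unique key is
-- added in the same statement as the column.
ALTER TABLE events
    ADD COLUMN event_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    ADD UNIQUE KEY uk_event_id (event_id);

-- The two composite indexes suggested above.
CREATE INDEX idx_ts_pv_id ON events (time_stamp, pv_name, event_id);
CREATE INDEX idx_pv_ts_id ON events (pv_name, time_stamp, event_id);
```

On a large table the ALTER may take a while, so it is probably worth trying on a copy first.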
Answer 1: (score: 0)
-- 'start_time' and 'end_time' are placeholders for the bounds of the time range.
SELECT * FROM events WHERE pv_name IN
(SELECT pv_name FROM events GROUP BY pv_name HAVING COUNT(*) > 1) AND
time_stamp BETWEEN 'start_time' AND 'end_time';
or
SELECT pv_name
FROM events
GROUP BY pv_name
HAVING MIN(time_stamp) < MAX(time_stamp)
;
This might work.
Answer 2: (score: 0)
I believe the following might work. Can it be improved?
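-- Chooses a single non-NULL `value` from the `events` table for each `pv_name`.
-- The default engine is used because MEMORY tables cannot hold TEXT columns;
-- ANY_VALUE() (MySQL 5.7+) keeps the query valid under ONLY_FULL_GROUP_BY.
CREATE TEMPORARY TABLE single_values ( PRIMARY KEY (pv_name) ) AS (
SELECT events.pv_name, ANY_VALUE(events.value) AS value
FROM events
WHERE events.value IS NOT NULL
GROUP BY events.pv_name );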
-- Finds each `pv_name` that has a `value` different than the one for it in `single_values`.
-- This is a correlated subquery.
SELECT single_values.pv_name
FROM single_values
WHERE 1 = (
SELECT 1
FROM events
WHERE events.pv_name = single_values.pv_name
AND events.value <> single_values.value
AND events.value IS NOT NULL
LIMIT 1 );
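The same idea might also be expressible without a temporary table. The sketch below is untested against real data and assumes the table from the question; the aliases first_rows, ref and other are made up for illustration. It takes the earliest non-NULL row of each pv_name as the reference (looked up through the primary key) and uses EXISTS so the inner search can stop at the first differing value.

```sql
SELECT first_rows.pv_name
FROM (
    -- Earliest time_stamp with a non-NULL value, per pv_name.
    SELECT e.pv_name, MIN(e.time_stamp) AS first_ts
    FROM events AS e
    WHERE e.value IS NOT NULL
    GROUP BY e.pv_name
) AS first_rows
JOIN events AS ref
  ON ref.pv_name = first_rows.pv_name
 AND ref.time_stamp = first_rows.first_ts   -- primary-key lookup, one row per pv_name
WHERE EXISTS (
    -- Any other non-NULL value for the same pv_name that differs from the reference.
    SELECT 1
    FROM events AS other
    WHERE other.pv_name = ref.pv_name
      AND other.value IS NOT NULL
      AND other.value <> ref.value
);
```

Whether this is actually faster than the GROUP BY ... HAVING COUNT(DISTINCT ...) version would need to be checked with EXPLAIN on the real table, since the derived table still has to read value to filter out NULLs.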