I am using this NOT IN query to return inactive users from a single table.
SELECT *
FROM (
    SELECT DISTINCT name
    FROM userlog
    WHERE created >= '2019-07-07 00:00:00' - INTERVAL 30 DAY
      AND created <= '2019-07-13 23:59:59' - INTERVAL 30 DAY
      AND isSample = 0
) inactive
WHERE inactive.name NOT IN (
    SELECT name AS name
    FROM userlog
    WHERE created >= '2019-07-13 23:59:59' - INTERVAL 30 DAY
      AND created <= '2019-07-13 23:59:59'
      AND isSample = 0
)
The EXPLAIN output for this query:
+----+-------------+------------+------------+-------+------------------+-----------+---------+------+---------+----------+----------------------------------------+
| id | select_type | table      | partitions | type  | possible_keys    | key       | key_len | ref  | rows    | filtered | Extra                                  |
+----+-------------+------------+------------+-------+------------------+-----------+---------+------+---------+----------+----------------------------------------+
| 1  | primary     | <derived2> | (null)OK   | ALL   | NULL             | null      | NULL    | NULL | 50000   | 100.00   | using where                            |
| 3  | subquery    | userlog    | (null)OK   | range | *list of indexes | nameindex | 774     | NULL | 1000000 | 10.00    | using index condition                  |
| 2  | derived     | userlog    | (null)OK   | range | *list of indexes | nameindex | 774     | NULL | 500000  | 10.00    | using index condition; using temporary |
+----+-------------+------------+------------+-------+------------------+-----------+---------+------+---------+----------+----------------------------------------+
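I don't want to query by name, because a name can change, whereas a user's ID never changes, so I switched to querying by ID. I use the same query and only change the field: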
SELECT *
FROM (
    SELECT DISTINCT(id) AS id
    FROM userlog
    WHERE created >= '2019-07-07 00:00:00' - INTERVAL 30 DAY
      AND created <= '2019-07-13 23:59:59' - INTERVAL 30 DAY
      AND isSample = '0'
) inactive
WHERE inactive.id NOT IN (
    SELECT id AS id
    FROM userlog
    WHERE created >= '2019-07-13 23:59:59' - INTERVAL 30 DAY
      AND created <= '2019-07-13 23:59:59'
      AND isSample = '0'
)
Now the EXPLAIN output for this query differs from the one above:
+----+--------------------+------------+------------+----------------+------------------+------------+---------+------+--------+----------+----------------------------------------+
| id | select_type        | table      | partitions | type           | possible_keys    | key        | key_len | ref  | rows   | filtered | Extra                                  |
+----+--------------------+------------+------------+----------------+------------------+------------+---------+------+--------+----------+----------------------------------------+
| 1  | primary            | <derived2> | (null)OK   | ALL            | NULL             | null       | NULL    | NULL | 50000  | 100.00   | using where                            |
| 3  | dependent subquery | userlog    | (null)OK   | index_subquery | *list of indexes | countindex | 768     | func | 892    | 0.61     | using where; full scan on null key     |
| 2  | derived            | userlog    | (null)OK   | range          | *list of indexes | idindex    | 774     | NULL | 500000 | 10.00    | using index condition; using temporary |
+----+--------------------+------------+------------+----------------+------------------+------------+---------+------+--------+----------+----------------------------------------+
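The query now uses a dependent subquery and is doing a full table scan, which is very slow on my table (20+ million rows). I noticed that the ID query does not use idindex for the subquery but uses my countindex instead. If I run each of the two queries on its own, both use idindex, but when they are combined with NOT IN, countindex is used.
Here are my indexes: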
+---------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| TABLE   | NON_UNIQUE | KEY_NAME   | SEQ_IN_INDEX | COLUMN_NAME | COLLATION | CARDINALITY | SUB_PART | PACKED | NULL | INDEX_TYPE |
+---------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| userlog | 1          | countindex | 1            | id          | A         | 75000       | 255      | NULL   | YES  | BTREE      |
| userlog | 1          | countindex | 2            | pk          | A         | 11500000    | null     | NULL   | YES  | BTREE      |
| userlog | 1          | nameindex  | 1            | created     | A         | 6800000     | null     | NULL   | YES  | BTREE      |
| userlog | 1          | nameindex  | 2            | sample      | A         | 13500000    | null     | NULL   | YES  | BTREE      |
| userlog | 1          | nameindex  | 3            | name        | A         | 24000000    | null     | NULL   | YES  | BTREE      |
| userlog | 1          | idindex    | 1            | id          | A         | 75000       | 512      | NULL   | YES  | BTREE      |
| userlog | 1          | idindex    | 2            | created     | A         | 22000000    | null     | NULL   | YES  | BTREE      |
| userlog | 1          | idindex    | 3            | sample      | A         | 20500000    | null     | NULL   | YES  | BTREE      |
+---------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
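Does anyone know why a different index is being used?
Also, is there a way to optimize the ID query so that this is not a problem?
If I am missing any information, I can update the question.
Edit:
Here is the updated EXPLAIN output for the answer below: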
+----+--------------------+-------+------------+-------+------------------+-------------+---------+----------+--------+----------+-----------------------------------------------------+
| id | select_type        | table | partitions | type  | possible_keys    | key         | key_len | ref      | rows   | filtered | Extra                                               |
+----+--------------------+-------+------------+-------+------------------+-------------+---------+----------+--------+----------+-----------------------------------------------------+
| 1  | primary            | t1    | (null)OK   | range | *list of indexes | nameindex   | 774     | NULL     | 500000 | 10.00    | using index condition; using where; using temporary |
| 2  | dependent subquery | t2    | (null)OK   | ref   | *list of indexes | idonlyindex | 768     | db.t1.id | 892    | 0.61     | using where                                         |
+----+--------------------+-------+------------+-------+------------------+-------------+---------+----------+--------+----------+-----------------------------------------------------+
Note: idonlyindex is an index on the id field only.
Answer 0 (score: 0)
Instead of a subquery, you can also solve this with GROUP BY and conditional filtering in the HAVING clause:
SELECT id
FROM userlog
WHERE isSample = '0'
GROUP BY id
HAVING
    /* No activity in last 30 days */
    NOT SUM(created >= '2019-07-13 23:59:59' - INTERVAL 30 DAY
            AND created <= '2019-07-13 23:59:59')
    AND
    /* Activity in 7 days prior to last 30 days */
    SUM(created >= '2019-07-07 00:00:00' - INTERVAL 30 DAY
        AND created <= '2019-07-13 23:59:59' - INTERVAL 30 DAY)
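To see how this HAVING filter behaves, here is a small sketch on made-up data. It assumes MySQL, where a comparison evaluates to 1 or 0, so SUM() over a condition counts the matching rows. The table demo_userlog and its three users are hypothetical and only mirror the columns used in the query. The conditions are written as "= 0" and "> 0", which should behave the same as the NOT SUM(...) / SUM(...) form and may be easier to read.

```sql
-- Hypothetical demo table mirroring the columns used in the query above.
CREATE TEMPORARY TABLE demo_userlog (
    id       VARCHAR(10) NOT NULL,
    created  DATETIME    NOT NULL,
    isSample TINYINT     NOT NULL
);

INSERT INTO demo_userlog (id, created, isSample) VALUES
    ('a', '2019-06-08 12:00:00', 0),  -- only in the earlier 7-day window
    ('b', '2019-06-10 09:00:00', 0),  -- in the earlier window ...
    ('b', '2019-07-01 10:00:00', 0),  -- ... and also in the last 30 days
    ('c', '2019-07-05 08:00:00', 0);  -- only in the last 30 days

SELECT id
FROM demo_userlog
WHERE isSample = 0
GROUP BY id
HAVING
    -- no rows in the last 30 days
    SUM(created >= '2019-07-13 23:59:59' - INTERVAL 30 DAY
        AND created <= '2019-07-13 23:59:59') = 0
    AND
    -- at least one row in the 7 days before that
    SUM(created >= '2019-07-07 00:00:00' - INTERVAL 30 DAY
        AND created <= '2019-07-13 23:59:59' - INTERVAL 30 DAY) > 0;
```

With this data only user a should be returned: b has a row inside the last 30 days, and c has no row in the earlier window (2019-06-07 00:00:00 to 2019-06-13 23:59:59).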
Another approach is to use a correlated subquery:
SELECT DISTINCT t1.id
FROM userlog AS t1
WHERE t1.isSample = '0'
  AND t1.created >= '2019-07-07 00:00:00' - INTERVAL 30 DAY
  AND t1.created <= '2019-07-13 23:59:59' - INTERVAL 30 DAY
  AND NOT EXISTS (
      SELECT 1
      FROM userlog AS t2
      WHERE t2.id = t1.id
        AND t2.isSample = '0'
        AND t2.created >= '2019-07-13 23:59:59' - INTERVAL 30 DAY
        AND t2.created <= '2019-07-13 23:59:59'
  )
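One practical difference between NOT IN and NOT EXISTS is how they treat NULL. This may matter here, because the index listing in the question shows the indexed columns as nullable (NULL = YES). A minimal sketch with literal values, making no assumptions about the real table:

```sql
-- NOT IN yields NULL (not true) as soon as the list contains a NULL
-- and no exact match, so the row would be filtered out of a WHERE clause.
SELECT 1 NOT IN (SELECT 2 UNION ALL SELECT NULL) AS not_in_result;

-- NOT EXISTS only asks whether a matching row exists, so a NULL in the
-- inner set does not change the answer.
SELECT NOT EXISTS (
    SELECT 1
    FROM (SELECT 2 AS v UNION ALL SELECT NULL) AS s
    WHERE s.v = 1
) AS not_exists_result;
```

The first statement should return NULL and the second 1. The MySQL manual describes "Full scan on NULL key" as a fallback used in subquery optimization when an index lookup cannot be used. So it may be worth checking with EXPLAIN whether the plan changes once id is declared NOT NULL, or once the NOT IN is rewritten as NOT EXISTS as above.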
Try both queries and check which one is more efficient. You may also need a composite index on (isSample, id, created).
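A sketch of how such a composite index might be created. The index name is made up. The column names follow the queries, although the index listing in the question shows a column called sample rather than isSample, so use whichever name the table really has. The existing indexes show a prefix length on id (SUB_PART), which suggests id is a long string column, so the prefix below is an assumption.

```sql
-- Hypothetical index name; adjust column names to the real table.
-- The "(255)" prefix assumes id is a long string column; remove it if
-- id is numeric or short enough to be indexed in full.
CREATE INDEX idx_sample_id_created
    ON userlog (isSample, id(255), created);

-- Re-check the plan for the correlated-subquery version afterwards.
EXPLAIN
SELECT DISTINCT t1.id
FROM userlog AS t1
WHERE t1.isSample = '0'
  AND t1.created >= '2019-07-07 00:00:00' - INTERVAL 30 DAY
  AND t1.created <= '2019-07-13 23:59:59' - INTERVAL 30 DAY
  AND NOT EXISTS (
      SELECT 1
      FROM userlog AS t2
      WHERE t2.id = t1.id
        AND t2.isSample = '0'
        AND t2.created >= '2019-07-13 23:59:59' - INTERVAL 30 DAY
        AND t2.created <= '2019-07-13 23:59:59'
  );
```

Whether the optimizer actually picks the new index depends on the data distribution, so compare the EXPLAIN output and timings before and after.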
Answer 1 (score: 0)
Maybe something like this?
SELECT DISTINCT id
FROM userlog
WHERE
    (   created >= '2019-07-07 00:00:00' - INTERVAL 30 DAY
    AND created <= '2019-07-13 23:59:59' - INTERVAL 30 DAY
    AND isSample = 0
    )
    AND name NOT IN (
        SELECT u1.name
        FROM userlog AS u1
        WHERE u1.created >= '2019-07-13 23:59:59' - INTERVAL 30 DAY
          AND u1.created <= '2019-07-13 23:59:59'
          AND u1.isSample = 0
    )
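If you filter on the name column, it would be good to add an index on it.
The parentheses are there so that the first group of conditions is evaluated independently of the second.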