I have this query:
SELECT
DC.variationID, COUNT(DISTINCT(DC.userID)) AS conversion
FROM
XXXX AS DC
WHERE
DC.testID = 'XXXX' AND DC.visit > 1
GROUP BY
DC.variationID
Here is the table definition:
CREATE TABLE `XXXX`
(
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`userID` bigint(17) NOT NULL,
`testID` bigint(20) NOT NULL,
`variationID` bigint(20) NOT NULL,
`url` bigint(20) NOT NULL,
`time` bigint(20) NOT NULL,
`visit` bigint(20) NOT NULL DEFAULT '1',
`isTestPage` tinyint(1) NOT NULL,
PRIMARY KEY (`id`,`testID`),
KEY `url` (`url`),
KEY `dc3_testIDPage` (`testID`,`url`),
KEY `testid_istest` (`testID`,`isTestPage`),
KEY `dc3_varIDPage` (`variationID`,`url`),
KEY `index_rebond` (`testID`,`visit`,`variationID`),
KEY `dc3_testIDvarIDPage` (`testID`,`variationID`,`url`),
KEY `isTestPage2` (`variationID`,`isTestPage`,`visit`,`userID`),
KEY `user_test_varID_url` (`userID`,`testID`,`variationID`,`url`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED
Here is the EXPLAIN output for the query:
# id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
'1', 'SIMPLE', 'DC', NULL, 'ref', 'dc3_testIDPage,testid_istest,dc3_varIDPage,index_rebond,dc3_testIDvarIDPage,isTestPage2,user_test_varID_url', 'dc3_testIDvarIDPage', '8', 'const', '13695309', '33.33', 'Using index condition; Using where'
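As I see it, the query should use the index 'index_rebond', but unfortunately it does not. I am fairly sure the query used 'index_rebond' in the past.
The query takes a very long time to finish. Can you tell me why it does not use 'index_rebond', and what the best way to optimize it is?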
Here is the output of SHOW INDEX:
# Table, Non_unique, Key_name, Seq_in_index, Column_name, Collation, Cardinality, Sub_part, Packed, Null, Index_type, Comment, Index_comment
'datacollect_v3', '0', 'PRIMARY', '1', 'id', 'A', '25909280', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '0', 'PRIMARY', '2', 'testID', 'A', '25909280', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'url', '1', 'url', 'A', '1657369', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'dc3_testIDPage', '1', 'testID', 'A', '2167', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'dc3_testIDPage', '2', 'url', 'A', '1850256', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'testid_istest', '1', 'testID', 'A', '3813', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'testid_istest', '2', 'isTestPage', 'A', '5721', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'dc3_varIDPage', '1', 'variationID', 'A', '2053', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'dc3_varIDPage', '2', 'url', 'A', '4171834', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'index_rebond', '1', 'testID', 'A', '1811', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'index_rebond', '2', 'visit', 'A', '11357', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'index_rebond', '3', 'variationID', 'A', '17208', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'dc3_testIDvarIDPage', '1', 'testID', 'A', '2049', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'dc3_testIDvarIDPage', '2', 'variationID', 'A', '3513', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'dc3_testIDvarIDPage', '3', 'url', 'A', '929052', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'isTestPage2', '1', 'variationID', 'A', '1891', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'isTestPage2', '2', 'isTestPage', 'A', '3309', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'isTestPage2', '3', 'visit', 'A', '16172', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'isTestPage2', '4', 'userID', 'A', '2712038', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'user_test_varID_url', '1', 'userID', 'A', '1103566', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'user_test_varID_url', '2', 'testID', 'A', '1336479', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'user_test_varID_url', '3', 'variationID', 'A', '1325388', NULL, NULL, '', 'BTREE', '', ''
'datacollect_v3', '1', 'user_test_varID_url', '4', 'url', 'A', '16936138', NULL, NULL, '', 'BTREE', '', ''
Best regards,
Answer 0 (score: 1)
Unless you are partitioning by testID, PRIMARY KEY (id, testID) makes no sense. Shouldn't you have PRIMARY KEY (id)?
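A minimal sketch of that primary key change, assuming the table is not partitioned by testID (worth confirming first) and keeping the question's `XXXX` placeholder for the table name:

```sql
-- Confirm there is no PARTITION BY clause that depends on testID.
SHOW CREATE TABLE `XXXX`;

-- Replace the composite primary key with a single-column one.
-- Both clauses go in one statement because `id` is AUTO_INCREMENT
-- and therefore must remain the leading column of some key.
ALTER TABLE `XXXX`
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (`id`);
```

Changing the primary key of an InnoDB table rebuilds the table, so on a table of this size (SHOW INDEX reports roughly 26 million rows) it is likely best done in a maintenance window.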
If you have some naturally unique combination of columns, consider dropping id and using that combination as the PK.
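This index would be "covering" (it contains every column the query reads, so no table rows need to be fetched) and hence probably significantly faster:
INDEX(testID, visit, variationID, userID)
(And get rid of index_rebond, since it would be redundant.) isTestPage2 is also covering (because the PK columns are implicitly included), but its column order is inefficient for this query.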
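The index suggested above could be applied roughly as follows. This is a sketch: the index name `idx_test_visit_var_user` is a hypothetical choice, and `XXXX` stands for the table name and test ID exactly as in the question.

```sql
-- Add the covering index and drop the index it makes redundant.
-- idx_test_visit_var_user is an illustrative name, not one from the original schema.
ALTER TABLE `XXXX`
  ADD INDEX `idx_test_visit_var_user` (`testID`, `visit`, `variationID`, `userID`),
  DROP INDEX `index_rebond`;

-- Re-check the plan for the original query.
EXPLAIN
SELECT DC.variationID, COUNT(DISTINCT DC.userID) AS conversion
FROM `XXXX` AS DC
WHERE DC.testID = 'XXXX' AND DC.visit > 1
GROUP BY DC.variationID;
```

If the optimizer picks the new index, the Extra column would normally include "Using index", which indicates the query is answered from the index alone.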
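Unless you really expect to have more than 4 billion distinct values for the various IDs, I suggest switching them to INT UNSIGNED (here and in the other tables). Shrinking this table and its indexes by about half would help performance, especially if the dataset is too big to be cached.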
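A hedged sketch of how that type change might be checked and applied. Which columns actually fit is an assumption that has to be verified against the data; userID is declared bigint(17), which hints that it may hold values too large for INT UNSIGNED.

```sql
-- INT UNSIGNED holds values up to 4294967295; confirm the current data fits.
SELECT MAX(`id`), MAX(`userID`), MAX(`testID`), MAX(`variationID`), MAX(`url`)
FROM `XXXX`;

-- Example for two columns, assuming their maxima are well below the limit.
ALTER TABLE `XXXX`
  MODIFY `testID` INT UNSIGNED NOT NULL,
  MODIFY `variationID` INT UNSIGNED NOT NULL;
```

Columns that are joined against other tables should get the same type there, so that the comparison does not need a conversion.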
Answer 1 (score: 0)
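The query optimizer may prefer another index that also starts with testID, such as dc3_testIDPage or testid_istest (in the EXPLAIN output above it chose dc3_testIDvarIDPage).
But you can force the use of a specific index: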
SELECT DC.variationID, COUNT(DISTINCT(DC.userID)) AS conversion
FROM XXXX AS DC FORCE INDEX (index_rebond)
WHERE DC.testID = 'XXXX' AND DC.visit > 1
GROUP BY DC.variationID
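To judge whether forcing the index actually helps, one option is to run EXPLAIN on the hinted query and compare it with the plan shown in the question. A sketch, noting that MySQL's hint syntax is `FORCE INDEX (index_name)` with the name in parentheses:

```sql
EXPLAIN
SELECT DC.variationID, COUNT(DISTINCT DC.userID) AS conversion
FROM `XXXX` AS DC FORCE INDEX (index_rebond)
WHERE DC.testID = 'XXXX' AND DC.visit > 1
GROUP BY DC.variationID;
```

The `key` column should then show index_rebond. The `rows` value is only an estimate, so timing both variants on real data is the more reliable comparison.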