I am using MySQL 5.1.
I want to optimize a query that turns this data into this result:
Example:
id | name
1 | Bob
2 | Albert
3 | bernard
Output:
letter | id | count
A | 2 | 1
B | 1 | 2
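The letter A has 1 user (Albert) and the letter B has 2 users (bernard and Bob); the first one in alphabetical order is bernard.
I have a working query. It returns every letter (plus "no letter"), the first user, and the count.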
SELECT formatted_letter, id, COUNT(1)
FROM (
SELECT
CASE WHEN name REGEXP '[A-Za-z].*'
THEN UPPER(SUBSTR(name, 1, 1))
ELSE '@'
END as formatted_letter, id, name
FROM `users`
... (some joins and conditions)
ORDER BY name
) AS A
GROUP BY formatted_letter
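This works perfectly and returns the correct values... but the query is very slow (9 seconds when selecting 25,000 users)...
Is there another way to optimize this query?
What I have tried:
Indexes are present on the user id, the user name, and every column used in the joins and conditions.
Answer 0: (score: 1)
Here is one possible idea: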
SELECT FirstLetter, MAX(name), SUM(NameCount)
FROM
(
SELECT substr(name, 1, 1) AS FirstLetter, MIN(name) AS name, COUNT(*) AS NameCount
FROM company
GROUP BY FirstLetter
UNION
SELECT 'A' AS FirstLetter, "" AS name, 0 AS NameCount
UNION
SELECT 'B' AS FirstLetter, "" AS name, 0 AS NameCount
UNION
SELECT 'C' AS FirstLetter, "" AS name, 0 AS NameCount
UNION
SELECT 'D' AS FirstLetter, "" AS name, 0 AS NameCount
UNION
SELECT 'E' AS FirstLetter, "" AS name, 0 AS NameCount
UNION
SELECT 'F' AS FirstLetter, "" AS name, 0 AS NameCount
UNION
SELECT 'G' AS FirstLetter, "" AS name, 0 AS NameCount
UNION
SELECT 'H' AS FirstLetter, "" AS name, 0 AS NameCount
UNION
SELECT 'I' AS FirstLetter, "" AS name, 0 AS NameCount
UNION
SELECT 'J' AS FirstLetter, "" AS name, 0 AS NameCount
UNION
SELECT 'K' AS FirstLetter, "" AS name, 0 AS NameCount
UNION
SELECT 'L' AS FirstLetter, "" AS name, 0 AS NameCount
) sub1
GROUP BY FirstLetter
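(I got bored typing out the possible letters to fill in the gaps.)
This does work, but I am not sure my table is comparable in size to yours (it took under a second on a random table/field of mine with about 140k records).
EDIT - OK, another try.
Your basic query boils down to this (ignoring filling in the gaps):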
SELECT CASE WHEN name REGEXP '[A-Za-z].*' THEN UPPER(SUBSTR(name, 1, 1)) ELSE '@' END as formatted_letter, MIN(id) AS id, COUNT(*) AS NameCount
FROM users
GROUP BY formatted_letter
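This on its own should be pretty efficient. Give it a try and let us know how long it takes.
If that is fast, then adding the UNIONs for the zero-count records should add only a nominal amount of time.
Trying it on a random table with 140k records takes about 1 second (and the name field is not even indexed).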
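One caveat with MIN(id): it returns the lowest id within each letter, which is not necessarily the user whose name sorts first. Below is a minimal sketch of one way to pick the alphabetically first user instead. It assumes names are unique and that the name column uses a case-insensitive collation. The table demo_users and the alias first_name are hypothetical, and the ^ anchor in the pattern is an assumption about the intent (without it, the pattern matches a letter anywhere in the name).

```sql
-- Hypothetical demo table holding the example rows from the question.
DROP TABLE IF EXISTS demo_users;
CREATE TABLE demo_users (
  id   INT PRIMARY KEY,
  name VARCHAR(50) NOT NULL,
  KEY idx_name (name)
);
INSERT INTO demo_users (id, name) VALUES (1, 'Bob'), (2, 'Albert'), (3, 'bernard');

-- Step 1 (derived table g): one row per letter with the smallest name and the count.
-- Step 2 (join): look up the id that belongs to that smallest name.
-- Assumption: names are unique, otherwise the join returns several rows per letter.
SELECT g.formatted_letter, u.id, g.NameCount
FROM (
  SELECT CASE WHEN name REGEXP '^[A-Za-z]'
              THEN UPPER(SUBSTR(name, 1, 1))
              ELSE '@'
         END AS formatted_letter,
         MIN(name) AS first_name,
         COUNT(*)  AS NameCount
  FROM demo_users
  GROUP BY formatted_letter
) AS g
JOIN demo_users AS u ON u.name = g.first_name
ORDER BY g.formatted_letter;
```

With the example rows and a case-insensitive collation, B should come back with id 3 (bernard) and a count of 2. The lookup in step 2 is by name, so it can use the index on that column.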
Adding the UNIONed selects does not add any noticeable time to the query:
SELECT formatted_letter, MAX(id) AS id, SUM(NameCount) AS NameCount
FROM
(
SELECT CASE WHEN name REGEXP '[A-Za-z].*' THEN UPPER(SUBSTR(name, 1, 1)) ELSE '@' END as formatted_letter, MIN(id) AS id, COUNT(*) AS NameCount
FROM users
GROUP BY formatted_letter
UNION
SELECT 'A' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'B' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'C' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'D' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'E' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'F' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'G' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'H' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'I' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'J' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'K' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'L' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'M' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'N' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'O' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'P' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'Q' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'R' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'S' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'T' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'U' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'V' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'W' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'X' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'Y' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT 'Z' AS formatted_letter, "" AS id, 0 AS NameCount
UNION SELECT '@' AS formatted_letter, "" AS id, 0 AS NameCount
) Sub1
GROUP BY formatted_letter
If it takes 36 seconds or so on your machine, then something odd is going on.
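As an alternative to typing one UNION per letter, the zero-count rows could come from a small lookup table joined with LEFT JOIN. This is only a sketch under stated assumptions: demo_letters and demo_users are hypothetical tables, and the pattern is anchored with ^ here so that every name falls under a letter or '@' (an assumption about the intent).

```sql
-- Hypothetical demo tables; names are placeholders for illustration only.
DROP TABLE IF EXISTS demo_letters;
DROP TABLE IF EXISTS demo_users;

CREATE TABLE demo_users (
  id   INT PRIMARY KEY,
  name VARCHAR(50) NOT NULL
);
INSERT INTO demo_users (id, name) VALUES (1, 'Bob'), (2, 'Albert'), (3, 'bernard');

-- One row per bucket that must always appear in the output.
CREATE TABLE demo_letters (
  letter CHAR(1) NOT NULL PRIMARY KEY
);
INSERT INTO demo_letters (letter) VALUES
  ('@'), ('A'), ('B'), ('C'), ('D'), ('E'), ('F'), ('G'), ('H'),
  ('I'), ('J'), ('K'), ('L'), ('M'), ('N'), ('O'), ('P'), ('Q'),
  ('R'), ('S'), ('T'), ('U'), ('V'), ('W'), ('X'), ('Y'), ('Z');

-- Letters without users keep their row: id is NULL and the count falls back to 0.
SELECT l.letter AS formatted_letter,
       g.id,
       COALESCE(g.NameCount, 0) AS NameCount
FROM demo_letters AS l
LEFT JOIN (
  SELECT CASE WHEN name REGEXP '^[A-Za-z]'
              THEN UPPER(SUBSTR(name, 1, 1))
              ELSE '@'
         END AS formatted_letter,
         MIN(id)  AS id,
         COUNT(*) AS NameCount
  FROM demo_users
  GROUP BY formatted_letter
) AS g ON g.formatted_letter = l.letter
ORDER BY l.letter;
```

The grouped subquery is the same shape as the basic query above; only the way the empty letters are supplied changes.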
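Answer 1: (score: 0)
What do you mean by "no letter"? Also, if you show it, the FROM part (the other joins / conditions) might be open to optimization too. At a MINIMUM, do you have an index on name... or at least one with name in the first position?
Also, I would kill the inner ORDER BY name clause, since it has no real effect on the final output; you are grouping by formatted_letter anyway... Add an ORDER BY formatted_letter to the outer query instead, since that only returns 26 + '@' records and will be instant.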
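Both points can be checked directly on the server. A minimal sketch, assuming a hypothetical demo_users table standing in for the real users table and its joins: SHOW INDEX lists every index together with the position of each column inside it, and EXPLAIN shows whether a plan needs a temporary table or a filesort.

```sql
-- Hypothetical demo table; replace with the real table and joins when testing.
DROP TABLE IF EXISTS demo_users;
CREATE TABLE demo_users (
  id   INT PRIMARY KEY,
  name VARCHAR(50) NOT NULL,
  KEY idx_name (name)
);
INSERT INTO demo_users (id, name) VALUES (1, 'Bob'), (2, 'Albert'), (3, 'bernard');

-- Seq_in_index = 1 marks the column in the first position of an index.
SHOW INDEX FROM demo_users;

-- Plan for the original shape: derived table with an inner ORDER BY.
EXPLAIN
SELECT formatted_letter, id, COUNT(1)
FROM (
  SELECT CASE WHEN name REGEXP '[A-Za-z].*'
              THEN UPPER(SUBSTR(name, 1, 1))
              ELSE '@'
         END AS formatted_letter, id, name
  FROM demo_users
  ORDER BY name
) AS A
GROUP BY formatted_letter;

-- Plan without the inner ORDER BY, sorted once on the small grouped result.
-- MIN(id) replaces the bare id column so the result does not depend on row order.
EXPLAIN
SELECT CASE WHEN name REGEXP '[A-Za-z].*'
            THEN UPPER(SUBSTR(name, 1, 1))
            ELSE '@'
       END AS formatted_letter,
       MIN(id)  AS id,
       COUNT(*) AS NameCount
FROM demo_users
GROUP BY formatted_letter
ORDER BY formatted_letter;
```

In the Extra column of the EXPLAIN output, "Using temporary" and "Using filesort" are the entries worth comparing between the two plans.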