Optimizing a MySQL query that combines ORDER BY and GROUP BY

Time: 2013-10-02 12:29:30

Tags: mysql optimization query-optimization

I am using MySQL 5.1.

I want to optimize a query that goes from this to this:

  • Input: a "users" table with an id and a name (100,000 rows)
  • Output: for each letter, the id of the first user and the number of users

Example:

id | name
1  | Bob
2  | Albert
3  | bernard

Output:

letter | id | count
     A | 2  | 1
     B | 3  | 2

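The letter A has 1 user (Albert). The letter B has 2 users (bernard and Bob), and the first one in alphabetical order is bernard.

I have a working query. It returns every letter (plus a "no letter" bucket), the first user, and the count.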

SELECT formatted_letter, id, COUNT(1)
FROM (
  SELECT
    CASE WHEN name REGEXP '[A-Za-z].*'
           THEN UPPER(SUBSTR(name, 1, 1))
         ELSE '@'
    END as formatted_letter, id, name
  FROM `users`
    ... (some joins and conditions)
  ORDER BY name
) AS A
GROUP BY formatted_letter

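This runs fine and returns the correct values, but the query is very slow (9 seconds when 25,000 users are selected).

Is there another way to optimize this query?

Things I have tried:

  • Building one big UNION with a separate query per letter; this was the worst (36 seconds).
  • Adding a 'formatted_letter' column to remove the CASE / WHEN part; this is slightly better, it now takes 8 seconds.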
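
For readers who want to reproduce that second attempt, a minimal sketch follows. The index name idx_users_letter_name is hypothetical, and the anchored pattern '^[A-Za-z]' is an assumption about the intended meaning ("the name starts with a letter"); the query in the question uses the unanchored '[A-Za-z].*'. Pairing the new column with name in one index is a guess at what might help the GROUP BY, not something measured here.

```sql
-- Assumption: users(id, name) as described in the question.
ALTER TABLE users
  ADD COLUMN formatted_letter CHAR(1) NOT NULL DEFAULT '@';

-- Fill the column once; new or renamed rows need the same rule applied.
UPDATE users
SET formatted_letter = CASE
      WHEN name REGEXP '^[A-Za-z]' THEN UPPER(SUBSTR(name, 1, 1))
      ELSE '@'
    END;

-- Hypothetical index name; intended to let MySQL group on the stored letter.
CREATE INDEX idx_users_letter_name ON users (formatted_letter, name);
```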

Indexes exist on the user id, the user name, and every column used in the joins and conditions.

2 answers:

Answer 0: (score: 1)

Here is a possible idea:

SELECT FirstLetter, MAX(name), SUM(NameCount)
FROM
(
    SELECT substr(name, 1, 1) AS FirstLetter, MIN(name) AS name, COUNT(*) AS NameCount
    FROM company
    GROUP BY FirstLetter
    UNION
    SELECT 'A' AS FirstLetter, "" AS name, 0 AS NameCount
    UNION
    SELECT 'B' AS FirstLetter, "" AS name, 0 AS NameCount
    UNION
    SELECT 'C' AS FirstLetter, "" AS name, 0 AS NameCount
    UNION
    SELECT 'D' AS FirstLetter, "" AS name, 0 AS NameCount
    UNION
    SELECT 'E' AS FirstLetter, "" AS name, 0 AS NameCount
    UNION
    SELECT 'F' AS FirstLetter, "" AS name, 0 AS NameCount
    UNION
    SELECT 'G' AS FirstLetter, "" AS name, 0 AS NameCount
    UNION
    SELECT 'H' AS FirstLetter, "" AS name, 0 AS NameCount
    UNION
    SELECT 'I' AS FirstLetter, "" AS name, 0 AS NameCount
    UNION
    SELECT 'J' AS FirstLetter, "" AS name, 0 AS NameCount
    UNION
    SELECT 'K' AS FirstLetter, "" AS name, 0 AS NameCount
    UNION
    SELECT 'L' AS FirstLetter, "" AS name, 0 AS NameCount
) sub1
GROUP BY FirstLetter

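(I got bored typing out the possible letters to fill in the gaps.)

This does work, but I am not sure how my table size compares with yours (it took less than a second on a random table/field of mine with about 140k records).

Edit: OK, trying again.

Your basic query boils down to this (ignoring the gap filling):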

SELECT CASE WHEN name REGEXP '[A-Za-z].*' THEN UPPER(SUBSTR(name, 1, 1)) ELSE '@' END as formatted_letter, MIN(id) AS id, COUNT(*) AS NameCount
FROM users
GROUP BY formatted_letter

This by itself should be quite efficient. Give it a try and let us know how long it takes.
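
One caveat worth sketching: MIN(id) returns the smallest id in each group, which is not necessarily the user whose name sorts first (in the question's example, Bob has id 1 but bernard sorts first). A possible variant, assuming a case-insensitive collation on name, aggregates MIN(name) and joins back to users to pick up the id. The alias first_name is hypothetical, and the CASE expression is copied from the question unchanged.

```sql
SELECT g.formatted_letter,
       MIN(u.id) AS id,             -- MIN() only breaks ties between equal names
       g.NameCount
FROM (
    SELECT CASE WHEN name REGEXP '[A-Za-z].*'
                THEN UPPER(SUBSTR(name, 1, 1))
                ELSE '@'
           END AS formatted_letter,
           MIN(name) AS first_name, -- alphabetically first name per letter
           COUNT(*)  AS NameCount
    FROM users
    GROUP BY formatted_letter
) AS g
JOIN users AS u ON u.name = g.first_name
GROUP BY g.formatted_letter, g.NameCount
ORDER BY g.formatted_letter;
```

The join touches at most one name per letter, so an index on name should keep the extra cost small, though that has not been timed here.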

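If that is fast, then the UNIONs that add the zero-count records should add only a nominal amount of time.

Trying it on a random table with 140k records took about 1 second (and the name field was not even indexed).

Adding the UNIONed SELECTs does not add any noticeable time to the query: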

SELECT formatted_letter, MAX(id), SUM(NameCount)
FROM
(
    SELECT CASE WHEN name REGEXP '[A-Za-z].*' THEN UPPER(SUBSTR(name, 1, 1)) ELSE '@' END as formatted_letter, MIN(id) AS id, COUNT(*) AS NameCount
    FROM users
    GROUP BY formatted_letter
    UNION
    SELECT 'A' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'B' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'C' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'D' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'E' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'F' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'G' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'H' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'I' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'J' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'K' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'L' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'M' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'N' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'O' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'P' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'Q' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'R' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'S' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'T' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'U' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'V' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'W' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'X' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'Y' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'Z' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT '@' AS formatted_letter, "" AS id, 0 AS NameCount
) Sub1
GROUP BY formatted_letter

If this takes 36 seconds or so on your machine, then something strange is going on.
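
To see where the time goes, one option is to prefix the basic query with EXPLAIN, which MySQL 5.1 supports for SELECT statements. This is only a diagnostic sketch; the joins and conditions that the question elides would have to be included to get a representative plan.

```sql
EXPLAIN
SELECT CASE WHEN name REGEXP '[A-Za-z].*'
            THEN UPPER(SUBSTR(name, 1, 1))
            ELSE '@'
       END AS formatted_letter,
       MIN(id) AS id,
       COUNT(*) AS NameCount
FROM users
GROUP BY formatted_letter;
```

In the output, the type, key and rows columns show how each table is accessed. "Using temporary; Using filesort" in the Extra column is typical for a GROUP BY on a computed expression, so a large rows estimate on one of the joined tables is usually the more interesting thing to look for.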

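Answer 1: (score: 0)

What do you mean by "no letter"? Also, the FROM part (the other joins / conditions) could probably be optimized too if you showed it. At a MINIMUM, do you have an index on name, or at least an index with name in the first position?

Also, I would kill the inner ORDER BY name clause, since it has no real effect on the final output; you are grouping by formatted_letter anyway. Add an ORDER BY formatted_letter to the outer query instead, since that only has to sort 26 + '@' records and will be instant.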
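
Applied to the query in the question, that suggestion might look like the sketch below. The joins and conditions remain elided, as in the original. One caution: id is not aggregated, so MySQL is free to return the id of any row in each group, with or without the inner ORDER BY; if the first user by name is required, that has to be expressed explicitly (for example through MIN(name)).

```sql
SELECT formatted_letter, id, COUNT(1)
FROM (
  SELECT
    CASE WHEN name REGEXP '[A-Za-z].*'
           THEN UPPER(SUBSTR(name, 1, 1))
         ELSE '@'
    END AS formatted_letter, id, name
  FROM `users`
  -- joins and conditions from the original query go here (elided in the question)
  -- inner ORDER BY name removed
) AS A
GROUP BY formatted_letter
ORDER BY formatted_letter;  -- sorts only the grouped rows
```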