Improving a MySQL query on a large table

Time: 2016-10-20 21:28:26

Tags: php mysql mysql-error-1064

I have a query that fetches the rows whose calldate falls between 15 minutes ago and the current time.

The calls table contains 39790720 rows:

SELECT src,unique,dstchannel,chan,calldate 
FROM calls 
WHERE calldate BETWEEN (NOW() - INTERVAL 15 MINUTE) AND NOW() 
   AND (dstchannel LIKE '%TEXT1/%' 
        OR dstchannel LIKE '%TEXT2%' 
        OR dstchannel LIKE '%TEXT3%' 
        OR dstchannel REGEXP '^SIP/[[:digit:]]{10}-' 
        OR dstchannel LIKE '%TEXT4%' 
        OR dstchannel LIKE '%TEXT5%' 
        OR dstchannel LIKE '%TEXT6%' 
        OR dstchannel LIKE '%TEXT7%'
   ) 
   AND lastdata NOT LIKE '%TEXT8%' 
LIMIT 39780720,39790720


Query 1 row in set (1 min 7.38 sec)

    +-------------+--------------+------+-----+---------------------+-------+
    | Field       | Type         | Null | Key | Default             | Extra |
    +-------------+--------------+------+-----+---------------------+-------+
    | calldate    | datetime     | NO   |     | 0000-00-00 00:00:00 |       |
    | colum1      | varchar(80)  | NO   |     |                     |       |
    | colum11     | varchar(80)  | NO   |     |                     |       |
    | src         | varchar(80)  | NO   |     |                     |       |
    | colum12     | varchar(80)  | NO   |     |                     |       |
    | chan        | varchar(80)  | NO   |     |                     |       |
    | dstchannel  | varchar(80)  | NO   |     |                     |       |
    | colum2      | varchar(80)  | NO   |     |                     |       |
    | colum3      | varchar(80)  | NO   |     |                     |       |
    | colum4      | int(11)      | NO   |     | 0                   |       |
    | colum5      | int(11)      | NO   |     | 0                   |       |
    | colum6      | varchar(45)  | NO   |     |                     |       |
    | colum7      | int(11)      | NO   |     | 0                   |       |
    | colum8      | varchar(20)  | NO   |     |                     |       |
    | colum9      | varchar(32)  | NO   |     |                     |       |
    | colum10     | varchar(255) | NO   |     |                     |       |
    +-------------+--------------+------+-----+---------------------+-------+

How can I improve the query?

Update

+----+-------------+-------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows     | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+----------+-------------+
|  1 | SIMPLE      | calls | ALL  | NULL          | NULL | NULL    | NULL | 39791545 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+----------+-------------+

1 answer:

Answer 0 (score: 0)

Holy extreme query pessimization, Batman!

Your query looks like this:

SELECT src,unique,dstchannel,chan,calldate
  from calls
 WHERE calldate BETWEEN (NOW() - INTERVAL 15 MINUTE) AND NOW()
   AND (   dstchannel LIKE '%TEXT1/%'
        OR dstchannel LIKE '%TEXT2%'
        OR dstchannel LIKE '%TEXT3%'
        OR dstchannel REGEXP '^SIP/[[:digit:]]{10}-'
        OR dstchannel LIKE '%TEXT4%'
        OR dstchannel LIKE '%TEXT5%'
        OR dstchannel LIKE '%TEXT6%'
        OR dstchannel LIKE '%TEXT7%') 
   AND lastdata NOT LIKE '%TEXT8%' 
 LIMIT 39780720,39790720

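You can improve this query somewhat by adding an index on calldate. That helps the calldate BETWEEN (NOW() - INTERVAL 15 MINUTE) AND NOW() clause, because the server can then read only the matching date range instead of the whole table.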
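A minimal sketch of that first step, assuming the table is named calls as in the question. The index name idx_calls_calldate is made up for illustration, and building an index on a table of this size can take a while.

```sql
-- Assumption: idx_calls_calldate is a hypothetical index name.
ALTER TABLE calls ADD INDEX idx_calls_calldate (calldate);

-- Re-run EXPLAIN afterwards to check whether the time window now uses the index.
EXPLAIN
SELECT src, dstchannel, chan, calldate
  FROM calls
 WHERE calldate BETWEEN (NOW() - INTERVAL 15 MINUTE) AND NOW();
```

If the optimizer picks the index up, the type column of the EXPLAIN output should change from ALL to range, and the rows estimate should drop well below the full table size.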

But the way you have structured it, it will never be very fast. Why not?

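  1. dstchannel LIKE '%TEXT2%' and the similar clauses can never exploit an index. Why not? Because they have to search the whole column value for the string and cannot just look at its first characters. Note that dstchannel LIKE 'TEXT2%' can exploit an index for random access. It is an anchored search, starting at the beginning of the column.
  2. lastdata NOT LIKE '%TEXT8%' has the same problem. But even if it were lastdata NOT LIKE 'TEXT8%' it would still cause trouble, because every row has to be checked. The server cannot work out a range of data to access.
  3. The OR clauses are a disaster. They often cause MySQL to scan the same data multiple times.
  4. LIMIT 39780720,39790720 tells MySQL to skip 39780720 rows and then return up to 39790720 more, which forces it to wade through almost forty megarows of its result set. That burns MySQL server memory, processor time, and disk IO just to throw the rows away. Can you use an ORDER BY clause cleverly, so that you retrieve the first rows of the result set instead of skipping them?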
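What can you do to fix this? Your best bet is to rethink the whole LIKE '%something%' business.

If you can't do that, maybe you can try recasting your query. I'm assuming there is a primary key on your calls table. I'll call it id.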

    -- id is the assumed primary key of calls (see above)
    SELECT a.src, a.unique, a.dstchannel, a.chan, a.calldate
      FROM calls a
      JOIN (
              SELECT id FROM calls
               WHERE calldate BETWEEN (NOW() - INTERVAL 15 MINUTE) AND NOW()
                 AND dstchannel LIKE '%TEXT1/%'
               UNION
              SELECT id FROM calls
               WHERE calldate BETWEEN (NOW() - INTERVAL 15 MINUTE) AND NOW()
                 AND dstchannel LIKE '%TEXT2%'
               UNION
              SELECT id FROM calls
               WHERE calldate BETWEEN (NOW() - INTERVAL 15 MINUTE) AND NOW()
                 AND dstchannel LIKE '%TEXT3%'
               UNION
              SELECT id FROM calls
               WHERE calldate BETWEEN (NOW() - INTERVAL 15 MINUTE) AND NOW()
                 AND dstchannel REGEXP '^SIP/[[:digit:]]{10}-'
              -- etcetera: add one more UNION branch like the ones above
              -- for each remaining pattern (TEXT4 through TEXT7)
           ) b ON a.id = b.id
     WHERE a.lastdata NOT LIKE '%TEXT8%'
    

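Then create a composite index on the columns (calldate, dstchannel, id) of your table. The MySQL query planner can use that index to find the appropriate calldate range, scan the dstchannel values stored in the index for matches, and pull out the id values. It then uses those id values in the JOIN to fetch exactly the rows it needs from the main table.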
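The composite index described above could be created roughly as follows. This is a sketch under the answer's own assumption that calls has a primary key column named id; the index name idx_calls_date_chan_id is hypothetical.

```sql
-- Assumptions: calls.id exists and is the primary key;
-- idx_calls_date_chan_id is a made-up index name.
ALTER TABLE calls
  ADD INDEX idx_calls_date_chan_id (calldate, dstchannel, id);

-- Check one branch of the UNION on its own.
EXPLAIN
SELECT id FROM calls
 WHERE calldate BETWEEN (NOW() - INTERVAL 15 MINUTE) AND NOW()
   AND dstchannel LIKE '%TEXT1/%';
```

If the branch is answered from the index alone, the Extra column of the EXPLAIN output should include Using index. On InnoDB tables the primary key is already stored in every secondary index, so listing id explicitly is harmless there; it matters for engines that do not do this.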

If you are working with call detail records, you really need to learn about indexing. Read: http://use-the-index-luke.com/
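For the LIMIT problem raised in point 4, one possible shape is to order by calldate and ask for the first rows rather than skipping tens of millions. This is only a sketch: the page size of 100 is an arbitrary value chosen for illustration, and the dstchannel and lastdata filters are left out to keep it short.

```sql
-- Assumption: 100 is an arbitrary page size.
-- `unique` is backtick-quoted because UNIQUE is a reserved word in MySQL.
SELECT src, `unique`, dstchannel, chan, calldate
  FROM calls
 WHERE calldate BETWEEN (NOW() - INTERVAL 15 MINUTE) AND NOW()
 ORDER BY calldate DESC
 LIMIT 100;
```

With an index that starts with calldate, the server can read the newest rows in index order and stop once it has enough, instead of producing and discarding the skipped rows.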