Question

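We use the Ahoy Ruby library to track user visits and events. To give feedback to our users, we periodically count certain events and visits.

The two tables are fairly large, but not huge: visits has 6 million+ rows and ahoy_events has 23 million+ rows.

Here is a sample query that takes 80 seconds to run: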

SELECT COUNT(*) 
FROM `ahoy_events` 
INNER JOIN `visits`  ON `visits`.`id` = `ahoy_events`.`visit_id` 
WHERE `ahoy_events`.`event_target_id` = 8471 
  AND `ahoy_events`.`event_target_type` = 'Project' 
  AND visits.entity_id = 668 
  AND (`visits`.`user_type` IS NULL OR `visits`.`user_type` = 'User')

Here is the EXPLAIN output for that query:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: visits
   partitions: NULL
         type: ref
possible_keys: PRIMARY,index_visits_on_entity_id,index_visits_on_entity_id_and_user_type,index_visits_on_entity_id_and_started_at,index_visits_on_entity_id_and_user_id_and_user_type,index_visits_on_entity_id_user_id_user_type_started_at
          key: index_visits_on_entity_id_user_id_user_type_started_at
      key_len: 5
          ref: const
         rows: 1567140
     filtered: 19.00
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: ahoy_events
   partitions: NULL
         type: ref
possible_keys: index_ahoy_events_on_visit_id,index_ahoy_events_on_event_target_id_and_event_target_type
          key: index_ahoy_events_on_visit_id
      key_len: 17
          ref: givecorpssite.visits.id
         rows: 2
     filtered: 11.47
        Extra: Using where

When I run the counts against each table individually, each one takes 200 ms to 600 ms, i.e.:

SELECT count(*) FROM `ahoy_events` WHERE `ahoy_events`.`event_target_id` = 8471 AND `ahoy_events`.`event_target_type` = 'Project'

and

SELECT count(*) FROM `visits` where visits.entity_id = 668 AND (`visits`.`user_type` IS NULL OR `visits`.`user_type` = 'Donor')

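But joining them on the primary/foreign key causes the query to take 80+ seconds.

BTW, the keys (visit_id on ahoy_events and the id column on visits) are UUIDs stored as BINARY(16) columns.

Am I wrong to think that this query shouldn't be this slow?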

Answer 1

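Since it is not clear whether the OR condition is what causes the problem, and you are not actually looking for row-level data in the result, you could try this kind of conditional aggregation: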

SELECT COUNT(IF(`visits`.`user_type` IS NULL OR `visits`.`user_type` = 'User',1,NULL)) 
FROM `ahoy_events` 
INNER JOIN `visits`  ON `visits`.`id` = `ahoy_events`.`visit_id` 
WHERE `ahoy_events`.`event_target_id` = 8471 
  AND `ahoy_events`.`event_target_type` = 'Project' 
  AND visits.entity_id = 668 
;

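COUNT ignores NULL values; alternatively, SUM(IF(visits.user_type IS NULL OR visits.user_type = 'User',1,0)) is a little clearer and gives the same result whenever at least one row passes the WHERE clause (over an empty set SUM returns NULL, whereas COUNT returns 0). In theory the SUM form may be slightly more expensive.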
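
To make the equivalence concrete, here is a minimal sketch on a throwaway table. The table demo_visits and its three rows are hypothetical and exist only for illustration; they are not part of the asker's schema.

```sql
-- Hypothetical scratch table, used only to compare the two aggregate forms.
CREATE TEMPORARY TABLE demo_visits (user_type VARCHAR(20) NULL);

INSERT INTO demo_visits (user_type) VALUES (NULL), ('User'), ('Admin');

SELECT
    -- IF() yields NULL for non-matching rows, and COUNT skips NULLs.
    COUNT(IF(user_type IS NULL OR user_type = 'User', 1, NULL)) AS via_count,
    -- IF() yields 0 for non-matching rows, so they add nothing to the sum.
    SUM(IF(user_type IS NULL OR user_type = 'User', 1, 0))      AS via_sum
FROM demo_visits;
-- Both columns are 2: the NULL row and the 'User' row match, 'Admin' does not.
```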

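This query processes more rows, because the user_type condition no longer reduces them, but scanning the larger result may turn out to be "cheaper" than scanning the table to get the smaller one.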

Answer 2

Covering indexes:

visits:  INDEX(entity_id, user_type, id)  -- in this order
ahoy_events:  INDEX(event_target_id, event_target_type, visit_id)

Because they are covering, they may reduce I/O. (I/O is the slowest part of a query.)

The following might be faster:
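
As a rough sketch, the two indexes could be created as follows. The index names are made up for this example, and building indexes on tables of this size can take a while, so it may be worth trying on a copy first.

```sql
-- Index names below are hypothetical; pick names that fit your conventions.
ALTER TABLE visits
    ADD INDEX idx_visits_entity_user_type_id (entity_id, user_type, id);

ALTER TABLE ahoy_events
    ADD INDEX idx_events_target_visit (event_target_id, event_target_type, visit_id);

-- Re-check the plan afterwards. If the optimizer treats both indexes as
-- covering, the Extra column should show "Using index" for both tables.
EXPLAIN
SELECT COUNT(*)
FROM `ahoy_events`
INNER JOIN `visits` ON `visits`.`id` = `ahoy_events`.`visit_id`
WHERE `ahoy_events`.`event_target_id` = 8471
  AND `ahoy_events`.`event_target_type` = 'Project'
  AND `visits`.`entity_id` = 668
  AND (`visits`.`user_type` IS NULL OR `visits`.`user_type` = 'User');
```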

SELECT  
    (
        SELECT  COUNT(*)
            FROM  `ahoy_events` AS e
            INNER JOIN  `visits` AS v  ON v.`id` = e.`visit_id`
            WHERE  e.`event_target_id` = 8471
              AND  e.`event_target_type` = 'Project'
              AND  v.entity_id = 668
              AND  v.`user_type` IS NULL 
    ) + 
    (
        SELECT  COUNT(*)
            FROM  `ahoy_events` AS e
            INNER JOIN  `visits` AS v  ON v.`id` = e.`visit_id`
            WHERE  e.`event_target_id` = 8471
              AND  e.`event_target_type` = 'Project'
              AND  v.entity_id = 668
              AND  v.`user_type` = 'User' 
    );

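It needs the same indexes I suggested above.

The rationale here is to avoid the OR. (Indexes often cannot be used effectively with OR.)

If you want to discuss this further, please provide SHOW CREATE TABLE and EXPLAIN SELECT ...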
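
For reference, the requested output could be collected with statements along these lines. This sketch assumes the table names from the question; the EXPLAIN shown is for the first of the two subqueries above, and the same can be done for any variant being compared.

```sql
-- Full table definitions, including every existing index.
SHOW CREATE TABLE visits;
SHOW CREATE TABLE ahoy_events;

-- Execution plan for one branch of the OR-free rewrite.
EXPLAIN
SELECT COUNT(*)
FROM `ahoy_events` AS e
INNER JOIN `visits` AS v ON v.`id` = e.`visit_id`
WHERE e.`event_target_id` = 8471
  AND e.`event_target_type` = 'Project'
  AND v.`entity_id` = 668
  AND v.`user_type` IS NULL;
```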

Joining two tables on a primary/foreign key slows a MySQL query down considerably