Optimizing a self-join that uses filesort

Asked: 2013-12-15 20:06:07

Tags: mysql sql optimization query-optimization

I have this query:

SELECT DISTINCT 
       t1.`signature_id` AS id1, 
       t2.`signature_id` AS id2, 
       COUNT(DISTINCT t3.serial) AS weight 
FROM `gc_con_sig` AS t1 
INNER JOIN `gc_con_sig` AS t2 
        ON ((t1.`signature_id` != t2.`signature_id`) 
            AND (t1.`petition_id` = t2.`petition_id`))
INNER JOIN `wtp_data_petitions` AS t3 
        ON (t3.`serial` = t1.`petition_serial`)
GROUP BY t1.`signature_id`, t2.`signature_id`
HAVING weight > 0;

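It basically produces every ordered pair of signature_ids, together with the number of petitions both of them signed (the weight).

I'm trying to run it against this table (gc_con_sig):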

`petition_id` varchar(64) NOT NULL DEFAULT '' COMMENT 'Petition ID defined by API',
  `signature_id` varchar(34) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
  `petition_serial` int(11) DEFAULT NULL,
  KEY `signature_id` (`signature_id`),
  KEY `petition_id` (`petition_id`),
  KEY `signature_petition_idx` (`signature_id`,`petition_id`),
  KEY `pcidx` (`petition_id`,`signature_id`),
  KEY `sig_pet_ser_idx` (`petition_serial`)

This is the EXPLAIN output I get:

  +----+-------------+-------+--------+--------------------------------------------------------+---------+---------+------------------------+--------+----------------------------------------------+
  | id | select_type | table | type   | possible_keys                                          | key     | key_len | ref                    | rows   | Extra                                        |
  +----+-------------+-------+--------+--------------------------------------------------------+---------+---------+------------------------+--------+----------------------------------------------+
  |  1 | SIMPLE      | t1    | ALL    | petition_id,pcidx,sig_pet_ser_idx                      | NULL    | NULL    | NULL                   | 200659 | Using where; Using temporary; Using filesort |
  |  1 | SIMPLE      | t3    | eq_ref | PRIMARY                                                | PRIMARY | 4       | wtp.t1.petition_serial |      1 | Using index                                  |
  |  1 | SIMPLE      | t2    | ref    | petition_id,pcidx                                      | pcidx   | 194     | wtp.t1.petition_id     |   5016 | Using where; Using index                     |
  +----+-------------+-------+--------+--------------------------------------------------------+---------+---------+------------------------+--------+----------------------------------------------+

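I have already tuned the various MySQL settings that mysqltuner recommended, but the query still runs for at least an hour on a machine with 17 GB of RAM (12 GB allocated to MySQL).

Any ideas?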

1 answer:

Answer 0 (score: 1)

Can the same signature appear on a petition more than once? Can serial be NULL?

Assuming the answer to both questions is "no", you could try:

SELECT t1.`signature_id` AS id1, t2.`signature_id` AS id2,
       COUNT(*) AS weight 
FROM `gc_con_sig` t1 INNER JOIN
     `gc_con_sig` t2
     ON (t1.`signature_id` != t2.`signature_id`) AND
        (t1.`petition_id` = t2.`petition_id`)
GROUP BY t1.`signature_id`, t2.`signature_id`;

count(distinct serial) counts the distinct non-NULL values in the field. If none of the values are NULL and there are no duplicates, it is equivalent to count(*).
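Before relying on that equivalence, it may be worth checking the assumptions against the actual data. The queries below are a minimal sketch using only the table and column names from the question. The third check is an extra assumption not stated above: because the rewritten query drops the join to wtp_data_petitions, it only matches the original if every petition_serial has a corresponding row there.

```sql
-- Check 1: petition_serial is never NULL.
-- Returns 0 if the assumption holds.
SELECT COUNT(*) AS null_serials
FROM `gc_con_sig`
WHERE `petition_serial` IS NULL;

-- Check 2: no signature appears on the same petition more than once.
-- Returns no rows if the assumption holds.
SELECT `signature_id`, `petition_id`, COUNT(*) AS copies
FROM `gc_con_sig`
GROUP BY `signature_id`, `petition_id`
HAVING COUNT(*) > 1;

-- Check 3 (extra assumption): every row has a matching petition in
-- wtp_data_petitions, so removing the INNER JOIN does not add rows.
-- Rows with a NULL petition_serial are counted here too.
-- Returns 0 if the assumption holds.
SELECT COUNT(*) AS unmatched_rows
FROM `gc_con_sig` AS s
LEFT JOIN `wtp_data_petitions` AS p
       ON p.`serial` = s.`petition_serial`
WHERE p.`serial` IS NULL;
```

If any check reports a violation, COUNT(*) can overcount relative to COUNT(DISTINCT t3.serial), and the original form of the aggregate is the safer choice.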

The having clause is not needed, because the on clause basically guarantees at least one match.

Finally, you should never need select distinct when you are using group by correctly: group by already returns exactly one row per group.