I have this query:
SELECT DISTINCT
t1.`signature_id` AS id1,
t2.`signature_id` AS id2,
COUNT(DISTINCT t3.serial) AS weight
FROM `gc_con_sig` AS t1
INNER JOIN `gc_con_sig` AS t2
ON ((t1.`signature_id` != t2.`signature_id`)
AND (t1.`petition_id` = t2.`petition_id`))
INNER JOIN `wtp_data_petitions` AS t3
ON (t3.`serial` = t1.`petition_serial`)
GROUP BY t1.`signature_id`, t2.`signature_id`
HAVING weight > 0;
It basically returns every ordered pair of `signature_id`s, together with the number of petitions both of them signed (the weight).
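To make the intended result concrete, here is a minimal pure-Python sketch of what the query computes. It assumes each row of `gc_con_sig` is reduced to a `(petition_id, signature_id)` pair, that `signature_id` is never NULL, and that each `petition_id` corresponds to exactly one `serial`. The function name `pair_weights` and the sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def pair_weights(rows):
    """rows: iterable of (petition_id, signature_id) tuples, like gc_con_sig.

    Returns {(id1, id2): number of distinct petitions both signed} for every
    ordered pair with id1 != id2, mirroring the SQL self-join.
    """
    signers = defaultdict(set)  # petition_id -> set of signature_ids
    for petition_id, signature_id in rows:
        signers[petition_id].add(signature_id)  # set drops repeated rows

    weights = defaultdict(int)
    for sigs in signers.values():
        for a, b in permutations(sigs, 2):  # ordered pairs with a != b
            weights[(a, b)] += 1
    return dict(weights)


rows = [("p1", "alice"), ("p1", "bob"),
        ("p2", "alice"), ("p2", "bob"), ("p2", "carol")]
weights = pair_weights(rows)
# weights[("alice", "bob")] == 2, because both signed p1 and p2
```

A petition with n signers contributes n(n-1) ordered pairs, which is why the self-join grows so quickly when petitions are large.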
I'm running it against this table (`gc_con_sig`):
`petition_id` varchar(64) NOT NULL DEFAULT '' COMMENT 'Petition ID defined by API',
`signature_id` varchar(34) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
`petition_serial` int(11) DEFAULT NULL,
KEY `signature_id` (`signature_id`),
KEY `petition_id` (`petition_id`),
KEY `signature_petition_idx` (`signature_id`,`petition_id`),
KEY `pcidx` (`petition_id`,`signature_id`),
KEY `sig_pet_ser_idx` (`petition_serial`)
Here is the EXPLAIN output I get:
+----+-------------+-------+--------+--------------------------------------------------------+---------+---------+------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------------------------------------------+---------+---------+------------------------+--------+----------------------------------------------+
| 1 | SIMPLE | t1 | ALL | petition_id,pcidx,sig_pet_ser_idx | NULL | NULL | NULL | 200659 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | t3 | eq_ref | PRIMARY | PRIMARY | 4 | wtp.t1.petition_serial | 1 | Using index |
| 1 | SIMPLE | t2 | ref | petition_id,pcidx | pcidx | 194 | wtp.t1.petition_id | 5016 | Using where; Using index |
+----+-------------+-------+--------+--------------------------------------------------------+---------+---------+------------------------+--------+----------------------------------------------+
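The EXPLAIN output hints at why this is slow. For a nested-loop join, multiplying the per-table `rows` estimates gives a rough idea of how many row combinations MySQL expects to build before grouping. The figures below are only the optimizer's estimates, copied from the table above.

```python
# Per-table "rows" estimates from the EXPLAIN output above.
t1_rows = 200_659  # t1: full scan of gc_con_sig (type ALL)
t3_rows = 1        # t3: eq_ref on PRIMARY, one row per t1 row
t2_rows = 5_016    # t2: ref on pcidx, estimated rows per petition_id

# Rough number of joined rows that feed the GROUP BY.
joined = t1_rows * t3_rows * t2_rows
print(f"{joined:,}")  # 1,006,505,544
```

Roughly a billion joined rows then go through "Using temporary; Using filesort" to be grouped.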
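I've tuned the various MySQL settings that mysqltuner recommended, but this query has been running for at least an hour on a machine with 17 GB of RAM (12 GB allocated to MySQL).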
Any ideas?
Answer:
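Can a signature appear on the same petition more than once (that is, duplicate `signature_id`/`petition_id` rows)? Can `serial` be NULL?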
Assuming the answer to both questions is "no", you can try:
SELECT t1.`signature_id` AS id1, t2.`signature_id` AS id2,
COUNT(*) AS weight
FROM `gc_con_sig` t1 INNER JOIN
`gc_con_sig` t2
ON (t1.`signature_id` != t2.`signature_id`) AND
(t1.`petition_id` = t2.`petition_id`)
GROUP BY t1.`signature_id`, t2.`signature_id`;
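`COUNT(DISTINCT serial)` counts the distinct non-NULL values of the column. If none of the values is NULL and there are no duplicates within a group, it is equivalent to `COUNT(*)`.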
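The equivalence can be checked on a toy table. This is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for MySQL; the sample rows and the helpers `make_db` and `weights` are hypothetical. The join to `wtp_data_petitions` is left out, so `t1.petition_serial` plays the role of `t3.serial`. With clean data both counts agree; adding a single duplicate row makes `COUNT(*)` overcount.

```python
import sqlite3


def make_db(rows):
    """Build an in-memory stand-in for gc_con_sig (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gc_con_sig "
        "(petition_id TEXT, signature_id TEXT, petition_serial INTEGER)"
    )
    conn.executemany("INSERT INTO gc_con_sig VALUES (?, ?, ?)", rows)
    return conn


def weights(conn, count_expr):
    """Run the self-join with the given aggregate (a fixed, trusted string)."""
    sql = f"""
        SELECT t1.signature_id, t2.signature_id, {count_expr}
        FROM gc_con_sig t1
        JOIN gc_con_sig t2
          ON t1.signature_id != t2.signature_id
         AND t1.petition_id = t2.petition_id
        GROUP BY t1.signature_id, t2.signature_id
        ORDER BY 1, 2
    """
    return conn.execute(sql).fetchall()


COUNT_ALL = "COUNT(*)"
COUNT_DISTINCT = "COUNT(DISTINCT t1.petition_serial)"

clean = [("p1", "alice", 1), ("p1", "bob", 1),
         ("p2", "alice", 2), ("p2", "bob", 2)]
db = make_db(clean)
print(weights(db, COUNT_ALL) == weights(db, COUNT_DISTINCT))  # True

# Break the "no duplicates" assumption: alice appears on p1 twice.
dup = make_db(clean + [("p1", "alice", 1)])
print(weights(dup, COUNT_ALL))       # [('alice', 'bob', 3), ('bob', 'alice', 3)]
print(weights(dup, COUNT_DISTINCT))  # [('alice', 'bob', 2), ('bob', 'alice', 2)]
```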
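The `HAVING` clause isn't needed, because the `ON` clause of the inner join guarantees that every group has at least one matching row, so the weight is always at least 1.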
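A quick check of the `HAVING` point, again as a sketch on a hypothetical in-memory SQLite table rather than MySQL: the query returns the same rows with and without `HAVING COUNT(*) > 0`, and a signer with no co-signers (dave) simply produces no row, not a row with weight 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gc_con_sig (petition_id TEXT, signature_id TEXT)")
# Hypothetical data; dave is the only signer of p3.
conn.executemany(
    "INSERT INTO gc_con_sig VALUES (?, ?)",
    [("p1", "alice"), ("p1", "bob"),
     ("p2", "bob"), ("p2", "carol"),
     ("p3", "dave")],
)

BASE = """
    SELECT t1.signature_id, t2.signature_id, COUNT(*)
    FROM gc_con_sig t1
    JOIN gc_con_sig t2
      ON t1.signature_id != t2.signature_id
     AND t1.petition_id = t2.petition_id
    GROUP BY t1.signature_id, t2.signature_id
"""

without_having = conn.execute(BASE + " ORDER BY 1, 2").fetchall()
with_having = conn.execute(BASE + " HAVING COUNT(*) > 0 ORDER BY 1, 2").fetchall()

print(without_having == with_having)         # True
print(min(w for _, _, w in without_having))  # 1
```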
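Finally, when you use `GROUP BY` correctly you don't need `SELECT DISTINCT`: grouping by both `signature_id` columns already makes every (`id1`, `id2`) row unique.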
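The `SELECT DISTINCT` point can be checked the same way. This sketch uses a hypothetical in-memory SQLite table in place of MySQL: alice and bob share two petitions, so the raw join yields two rows per pair, but after `GROUP BY` each pair appears once and adding `DISTINCT` changes nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gc_con_sig (petition_id TEXT, signature_id TEXT)")
# Hypothetical data: alice and bob both signed p1 and p2.
conn.executemany(
    "INSERT INTO gc_con_sig VALUES (?, ?)",
    [("p1", "alice"), ("p1", "bob"), ("p2", "alice"), ("p2", "bob")],
)

TEMPLATE = """
    SELECT {modifier} t1.signature_id, t2.signature_id, COUNT(*)
    FROM gc_con_sig t1
    JOIN gc_con_sig t2
      ON t1.signature_id != t2.signature_id
     AND t1.petition_id = t2.petition_id
    GROUP BY t1.signature_id, t2.signature_id
    ORDER BY 1, 2
"""

grouped = conn.execute(TEMPLATE.format(modifier="")).fetchall()
with_distinct = conn.execute(TEMPLATE.format(modifier="DISTINCT")).fetchall()

# GROUP BY already leaves one row per (id1, id2), so DISTINCT removes nothing.
print(grouped)                   # [('alice', 'bob', 2), ('bob', 'alice', 2)]
print(grouped == with_distinct)  # True
```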