Question

I have a relatively complex query; here is the fiddle: http://sqlfiddle.com/#!2/65c66/12/0

SELECT p.title AS title_1,
       p2.title AS title_2,
       COUNT(DISTINCT s.signature_id) AS num_signers,
       group_concat(DISTINCT s.signature_id separator ' ') AS signers
FROM wtp_data_petitions p
JOIN wtp_data_petitions p2 ON (p.serial > p2.serial)
JOIN wtp_data_signatures s
GROUP BY s.signature_id
HAVING sum(s.petition_id=p.id)
AND sum(s.petition_id=p2.id);

Here is the EXPLAIN output (the row counts are from my real dataset, not the sqlfiddle):

+----+-------------+-------+-------+---------------+--------------+---------+------+----------+---------------------------------+
| id | select_type | table | type  | possible_keys | key          | key_len | ref  | rows     | Extra                           |
+----+-------------+-------+-------+---------------+--------------+---------+------+----------+---------------------------------+
|  1 | SIMPLE      | p     | ALL   | PRIMARY       | NULL         | NULL    | NULL |     1727 | Using temporary; Using filesort |
|  1 | SIMPLE      | p2    | ALL   | PRIMARY       | NULL         | NULL    | NULL |     1727 | Using where; Using join buffer  |
|  1 | SIMPLE      | s     | index | NULL          | signature_id | 105     | NULL | 12943894 | Using index; Using join buffer  |
+----+-------------+-------+-------+---------------+--------------+---------+------+----------+---------------------------------+

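As it stands, the query uses a lot of disk space and a filesort, and I have yet to see it finish before it errors out. Are there any optimizations I can make to do this faster or more efficiently?

Thanks!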

Answer 1

Yes. One thing you can do is move the join conditions into the on clause:

SELECT p.title AS title_1,
       p2.title AS title_2,
       COUNT(DISTINCT s.signature_id) AS num_signers,
       group_concat(DISTINCT s.signature_id separator ' ') AS signers
FROM wtp_data_petitions p
JOIN wtp_data_petitions p2 ON (p.serial > p2.serial)
JOIN wtp_data_signatures s on s.petition_id=p.id or s.petition_id=p2.id
GROUP BY s.signature_id;

I also think the group by should be on p.title, p2.title:

SELECT p.title AS title_1,
       p2.title AS title_2,
       COUNT(DISTINCT s.signature_id) AS num_signers,
       group_concat(DISTINCT s.signature_id separator ' ') AS signers
FROM wtp_data_petitions p
JOIN wtp_data_petitions p2 ON (p.serial > p2.serial)
JOIN wtp_data_signatures s on s.petition_id=p.id or s.petition_id=p2.id
GROUP BY p.title, p2.title;

But why is the second join being done at all? I'm not sure what the query is supposed to do.

Edit:

I think the basic query you want is:

select s1.petition_id, s2.petition_id, count(*) as numsignatures, 
       group_concat(s1.signature_id) as signatures  
from wtp_data_signatures s1 join
     wtp_data_signatures s2
     on s1.signature_id = s2.signature_id and
        s1.petition_id < s2.petition_id
group by s1.petition_id, s2.petition_id;
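
To make the intent of that self-join concrete, here is a small worked sketch. The table demo_signatures, its column types, and its rows are made up for illustration and are not taken from the question's data; the query is the same self-join pointed at the demo table, with column aliases and an ORDER BY added for readability.

```sql
-- Hypothetical demo table; the column types are assumptions.
CREATE TABLE demo_signatures (
    signature_id VARCHAR(32) NOT NULL,
    petition_id  INT NOT NULL
);

-- 'a' signed petitions 1 and 2; 'b' signed petitions 1, 2 and 3.
INSERT INTO demo_signatures (signature_id, petition_id) VALUES
    ('a', 1), ('a', 2),
    ('b', 1), ('b', 2), ('b', 3);

-- One row per pair of petitions that share at least one signature_id.
SELECT s1.petition_id AS petition_1,
       s2.petition_id AS petition_2,
       COUNT(*) AS numsignatures,
       GROUP_CONCAT(s1.signature_id ORDER BY s1.signature_id SEPARATOR ' ') AS signatures
FROM demo_signatures s1
JOIN demo_signatures s2
  ON s1.signature_id = s2.signature_id
 AND s1.petition_id < s2.petition_id
GROUP BY s1.petition_id, s2.petition_id
ORDER BY petition_1, petition_2;
```

With these rows, the pair (1, 2) is shared by 'a' and 'b', while (1, 3) and (2, 3) are shared only by 'b'. Note that COUNT(*) equals the number of shared signature_id values only if each signature_id appears at most once per petition, which is an assumption about the data.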

You can now extend this to include the petition information:

select p1.title as title_1, p2.title as title_2,
       s1.petition_id, s2.petition_id, count(*) as numsignatures, 
       group_concat(s1.signature_id) as signatures  
from wtp_data_signatures s1 join
     wtp_data_signatures s2
     on s1.signature_id = s2.signature_id and
        s1.petition_id < s2.petition_id join
     wtp_data_petitions p1
     on p1.id = s1.petition_id join
     wtp_data_petitions p2
     ON p2.id = s2.petition_id 
group by s1.petition_id, s2.petition_id;
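
The self-join matches rows on signature_id and then compares petition_id, so a composite index covering both columns may let MySQL resolve the join from the index alone. The EXPLAIN in the question shows an existing index named signature_id, but its column list is not given, so the sketch below is an assumption about what is missing; the index name idx_sig_petition is made up.

```sql
-- Hypothetical index name; skip this if an equivalent index already exists.
ALTER TABLE wtp_data_signatures
    ADD INDEX idx_sig_petition (signature_id, petition_id);

-- Re-check the plan afterwards to see whether the new index is picked up.
EXPLAIN
SELECT s1.petition_id, s2.petition_id, COUNT(*) AS numsignatures
FROM wtp_data_signatures s1
JOIN wtp_data_signatures s2
  ON s1.signature_id = s2.signature_id
 AND s1.petition_id < s2.petition_id
GROUP BY s1.petition_id, s2.petition_id;
```

Even with the index, the GROUP BY over petition pairs may still need a temporary table, so treat this as something to measure rather than a guaranteed fix.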

Answer 2

Do you have an index on serial? The self-join on p.serial > p2.serial looks like the only reason it needs to sort wtp_data_petitions. Try adding an index.
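
A minimal sketch of that suggestion follows. The index name idx_serial is made up; the serial column comes from the join condition in the question. Whether the optimizer actually uses the index for a range join like this is not guaranteed, so the second statement re-checks the plan.

```sql
-- Hypothetical index name; the column is the one used in the self-join.
ALTER TABLE wtp_data_petitions ADD INDEX idx_serial (serial);

-- Compare the key and Extra columns against the EXPLAIN in the question.
EXPLAIN
SELECT p.id, p2.id
FROM wtp_data_petitions p
JOIN wtp_data_petitions p2 ON p.serial > p2.serial;
```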

Optimizing a query to not use filesort
