I have a relatively complex query; here is the fiddle: http://sqlfiddle.com/#!2/65c66/12/0
SELECT p.title AS title_1,
p2.title AS title_2,
COUNT(DISTINCT s.signature_id) AS num_signers,
group_concat(DISTINCT s.signature_id separator ' ') AS signers
FROM wtp_data_petitions p
JOIN wtp_data_petitions p2 ON (p.serial > p2.serial)
JOIN wtp_data_signatures s
GROUP BY s.signature_id
HAVING sum(s.petition_id=p.id)
AND sum(s.petition_id=p2.id);
Here is the EXPLAIN output (the row counts come from my real dataset, not the sqlfiddle):
+----+-------------+-------+-------+---------------+--------------+---------+------+----------+---------------------------------+
| id | select_type | table | type  | possible_keys | key          | key_len | ref  | rows     | Extra                           |
+----+-------------+-------+-------+---------------+--------------+---------+------+----------+---------------------------------+
|  1 | SIMPLE      | p     | ALL   | PRIMARY       | NULL         | NULL    | NULL |     1727 | Using temporary; Using filesort |
|  1 | SIMPLE      | p2    | ALL   | PRIMARY       | NULL         | NULL    | NULL |     1727 | Using where; Using join buffer  |
|  1 | SIMPLE      | s     | index | NULL          | signature_id | 105     | NULL | 12943894 | Using index; Using join buffer  |
+----+-------------+-------+-------+---------------+--------------+---------+------+----------+---------------------------------+
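At the moment the query uses a lot of disk space and a filesort, and I have yet to see it finish before it errors out. Are there any optimizations I can make so that it runs faster or more efficiently?
Thanks!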
Answer 0 (score: 1)
Yes. One thing you can do is move the join conditions into the ON clause:
SELECT p.title AS title_1,
p2.title AS title_2,
COUNT(DISTINCT s.signature_id) AS num_signers,
group_concat(DISTINCT s.signature_id separator ' ') AS signers
FROM wtp_data_petitions p
JOIN wtp_data_petitions p2 ON (p.serial > p2.serial)
JOIN wtp_data_signatures s on s.petition_id=p.id or s.petition_id=p2.id
GROUP BY s.signature_id;
I also think the GROUP BY should be on p.title, p2.title:
SELECT p.title AS title_1,
p2.title AS title_2,
COUNT(DISTINCT s.signature_id) AS num_signers,
group_concat(DISTINCT s.signature_id separator ' ') AS signers
FROM wtp_data_petitions p
JOIN wtp_data_petitions p2 ON (p.serial > p2.serial)
JOIN wtp_data_signatures s on s.petition_id=p.id or s.petition_id=p2.id
GROUP BY p.title, p2.title;
But why do the second join at all? I'm not sure what the query is supposed to do.
Edit:
I think the basic query you want is:
select s1.petition_id, s2.petition_id, count(*) as numsignatures,
group_concat(s1.signature_id) as signatures
from wtp_data_signatures s1 join
wtp_data_signatures s2
on s1.signature_id = s2.signature_id and
s1.petition_id < s2.petition_id
group by s1.petition_id, s2.petition_id;
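To make it concrete what this pairwise self-join returns, here is a minimal sketch on made-up data. The table demo_signatures and its rows are hypothetical and only carry the two columns the query needs. It also assumes each (signature_id, petition_id) pair appears at most once; if duplicates are possible, COUNT(DISTINCT s1.signature_id) would be the safer count.

```sql
-- Hypothetical sample table; not part of the original schema.
CREATE TABLE demo_signatures (
  signature_id VARCHAR(35) NOT NULL,
  petition_id  INT NOT NULL
);

INSERT INTO demo_signatures (signature_id, petition_id) VALUES
  ('a', 1), ('a', 2),            -- signer a signed petitions 1 and 2
  ('b', 1), ('b', 2), ('b', 3),  -- signer b signed petitions 1, 2 and 3
  ('c', 3);                      -- signer c signed only petition 3

-- Same shape as the basic query: pair up rows of the same signer,
-- keeping each petition pair once via petition_id < petition_id.
SELECT s1.petition_id AS petition_1,
       s2.petition_id AS petition_2,
       COUNT(*) AS numsignatures,
       GROUP_CONCAT(s1.signature_id ORDER BY s1.signature_id SEPARATOR ' ') AS signatures
FROM demo_signatures s1
JOIN demo_signatures s2
  ON s1.signature_id = s2.signature_id
 AND s1.petition_id < s2.petition_id
GROUP BY s1.petition_id, s2.petition_id
ORDER BY petition_1, petition_2;

-- Result:
-- petition_1 | petition_2 | numsignatures | signatures
--          1 |          2 |             2 | a b
--          1 |          3 |             1 | b
--          2 |          3 |             1 | b
```

Signer c never appears because they signed only one petition, so only petition pairs with at least one shared signer produce a row. The work is driven by the signatures table alone, rather than by every pair of petitions crossed with every signature.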
You can now extend this to include the petition information:
select p1.title as title_1, p2.title as title_2,
s1.petition_id, s2.petition_id, count(*) as numsignatures,
group_concat(s1.signature_id) as signatures
from wtp_data_signatures s1 join
wtp_data_signatures s2
on s1.signature_id = s2.signature_id and
s1.petition_id < s2.petition_id join
wtp_data_petitions p1
on p1.id = s1.petition_id join
wtp_data_petitions p2
ON p2.id = s2.petition_id
group by s1.petition_id, s2.petition_id;
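Because this version joins wtp_data_signatures to itself on signature_id and then compares petition_id, a composite index on both columns may let MySQL resolve the self-join from the index alone. This is a sketch, not something tested against the real data: the index name idx_signature_petition is hypothetical, and whether the optimizer actually picks it should be confirmed with EXPLAIN.

```sql
-- Hypothetical index name; the table and columns are the ones from the question.
ALTER TABLE wtp_data_signatures
  ADD INDEX idx_signature_petition (signature_id, petition_id);

-- Check the plan of the core self-join before running it on all rows.
EXPLAIN
SELECT s1.petition_id, s2.petition_id, COUNT(*) AS numsignatures
FROM wtp_data_signatures s1
JOIN wtp_data_signatures s2
  ON s1.signature_id = s2.signature_id
 AND s1.petition_id < s2.petition_id
GROUP BY s1.petition_id, s2.petition_id;
```

If the plan shows the new index in the key column for s2 with a ref lookup on signature_id, the join no longer has to scan all of the roughly 12.9 million rows for every row of s1.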
Answer 1 (score: 0)
Do you have an index on serial? The self-join on p.serial > p2.serial looks like the only reason it needs to sort wtp_data_petitions. Try adding an index.
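A sketch of what adding that index could look like. The index name idx_serial is hypothetical, and the column is quoted with backticks only because SERIAL is also a MySQL keyword. Whether the optimizer uses the index for the range comparison is an assumption to verify with EXPLAIN.

```sql
-- Hypothetical index name; table and column are taken from the question.
ALTER TABLE wtp_data_petitions
  ADD INDEX idx_serial (`serial`);

-- Re-check the plan for the petition self-join on its own.
EXPLAIN
SELECT p.title AS title_1, p2.title AS title_2
FROM wtp_data_petitions p
JOIN wtp_data_petitions p2 ON p.serial > p2.serial;
```

With only 1727 rows in wtp_data_petitions, any gain here is likely to be small next to the cost of the join against wtp_data_signatures, so it is worth comparing the EXPLAIN output before and after.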