How to make use of indexes with LEFT OUTER JOIN

Date: 2015-04-09 16:47:29

Tags: mysql sql database join

Here is a set of tables describing music composers:

CREATE TABLE IF NOT EXISTS `compositors` (
`id` int(11) NOT NULL,
  `name` varchar(45) NOT NULL COMMENT 'Nom et Prenom',
  `birth_date` varchar(45) DEFAULT NULL,
  `death_date` varchar(45) DEFAULT NULL,
  `birth_place` varchar(45) DEFAULT NULL,
  `death_place` varchar(45) DEFAULT NULL,
  `gender` enum('M','F') DEFAULT NULL,
  `century` varchar(45) DEFAULT NULL,
  `country` int(11) DEFAULT NULL
) ENGINE=InnoDB AUTO_INCREMENT=28741 DEFAULT CHARSET=latin1;


CREATE TABLE IF NOT EXISTS `compositor_biography` (
`index` int(11) NOT NULL,
  `compositor_id` int(11) NOT NULL,
  `url` varchar(255) DEFAULT NULL
) ENGINE=InnoDB AUTO_INCREMENT=15325 DEFAULT CHARSET=latin1;


CREATE TABLE IF NOT EXISTS `compositor_comments` (
  `compositor_id` int(11) NOT NULL,
  `comment` text NOT NULL,
  `public` enum('Publique','Privé') NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;


CREATE TABLE IF NOT EXISTS `compositor_country` (
  `compositor_id` int(11) NOT NULL,
  `country_id` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Here are my indexes:

--
-- Indexes for the tables
--
ALTER TABLE `compositors` ADD PRIMARY KEY (`id`), ADD KEY `countries` (`country`);
ALTER TABLE `compositor_biography` ADD PRIMARY KEY (`index`), ADD KEY `index` (`compositor_id`);
ALTER TABLE `compositor_comments` ADD KEY `c_compositor_idx` (`compositor_id`);

Finally, some sample data:

INSERT INTO `compositors` (`id`, `name`, `birth_date`, `death_date`, `birth_place`, `death_place`, `gender`, `century`, `country`) VALUES
(1, 'Dummy Compositor', '1606', '1676', 'Bruxellesss', NULL, 'F', '17', 11);

INSERT INTO `compositor_biography` (`index`, `compositor_id`, `url`) VALUES
(15322, 1, 'Dummy Link 1'),
(15323, 1, 'Dummy Link 2'),
(15324, 1, 'Dummy Link 3');

INSERT INTO `compositor_comments` (`compositor_id`, `comment`, `public`) VALUES
(1, 'Dummy Comment', 'Privé');

Here is a sample query generated by my PHP script:

SELECT DISTINCT compositors.id, compositors.name, compositors.birth_date, compositors.death_date, compositors.birth_place, compositors.death_place, compositors.gender, compositors.century, compositors.country,  
GROUP_CONCAT( compositor_biography.url SEPARATOR ';') AS concat_compositor_biography_url, 
GROUP_CONCAT( compositor_comments.comment SEPARATOR ';') AS concat_compositor_comments_comment, 
GROUP_CONCAT( compositor_comments.public + 0 SEPARATOR ';') AS concat_compositor_comments_public 
FROM compositors 
LEFT JOIN compositor_biography ON compositors.id = compositor_biography.compositor_id 
LEFT JOIN compositor_comments ON compositors.id = compositor_comments.compositor_id 
GROUP BY compositors.id

However, there is a problem: if you run this query, the concat_compositor_comments_comment column contains a result like this:

Dummy Comment;Dummy Comment;Dummy Comment

even though there is only one actual comment.
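The duplication can be reproduced outside MySQL. The sketch below is a minimal stand-in, assuming Python's built-in sqlite3 module and a simplified schema (SQLite has no ENUM, so `public` stores the numeric enum index that `public + 0` returns in MySQL, and the `index` column is dropped). It shows that before grouping, the two LEFT JOINs yield one row per (biography, comment) pair, so GROUP_CONCAT sees the single comment three times.

```python
import sqlite3

# Simplified SQLite stand-in for the MySQL schema (assumption: `public` holds
# the enum's numeric index, i.e. what `public + 0` yields in MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compositors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE compositor_biography (compositor_id INTEGER NOT NULL, url TEXT);
CREATE TABLE compositor_comments (
    compositor_id INTEGER NOT NULL, comment TEXT NOT NULL, public INTEGER NOT NULL);
INSERT INTO compositors VALUES (1, 'Dummy Compositor');
INSERT INTO compositor_biography VALUES
    (1, 'Dummy Link 1'), (1, 'Dummy Link 2'), (1, 'Dummy Link 3');
INSERT INTO compositor_comments VALUES (1, 'Dummy Comment', 2);
""")

JOINS = """
FROM compositors AS co
LEFT JOIN compositor_biography AS b ON co.id = b.compositor_id
LEFT JOIN compositor_comments AS c ON co.id = c.compositor_id
"""

# Before grouping: one row per (biography row, comment row) pair, 3 x 1 = 3 rows.
raw_rows = conn.execute("SELECT b.url, c.comment" + JOINS).fetchall()
print(len(raw_rows))  # 3

# GROUP_CONCAT then receives the same comment once per joined row.
comments = conn.execute(
    "SELECT group_concat(c.comment, ';')" + JOINS + "GROUP BY co.id"
).fetchone()[0]
print(comments)  # Dummy Comment;Dummy Comment;Dummy Comment
```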

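I don't really understand what is going wrong there, but it seems to be the GROUP BY. According to the second answer to "Multiple GROUP_CONCAT on different fields using MySQL", there should be one GROUP BY per JOIN, so I did that, and it works with this query: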

SELECT DISTINCT compositors.id,
    compositors.NAME,
    compositors.birth_date,
    compositors.death_date,
    compositors.birth_place,
    compositors.death_place,
    compositors.gender,
    compositors.century,
    compositors.country,
    concat_compositor_biography_url,
    concat_compositor_comments_comment,
    concat_compositor_comments_public
FROM compositors
LEFT JOIN (
    SELECT compositor_id,
        GROUP_CONCAT(compositor_biography.url SEPARATOR ';') AS concat_compositor_biography_url
    FROM compositor_biography
    GROUP BY compositor_biography.compositor_id
    ) compositor_biography ON compositors.id = compositor_biography.compositor_id
LEFT JOIN (
    SELECT compositor_id,
        GROUP_CONCAT(compositor_comments.comment SEPARATOR ';') AS concat_compositor_comments_comment,
        GROUP_CONCAT(compositor_comments.PUBLIC + 0 SEPARATOR ';') AS concat_compositor_comments_public
    FROM compositor_comments
    GROUP BY compositor_comments.compositor_id
    ) compositor_comments ON compositors.id = compositor_comments.compositor_id
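To confirm why this version is correct, the sketch below runs the same shape of query against a simplified SQLite stand-in (an assumption; SQLite's `group_concat(x, ';')` plays the role of MySQL's `GROUP_CONCAT(x SEPARATOR ';')`). Each derived table collapses one child table to at most one row per compositor_id, so the outer LEFT JOINs cannot multiply rows. It checks correctness only, not speed: in MySQL versions before 5.6, a derived table in FROM is materialized into a temporary table without indexes, which is one plausible reason for the slowdown described next.

```python
import sqlite3

# Simplified SQLite stand-in (assumption: `public` stores the numeric enum index).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compositors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE compositor_biography (compositor_id INTEGER NOT NULL, url TEXT);
CREATE TABLE compositor_comments (
    compositor_id INTEGER NOT NULL, comment TEXT NOT NULL, public INTEGER NOT NULL);
INSERT INTO compositors VALUES (1, 'Dummy Compositor');
INSERT INTO compositor_biography VALUES
    (1, 'Dummy Link 1'), (1, 'Dummy Link 2'), (1, 'Dummy Link 3');
INSERT INTO compositor_comments VALUES (1, 'Dummy Comment', 2);
""")

# Each derived table aggregates one child table per compositor_id first.
row = conn.execute("""
    SELECT co.id, co.name,
           bio.concat_compositor_biography_url,
           com.concat_compositor_comments_comment,
           com.concat_compositor_comments_public
    FROM compositors AS co
    LEFT JOIN (
        SELECT compositor_id, group_concat(url, ';') AS concat_compositor_biography_url
        FROM compositor_biography
        GROUP BY compositor_id
    ) AS bio ON co.id = bio.compositor_id
    LEFT JOIN (
        SELECT compositor_id,
               group_concat(comment, ';') AS concat_compositor_comments_comment,
               group_concat(public, ';') AS concat_compositor_comments_public
        FROM compositor_comments
        GROUP BY compositor_id
    ) AS com ON co.id = com.compositor_id
""").fetchone()
print(row[3], row[4])  # Dummy Comment 2
```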

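However, this query has a huge performance problem: it does not use the INDEXES, or at least it seems to scan every table. With 24,000 composers it takes about 420 seconds, while the other query (the one that gives wrong results because of the GROUP BY) takes 1 second.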

How can I change the second query so that it uses the indexes properly and does not scan whole tables?

Here is a link to the database schema on SQL Fiddle: http://sqlfiddle.com/#!2/6b0132


UPDATE

Following @phil_w's suggestion, and after further testing, this query appears to perform very well:

SELECT a.id,
    a.name,
    a.concat_compositor_biography_url,
    a.concat_compositor_aliases_data,
    GROUP_CONCAT(compositor_comments.comment SEPARATOR ';') as concat_compositor_comments_comment,
    GROUP_CONCAT(compositor_comments.public + 0 SEPARATOR ';') as concat_compositor_comments_public
FROM (
    SELECT b.id,
    b.name,
    b.concat_compositor_biography_url,
    GROUP_CONCAT(compositor_aliases.data SEPARATOR ';') as concat_compositor_aliases_data
    FROM (
        SELECT compositors.id,
            compositors.name,
            GROUP_CONCAT(compositor_biography.url SEPARATOR ';') AS concat_compositor_biography_url
        FROM compositors
        LEFT JOIN compositor_biography ON compositors.id = compositor_biography.compositor_id
        GROUP BY compositors.id
    ) b
    LEFT JOIN compositor_aliases ON b.id = compositor_aliases.compositor_id
    GROUP BY b.id
) a
LEFT JOIN compositor_comments ON a.id = compositor_comments.compositor_id
GROUP BY a.id  

But how can I get the same result with a more compact query? (By the way, should I open a new question for this and mark this one as solved?)
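One compact alternative, not taken from either answer below, is to compute each list in a correlated scalar subquery in the SELECT list. Each subquery filters a single child table on compositor_id, so no child table is joined to another, and an index on compositor_id can serve every lookup. The sketch runs on a simplified SQLite stand-in (an assumption); the index name bio_compositor_idx is hypothetical. In MySQL the equivalent would use GROUP_CONCAT(... SEPARATOR ';'), and adding an ORDER BY inside each GROUP_CONCAT would keep the comment and public lists aligned, since they come from separate subqueries.

```python
import sqlite3

# Simplified SQLite stand-in (assumption: `public` stores the numeric enum index).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compositors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE compositor_biography (compositor_id INTEGER NOT NULL, url TEXT);
CREATE TABLE compositor_comments (
    compositor_id INTEGER NOT NULL, comment TEXT NOT NULL, public INTEGER NOT NULL);
-- bio_compositor_idx is a hypothetical name; c_compositor_idx mirrors the question.
CREATE INDEX bio_compositor_idx ON compositor_biography (compositor_id);
CREATE INDEX c_compositor_idx ON compositor_comments (compositor_id);
INSERT INTO compositors VALUES (1, 'Dummy Compositor');
INSERT INTO compositor_biography VALUES
    (1, 'Dummy Link 1'), (1, 'Dummy Link 2'), (1, 'Dummy Link 3');
INSERT INTO compositor_comments VALUES (1, 'Dummy Comment', 2);
""")

# One scalar subquery per aggregated column, each an index lookup on compositor_id.
row = conn.execute("""
    SELECT co.id, co.name,
        (SELECT group_concat(b.url, ';') FROM compositor_biography AS b
          WHERE b.compositor_id = co.id) AS concat_compositor_biography_url,
        (SELECT group_concat(c.comment, ';') FROM compositor_comments AS c
          WHERE c.compositor_id = co.id) AS concat_compositor_comments_comment,
        (SELECT group_concat(c.public, ';') FROM compositor_comments AS c
          WHERE c.compositor_id = co.id) AS concat_compositor_comments_public
    FROM compositors AS co
""").fetchone()
print(row[3], row[4])  # Dummy Comment 2
```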

2 answers:

Answer 0 (score: 1)

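This problem has nothing to do with "indexes". The problem is that you have two joins, and every combination of their rows is returned (i.e. the single comment row is paired with each of the 3 matching compositor_biography rows from the other join).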

The fix is simple: just add DISTINCT inside the GROUP_CONCAT() function:

...
GROUP_CONCAT( DISTINCT compositor_comments.comment SEPARATOR ';') AS concat_compositor_comments_comment, 
...
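DISTINCT does remove the repeats, but it deduplicates each column independently. The sketch below uses a simplified SQLite stand-in with hypothetical data (two different comments, both private) to show a side effect worth checking: the DISTINCT list of `public` values collapses to one entry, so it no longer lines up with the comment list, and two genuinely identical comments would also be merged. SQLite only allows DISTINCT on single-argument aggregates, so the default "," separator is used here; MySQL accepts GROUP_CONCAT(DISTINCT ... SEPARATOR ';').

```python
import sqlite3

# Simplified SQLite stand-in (assumption: `public` stores the numeric enum index).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compositors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE compositor_biography (compositor_id INTEGER NOT NULL, url TEXT);
CREATE TABLE compositor_comments (
    compositor_id INTEGER NOT NULL, comment TEXT NOT NULL, public INTEGER NOT NULL);
INSERT INTO compositors VALUES (1, 'Dummy Compositor');
INSERT INTO compositor_biography VALUES
    (1, 'Dummy Link 1'), (1, 'Dummy Link 2'), (1, 'Dummy Link 3');
-- Hypothetical data: two different comments, both private (enum index 2).
INSERT INTO compositor_comments VALUES (1, 'Comment A', 2), (1, 'Comment B', 2);
""")

# Each DISTINCT applies to its own column only.
comments, publics = conn.execute("""
    SELECT group_concat(DISTINCT c.comment), group_concat(DISTINCT c.public)
    FROM compositors AS co
    LEFT JOIN compositor_biography AS b ON co.id = b.compositor_id
    LEFT JOIN compositor_comments AS c ON co.id = c.compositor_id
    GROUP BY co.id
""").fetchone()

print(sorted(comments.split(",")))  # ['Comment A', 'Comment B']
print(publics)  # 2  (one flag for two comments: the lists no longer line up)
```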

Answer 1 (score: 0)

Getting it 3 times is expected, because you have 3 rows in compositor_biography...

Perhaps you could go step by step, first collecting only the biographies:

SELECT compositors.id, compositors.name,
  GROUP_CONCAT( compositor_biography.url SEPARATOR ';') AS concat_compositor_biography_url
FROM compositors
LEFT JOIN compositor_biography ON compositors.id = compositor_biography.compositor_id
GROUP BY compositors.id

and then join the rest:

select t.id, t.name,t.concat_compositor_biography_url,
GROUP_CONCAT( compositor_comments.comment SEPARATOR ';') AS concat_compositor_comments_comment
from (
  SELECT compositors.id, compositors.name, 
  GROUP_CONCAT( compositor_biography.url SEPARATOR ';') AS concat_compositor_biography_url
  FROM compositors 
  LEFT JOIN compositor_biography ON compositors.id = compositor_biography.compositor_id 
  GROUP BY compositors.id
) t
LEFT JOIN compositor_comments ON t.id = compositor_comments.compositor_id
GROUP BY t.id
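The step-by-step approach can be checked on a simplified SQLite stand-in (an assumption, with a hypothetical second compositor that has no biography or comment). The inner query leaves exactly one row per compositor, so joining the comments to it cannot multiply anything. Note that the outer query needs its own GROUP BY t.id; without it, the outer GROUP_CONCAT would fold every compositor into a single result row.

```python
import sqlite3

# Simplified SQLite stand-in; 'Second Compositor' is hypothetical test data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compositors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE compositor_biography (compositor_id INTEGER NOT NULL, url TEXT);
CREATE TABLE compositor_comments (compositor_id INTEGER NOT NULL, comment TEXT NOT NULL);
INSERT INTO compositors VALUES (1, 'Dummy Compositor'), (2, 'Second Compositor');
INSERT INTO compositor_biography VALUES
    (1, 'Dummy Link 1'), (1, 'Dummy Link 2'), (1, 'Dummy Link 3');
INSERT INTO compositor_comments VALUES (1, 'Dummy Comment');
""")

rows = conn.execute("""
    SELECT t.id, t.name, t.concat_compositor_biography_url,
           group_concat(c.comment, ';') AS concat_compositor_comments_comment
    FROM (
        -- Step 1: collapse the biographies to one row per compositor.
        SELECT co.id, co.name,
               group_concat(b.url, ';') AS concat_compositor_biography_url
        FROM compositors AS co
        LEFT JOIN compositor_biography AS b ON co.id = b.compositor_id
        GROUP BY co.id
    ) AS t
    -- Step 2: join the comments to that single row and group again.
    LEFT JOIN compositor_comments AS c ON t.id = c.compositor_id
    GROUP BY t.id
    ORDER BY t.id
""").fetchall()
print(rows[0][3])  # Dummy Comment
print(rows[1])     # (2, 'Second Compositor', None, None)
```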

and so on...

Unless the tables are small, I don't see why it wouldn't use the indexes. Try EXPLAIN SELECT ... to confirm.
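As a rough analogue of that check, SQLite's EXPLAIN QUERY PLAN reports whether a join probes an index. The sketch below assumes a simplified SQLite stand-in with the question's c_compositor_idx index and looks for the index name in the plan's detail text. MySQL's EXPLAIN output is different: there you would read the `key` and `rows` columns for each table instead.

```python
import sqlite3

# Simplified SQLite stand-in with the question's index on compositor_comments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compositors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE compositor_comments (compositor_id INTEGER NOT NULL, comment TEXT NOT NULL);
CREATE INDEX c_compositor_idx ON compositor_comments (compositor_id);
""")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT co.id, group_concat(c.comment, ';')
    FROM compositors AS co
    LEFT JOIN compositor_comments AS c ON co.id = c.compositor_id
    GROUP BY co.id
""").fetchall()

# The last column of each plan row is the human-readable "detail" text;
# an index probe shows up as a SEARCH step naming the index.
details = [row[-1] for row in plan]
for d in details:
    print(d)
```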