Optimization needed for a query with two LEFT JOINs

Time: 2015-11-14 15:36:06

Tags: php mysql join

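I've been struggling with MySQL joins. Despite having read dozens of tutorials and the MySQL manual, I'm having difficulty understanding how to incorporate more than one join into a query.

My situation is that I have 3 tables:

    /* BASICALLY A TABLE THAT DESCRIBES A FAN RECORD */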

    CREATE TABLE `fans` (
      `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
      `first_name` varchar(255) DEFAULT NULL,
      `middle_name` varchar(255) DEFAULT NULL,
      `last_name` varchar(255) DEFAULT NULL,
      `email` varchar(255) DEFAULT NULL,
      `join_date` datetime DEFAULT NULL,
      `twitter` varchar(255) DEFAULT NULL,
      `twitterCrawled` datetime DEFAULT NULL,
      `twitterImage` varchar(255) DEFAULT NULL,
      PRIMARY KEY (`id`),
      UNIQUE KEY `email` (`email`)
    ) ENGINE=MyISAM AUTO_INCREMENT=20413 DEFAULT CHARSET=latin1;

    /* A TABLE OF OUR TWITTER FOLLOWERS */

    CREATE TABLE `twitterFollowers` (
      `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
      `screenName` varchar(25) DEFAULT NULL,
      `twitterId` varchar(25) DEFAULT NULL,
      `customerId` int(11) DEFAULT NULL,
      `uniqueStr` varchar(50) DEFAULT NULL,
      PRIMARY KEY (`id`),
      UNIQUE KEY `unique` (`uniqueStr`)
    ) ENGINE=InnoDB AUTO_INCREMENT=13426 DEFAULT CHARSET=utf8;

    /* TABLE THAT SUGGESTS A LIKELY MATCH OF A TWITTER FOLLOWER BASED ON THE EMAIL / SCREEN NAME COMPARISON OF THE FAN vs OUR FOLLOWERS 
    IF SOMEONE (ie. a moderator) CONFIRMS OR DENIES THAT IT'S A GOOD MATCH THEY PUT A DATESTAMP IN `dismissed` */

    CREATE TABLE `contentSuggestion` (
      `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
      `userId` int(11) DEFAULT NULL,
      `fanId` int(11) DEFAULT NULL,
      `twitterAccountId` int(11) DEFAULT NULL,
      `contentType` varchar(50) DEFAULT NULL,
      `contentString` varchar(255) DEFAULT NULL,
      `added` datetime DEFAULT NULL,
      `dismissed` datetime DEFAULT NULL,
      `uniqueStr` varchar(255) DEFAULT NULL,
      PRIMARY KEY (`id`),
      UNIQUE KEY `unstr` (`uniqueStr`)
    ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;

What I want is:

SELECT [fan columns] WHERE the fan's screen name IS IN twitterFollowers AND WHERE the fan's screen name IS NOT IN contentSuggestion (with a datestamp in dismissed)

My attempts so far:
  

~33 seconds

    SELECT fans.id, tf.screenName AS col1, tf.twitterId AS col2
    FROM fans
    LEFT JOIN twitterFollowers tf ON tf.screenName = fans.emailUsername
    LEFT JOIN contentSuggestion cs ON cs.contentString = tf.screenName
    WHERE dismissed IS NULL
    GROUP BY (fans.id)
    HAVING col1 != ''

~14 seconds

    SELECT id, emailUsername
    FROM fans
    WHERE emailUsername IN (SELECT DISTINCT(screenName) FROM twitterFollowers)
      AND emailUsername NOT IN (SELECT DISTINCT(contentString) FROM contentSuggestion WHERE dismissed IS NULL)
    GROUP BY (fans.id);

9.53 seconds

    SELECT fans.id, tf.screenName AS col1, tf.twitterId AS col2
    FROM fans
    LEFT JOIN twitterFollowers tf ON tf.screenName = fans.emailUsername
    WHERE tf.uniqueStr NOT IN (SELECT uniqueStr FROM contentSuggestion WHERE dismissed IS NULL)

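I'm hoping there's a better way. I've been struggling to really use JOINs beyond a single LEFT JOIN, which has already helped me speed up other queries considerably.

Thanks for any help you can offer.

1 answer:

Answer 0: (score: 0)

I would go with a variation on the second approach. Instead of IN, use EXISTS. Then add the correct indexes and remove the aggregation: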

    SELECT f.id, f.emailUsername
    FROM fans f
    WHERE EXISTS (SELECT 1
                  FROM twitterFollowers tf
                  WHERE f.emailUsername = tf.screenName
                 ) AND
          NOT EXISTS (SELECT 1
                      FROM contentSuggestion cs
                      WHERE f.emailUsername = cs.contentString AND
                            cs.dismissed IS NULL
                     );

Then make sure you have the following indexes: twitterFollowers(screenName) and contentSuggestion(contentString, dismissed).
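As a rough sketch, the two indexes could be created like this. The index names are made up for illustration, and the query assumes that `fans.emailUsername` exists as used in the question, even though it does not appear in the posted `CREATE TABLE`. Running `EXPLAIN` afterwards is one way to check whether the optimizer actually picks the new indexes up.

```sql
-- Index names below are hypothetical; use your own naming convention.
CREATE INDEX idx_tf_screenName
    ON twitterFollowers (screenName);

CREATE INDEX idx_cs_contentString_dismissed
    ON contentSuggestion (contentString, dismissed);

-- Inspect the plan: the `key` column of the EXPLAIN output shows which
-- index (if any) is chosen for each table.
EXPLAIN
SELECT f.id, f.emailUsername
FROM fans f
WHERE EXISTS (SELECT 1
              FROM twitterFollowers tf
              WHERE f.emailUsername = tf.screenName
             ) AND
      NOT EXISTS (SELECT 1
                  FROM contentSuggestion cs
                  WHERE f.emailUsername = cs.contentString AND
                        cs.dismissed IS NULL
                 );
```

If the `key` column stays NULL for `tf` or `cs`, the lookup on that table is not using an index and the index definition or the comparison is worth a second look.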

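Some notes:

  • When using IN, don't use SELECT DISTINCT. I'm not 100% sure that MySQL is always smart enough to ignore the DISTINCT in the subquery (it is redundant).
  • Historically, EXISTS has been faster than IN in MySQL. The optimizer has improved in recent versions.
  • For performance, you need the correct indexes (the two listed above).
  • Assuming fans.id is unique (a very reasonable assumption, since it is the primary key), you don't need the final GROUP BY.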
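Since the question is specifically about JOINs, the same filter can also be sketched as an inner join plus a LEFT JOIN "anti-join". This is only an illustration under the same assumptions as the EXISTS query (a `fans.emailUsername` column, and the exclusion rule taken from the second attempt); it is not necessarily faster, so compare it against the EXISTS version on your own data.

```sql
-- DISTINCT is needed here because a fan would otherwise appear once per
-- matching twitterFollowers row if screenName is not unique.
SELECT DISTINCT f.id, f.emailUsername
FROM fans f
INNER JOIN twitterFollowers tf
        ON tf.screenName = f.emailUsername
LEFT JOIN contentSuggestion cs
       ON cs.contentString = f.emailUsername
      AND cs.dismissed IS NULL   -- the filter goes in ON, not in WHERE
WHERE cs.id IS NULL;             -- keep only fans with no such suggestion row
```

The placement of `cs.dismissed IS NULL` matters. In the ON clause it restricts which suggestion rows count as a match, and `cs.id IS NULL` then keeps the fans that have none. In the WHERE clause of a LEFT JOIN, as in the first attempt, the same condition is true both for fans with no suggestion at all and for fans with an undismissed suggestion, so it excludes only the dismissed ones.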