Question

I'm working on a social network tracking application. Even the joins work fine with proper indexing. But when I add an ORDER BY clause, the total query takes 100 times longer to execute. The following query fetches the twitter users without an ORDER BY clause.

SELECT DISTINCT `tracked_twitter`.id
FROM tracked_twitter
INNER JOIN `twitter_content` ON `tracked_twitter`.`id` = `twitter_content`.`tracked_twitter_id`
INNER JOIN `tracker_twitter_content` ON `twitter_content`.`id` = `tracker_twitter_content`.`twitter_content_id`
AND `tracker_twitter_content`.`tracker_id` = '88'
LIMIT 20

Showing rows 0 - 19 (20 total, query took 0.0714 sec)

But when I add the ORDER BY clause (on an indexed column):

SELECT DISTINCT `tracked_twitter`.id
FROM tracked_twitter
INNER JOIN `twitter_content` ON `tracked_twitter`.`id` = `twitter_content`.`tracked_twitter_id`
INNER JOIN `tracker_twitter_content` ON `twitter_content`.`id` = `tracker_twitter_content`.`twitter_content_id`
AND `tracker_twitter_content`.`tracker_id` = '88'
ORDER BY tracked_twitter.followers_count DESC
LIMIT 20

Showing rows 0 - 19 (20 total, query took 13.4636 sec)

EXPLAIN

When I apply the ORDER BY clause to its own table alone, it does not take much time:

SELECT * FROM `tracked_twitter` WHERE 1 ORDER BY `followers_count` DESC LIMIT 20

Showing rows 0 - 19 (20 total, query took 0.0711 sec) [followers_count: 68236387 - 10525612]

The table creation query is as follows:

CREATE TABLE IF NOT EXISTS `tracked_twitter` (
    `id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
    `handle` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
    `name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
    `location` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
    `description` text COLLATE utf8_unicode_ci,
    `profile_image` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
    `followers_count` int(11) NOT NULL,
    `is_influencer` tinyint(1) NOT NULL DEFAULT '0',
    `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
    `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
    `gender` enum('Male','Female','Other') COLLATE utf8_unicode_ci DEFAULT NULL,
    PRIMARY KEY (`id`),
    KEY `followers_count` (`followers_count`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

So the join does not slow down the query, and ORDER BY works well when I run it on its own table. How can I improve the performance?

Update 1

@GordonLinoff's method solves it if I only need the result set from the parent table. What I want is the number of tweets per person (the count of twitter_content rows that match the tracked_twitter table). How can I modify it? And if I want to apply math functions to the tweet content, how can I do that?
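
One possible shape for the per-user tweet count asked about in Update 1 is sketched below. It is an untested assumption, not something confirmed in this thread: it keeps the EXISTS filter on the parent table and adds a correlated subquery for the count. The alias tweet_count is made up for illustration, and COUNT(DISTINCT ...) is used in case tracker_twitter_content can hold repeated rows for the same tweet.

```sql
-- Sketch only: tweet_count is a hypothetical alias; table and column names come from the question.
SELECT tt.id,
       (SELECT COUNT(DISTINCT tc.id)
        FROM twitter_content AS tc
        INNER JOIN tracker_twitter_content AS ttc
                ON tc.id = ttc.twitter_content_id
        WHERE ttc.tracker_id = 88
          AND tc.tracked_twitter_id = tt.id) AS tweet_count
FROM tracked_twitter AS tt
-- Keep only users that have at least one tweet for tracker 88.
WHERE EXISTS (SELECT 1
              FROM twitter_content AS tc
              INNER JOIN tracker_twitter_content AS ttc
                      ON tc.id = ttc.twitter_content_id
              WHERE ttc.tracker_id = 88
                AND tc.tracked_twitter_id = tt.id)
ORDER BY tt.followers_count DESC
LIMIT 20;
```

Other aggregates over the tweet rows (SUM, AVG and so on) could replace the COUNT in the same subquery, assuming the column being aggregated is numeric.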

Answer 1

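Try getting rid of the DISTINCT. It is a performance killer. I'm not sure why your first query runs quickly; perhaps MySQL is smart enough to optimize it away.

I would try: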

SELECT tt.id
FROM tracked_twitter tt
WHERE EXISTS (SELECT 1
              FROM twitter_content tc INNER JOIN  
                   tracker_twitter_content ttc
                   ON  tc.id =  ttc.twitter_content_id
              WHERE  ttc.tracker_id =  88 AND
                     tt.id =  tc.tracked_twitter_id
             )
ORDER BY tt.followers_count DESC ;

For this version, you want indexes on: tracked_twitter(followers_count, id), twitter_content(tracked_twitter_id, id), and tracker_twitter_content(twitter_content_id, tracker_id).
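
The indexes named above could be created along these lines. This is a sketch: the index names are invented for illustration, and the definitions of twitter_content and tracker_twitter_content are not shown in the question, so some of these keys may already exist.

```sql
-- Hypothetical index names; only the table and column names come from the answer.
CREATE INDEX idx_tt_followers_id
    ON tracked_twitter (followers_count, id);

CREATE INDEX idx_tc_tracked_id
    ON twitter_content (tracked_twitter_id, id);

CREATE INDEX idx_ttc_content_tracker
    ON tracker_twitter_content (twitter_content_id, tracker_id);
```

Since InnoDB secondary indexes already store the primary key columns, the existing followers_count key on tracked_twitter may already serve the first case; checking with EXPLAIN before adding it would settle that.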

Answer 2

Wrap the parent table in a parenthesized subquery:

SELECT DISTINCT  `tracked_twitter`.id FROM
(SELECT id,followers_count  FROM tracked_twitter ORDER BY followers_count DESC 
LIMIT 20) AS tracked_twitter
INNER JOIN  `twitter_content` ON  `tracked_twitter`.`id` =  `twitter_content`.`tracked_twitter_id` 
INNER JOIN  `tracker_twitter_content` ON  `twitter_content`.`id` =  `tracker_twitter_content`.`twitter_content_id` 
AND  `tracker_twitter_content`.`tracker_id` =  '88'
ORDER BY tracked_twitter.followers_count DESC

Answer 3

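The main problem is that, even with a relatively small number of rows, you use varchar(255) COLLATE utf8_unicode_ci as the primary key (instead of an integer), and therefore as the foreign key in the other tables. I suspect twitter_content.id has the same problem. This causes a lot of long string comparisons and reserves a lot of extra memory for temporary tables.

As for the query itself, yes, it should be a query that walks the followers_count index and checks the condition on the related tables. This can be done either as Gordon Linoff suggested or by using index hints.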
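
As a sketch of the index-hint route, the query below assumes the key name followers_count from the CREATE TABLE in the question and reuses the EXISTS form from the earlier answer. Whether the optimizer then actually stops after 20 matches would need to be checked with EXPLAIN.

```sql
-- FORCE INDEX tells MySQL to read tracked_twitter through the followers_count key.
-- The intent is that rows come back in index order, so no separate sort is needed.
SELECT tt.id
FROM tracked_twitter AS tt FORCE INDEX (followers_count)
WHERE EXISTS (SELECT 1
              FROM twitter_content AS tc
              INNER JOIN tracker_twitter_content AS ttc
                      ON tc.id = ttc.twitter_content_id
              WHERE ttc.tracker_id = 88
                AND tc.tracked_twitter_id = tt.id)
ORDER BY tt.followers_count DESC
LIMIT 20;
```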

How to improve the performance of ORDER BY with joins in MySQL

3 answers: