MySQL COUNT(1), COUNT(2)... etc. using JOIN

Time: 2013-04-12 00:41:58

Tags: mysql join count benchmarking

I have 4 tables:

table talks
table talks_fan
table talks_follow
table talks_comments

What I want to achieve is to count all the comments, fans, and followers for each talk.

So far I have come up with this.

All tables have a talk_id column, and it is the primary key only in the talks table.

SELECT
  g.*,
  COUNT( m.talk_id ) AS num_of_comments,
  COUNT( f.talk_id ) AS num_of_followers

FROM
  talks AS g

LEFT JOIN talks_comments AS m
  USING ( talk_id )

LEFT JOIN talks_follow AS f
  USING ( talk_id )

WHERE g.privacy = 'public'
GROUP BY g.talk_id
ORDER BY g.created_date DESC 
LIMIT 30;

I also tried this approach:

SELECT
  t.*,
  COUNT(b.talk_id) AS comments, 
  COUNT(bt.talk_id) AS followers 
FROM
  talks t
LEFT JOIN talks_follow bt
  ON bt.talk_id = t.talk_id
LEFT JOIN talks_comments b
  ON b.talk_id = t.talk_id
GROUP BY t.talk_id;

Both give me the same results...?!

Update: CREATE statements

CREATE TABLE IF NOT EXISTS `talks` (
`talk_id` bigint(20) NOT NULL AUTO_INCREMENT,
`user_id` mediumint(9) NOT NULL,
`title` varchar(255) NOT NULL,
`content` text NOT NULL,
`created_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`privacy` enum('public','private') NOT NULL DEFAULT 'private',
PRIMARY KEY (`talk_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=7 ;

CREATE TABLE IF NOT EXISTS `talks_comments` (
`comment_id` bigint(20) NOT NULL AUTO_INCREMENT,
`talk_id` bigint(20) NOT NULL,
`user_id` mediumint(9) NOT NULL,
`comment` text NOT NULL,
`date_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`status` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`comment_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=8 ;

CREATE TABLE IF NOT EXISTS `talks_fan` (
`fan_id` bigint(20) NOT NULL AUTO_INCREMENT,
`talk_id` bigint(20) NOT NULL,
`user_id` bigint(20) NOT NULL,
`created_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`status` tinyint(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`fan_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=4 ;

CREATE TABLE IF NOT EXISTS `talks_follow` (
`follow_id` bigint(20) NOT NULL AUTO_INCREMENT,
`talk_id` bigint(20) NOT NULL,
`user_id` mediumint(9) NOT NULL,
`date_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`follow_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=5 ;

The final query that worked:

SELECT t.* ,  COUNT( DISTINCT b.comment_id ) AS comments, 
            COUNT( DISTINCT bt.follow_id ) AS followers, 
            COUNT( DISTINCT c.fan_id ) AS fans
FROM talks t

LEFT JOIN talks_follow bt ON bt.talk_id = t.talk_id
LEFT JOIN talks_comments b ON b.talk_id = t.talk_id
LEFT JOIN talks_fan c ON c.talk_id = t.talk_id

WHERE t.privacy = 'public'
GROUP BY t.talk_id
ORDER BY t.created_date DESC 
LIMIT 30

Edit: the final answer to the whole question...

I modified the query and wrote some PHP (CodeIgniter) code to solve my problem, following @Bill Karwin's recommendation.

    $sql = "
    SELECT t.*,
                    COUNT( DISTINCT b.comment_id ) AS comments, 
                    COUNT( DISTINCT bt.follow_id ) AS followers, 
                    COUNT( DISTINCT c.fan_id ) AS fans,
                    GROUP_CONCAT( DISTINCT c.user_id ) AS list_of_fans,
                    GROUP_CONCAT( DISTINCT bt.user_id ) AS list_of_follower
    FROM talks t

    LEFT JOIN talks_follow bt ON bt.talk_id = t.talk_id
    LEFT JOIN talks_comments b ON b.talk_id = t.talk_id
    LEFT JOIN talks_fan c ON c.talk_id = t.talk_id

    WHERE t.privacy = 'public'
    GROUP BY t.talk_id
    ORDER BY t.created_date DESC 
    LIMIT 30
    ";

    $query = $this->db->query($sql);
    if($query->num_rows() > 0)
    {

        $results = array();

        foreach($query->result_array() AS $talk){
            $fan_user_id = explode(",", $talk['list_of_fans']);
            foreach($fan_user_id AS $user){
                 if($user == 1 /* should be the current user's id, e.g. from the session */){
                     $talk['list_of_fans'] = 'yes';
                 }
            }

            $follower_user_id = explode(",", $talk['list_of_follower']);
            foreach($follower_user_id AS $user){
                 if($user == 1 /* should be the current user's id, e.g. from the session */){
                     $talk['list_of_follower'] = 'yes';
                 }
            }

             $results[] = array(
                    'talk_id'           => $talk['talk_id'], 
                    'user_id'           => $talk['user_id'],
                    'title'             => $talk['title'], 
                    'created_date'      => $talk['created_date'], 
                    'comments'          => $talk['comments'], 
                    'followers'         => $talk['followers'], 
                    'fans'              => $talk['fans'], 
                    'list_of_fans'      => $talk['list_of_fans'],
                    'list_of_follower'  => $talk['list_of_follower']                        
                    );

        }
    }

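I still believe this can be optimized in the database, so that I only have to use the results...

I was thinking that if each talk had 1000 fans and 2000 followers, the results would take much longer to load, and more so with 10 times as many... or am I wrong here?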

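Edit: adding benchmarks for the query tests...

I used the CodeIgniter profiler to find out how long the query takes to execute.

That said, I also started adding data to the tables.

The results are below.

Testing the database after adding data to it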

Query result times

Table talks
---------------
Table rows: 50 rows
Time: 0.0173 seconds

Table rows: 644 rows
Time: 0.0535 seconds

Table rows: 1250 rows
Time: 0.0856 seconds


Adding data to other tables
--------------------------
talks = 1250 rows
talks_follow = 4115 rows
talks_fan = 10 rows

Time: 2.656 seconds

Adding data to other tables
--------------------------
talks = 1250 rows
talks_follow = 4115 rows
talks_fan = 10 rows
talks_comments = 3650 rows

Time: 10.156 seconds

After replacing LEFT JOIN with STRAIGHT_JOIN

Time: 6.675 seconds

It seems to be very heavy on the DB... Now I am facing another dilemma: how to improve its performance.

Edit: using @leonardo_assumpcao's suggestion

After rebuilding the DB using @leonardo_assumpcao's suggestion
to index a few fields:


Adding data to other tables
--------------------------
talks          = 6000 rows
talks_follow   = 10000 rows
talks_fan      = 10000 rows
talks_comments = 10000 rows

Time: 17.940 seconds

Is this normal for a DB with heavy data??

3 Answers:

Answer 0: (score: 0)

You can force this into a single query like this:

SELECT COUNT(*) num, 'talks' item         FROM talks
UNION
SELECT COUNT(*) num, 'talks_fan' item     FROM talks_fan
UNION
SELECT COUNT(*) num, 'talks_follow' item  FROM talks_follow
UNION
SELECT COUNT(*) num, 'talks_comment' item FROM talks_comments

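This gives you a four-row result set, one row per table. Each row is the count for that particular table.

If you must put it all into one row, you can do a pivot like this.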

SELECT 
  SUM( CASE item WHEN 'talks'         THEN num ELSE 0 END ) AS 'talks', 
  SUM( CASE item WHEN 'talks_fan'     THEN num ELSE 0 END ) AS 'talks_fan', 
  SUM( CASE item WHEN 'talks_follow'  THEN num ELSE 0 END ) AS 'talks_follow', 
  SUM( CASE item WHEN 'talks_comment' THEN num ELSE 0 END ) AS 'talks_comment'
FROM 
(   SELECT COUNT(*) num, 'talks' item         FROM talks
    UNION
    SELECT COUNT(*) num, 'talks_fan' item     FROM talks_fan
    UNION
    SELECT COUNT(*) num, 'talks_follow' item  FROM talks_follow
    UNION
    SELECT COUNT(*) num, 'talks_comment' item FROM talks_comments
) counts

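(This does not take your WHERE g.privacy = clause into account, because I don't understand it. But you can add a WHERE clause to whichever of the four queries in the UNION handles that item.)

Note that, for four separate tables forced into a single query, this really is four queries.

By the way, when id is the table's primary key, there is no difference in value between COUNT(*) and COUNT(id). COUNT(id) does not count rows where id is NULL, but if id is the primary key then it is NOT NULL. But COUNT(*) is faster, so use it.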
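The NULL behavior described above can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere), and the `demo` table and its `note` column are hypothetical. It says nothing about which form is faster.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical table: id is the primary key, note is a nullable column.
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, note TEXT)")
conn.executemany(
    "INSERT INTO demo (id, note) VALUES (?, ?)",
    [(1, "a"), (2, None), (3, "c")],
)

# COUNT(*) counts rows; COUNT(col) counts rows where col IS NOT NULL.
count_star, count_id, count_note = conn.execute(
    "SELECT COUNT(*), COUNT(id), COUNT(note) FROM demo"
).fetchone()

print(count_star, count_id, count_note)
```

COUNT(*) and COUNT(id) agree because a primary key cannot be NULL, while COUNT(note) skips the row whose note is NULL.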

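Edit: if you need the number of fan, follow, and comment rows for each distinct talk, do this. It is the same idea of doing a union and a pivot, but with an extra parameter.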

(The query that followed here is missing from this copy of the answer.)
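As a rough illustration of the union-and-pivot idea with talk_id as the extra column, here is a sketch that is not the answerer's own query. It runs on Python's sqlite3 as a stand-in for MySQL, the tables are trimmed to the columns needed, and the item labels ('fan', 'follow', 'comment') are made up for the example.

```python
import sqlite3

# SQLite in memory stands in for MySQL (assumption); schema is simplified.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE talks_fan      (fan_id     INTEGER PRIMARY KEY, talk_id INTEGER);
    CREATE TABLE talks_follow   (follow_id  INTEGER PRIMARY KEY, talk_id INTEGER);
    CREATE TABLE talks_comments (comment_id INTEGER PRIMARY KEY, talk_id INTEGER);

    -- talk 1: 2 fans, 3 follows, 1 comment; talk 2: 0 fans, 1 follow, 2 comments
    INSERT INTO talks_fan      (talk_id) VALUES (1), (1);
    INSERT INTO talks_follow   (talk_id) VALUES (1), (1), (1), (2);
    INSERT INTO talks_comments (talk_id) VALUES (1), (2), (2);
""")

# Each branch counts one table per talk; the outer query pivots the
# labelled counts into one row per talk_id.
per_talk = conn.execute("""
    SELECT talk_id,
           SUM(CASE item WHEN 'fan'     THEN num ELSE 0 END) AS fans,
           SUM(CASE item WHEN 'follow'  THEN num ELSE 0 END) AS followers,
           SUM(CASE item WHEN 'comment' THEN num ELSE 0 END) AS comments
    FROM (
        SELECT talk_id, COUNT(*) AS num, 'fan' AS item
        FROM talks_fan GROUP BY talk_id
        UNION ALL
        SELECT talk_id, COUNT(*), 'follow'
        FROM talks_follow GROUP BY talk_id
        UNION ALL
        SELECT talk_id, COUNT(*), 'comment'
        FROM talks_comments GROUP BY talk_id
    ) AS counts
    GROUP BY talk_id
    ORDER BY talk_id
""").fetchall()

print(per_talk)
```

One limitation of this shape: a talk with no rows in any of the three tables does not appear at all, so it would need a join back to talks if zero-count talks must be listed.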

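After doing this for (too many) years, I have found that the best way to describe the query you need is to say to yourself, "I need a result set with one row for each xxx, with columns for yyy, zzz, and qqq."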

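Answer 1: (score: 0)

The counts are the same because the query counts rows after the joins have combined the tables. By joining to multiple tables, you create a Cartesian product.

Basically, you are counting not just how many comments each talk has, but how many comments * followers each talk has. Then you count the followers as how many followers * comments each talk has. So the counts are the same, and they are both too high.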
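The multiplication is easy to reproduce on a tiny data set. The sketch below uses Python's sqlite3 as a stand-in for MySQL (an assumption for portability) with the tables reduced to their id columns: one talk with 2 comments and 3 followers.

```python
import sqlite3

# SQLite in memory stands in for MySQL here (assumption); tables are trimmed
# down to the columns the joins need.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE talks          (talk_id    INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE talks_comments (comment_id INTEGER PRIMARY KEY, talk_id INTEGER);
    CREATE TABLE talks_follow   (follow_id  INTEGER PRIMARY KEY, talk_id INTEGER);

    INSERT INTO talks VALUES (1, 'demo talk');
    -- 2 comments and 3 followers for talk 1
    INSERT INTO talks_comments (talk_id) VALUES (1), (1);
    INSERT INTO talks_follow   (talk_id) VALUES (1), (1), (1);
""")

# Every comment row is paired with every follower row: 2 * 3 = 6 joined rows.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM talks t
    LEFT JOIN talks_comments m ON m.talk_id = t.talk_id
    LEFT JOIN talks_follow   f ON f.talk_id = t.talk_id
""").fetchone()[0]

# The question's counting style: both counts see all 6 joined rows.
num_comments, num_followers = conn.execute("""
    SELECT COUNT(m.talk_id), COUNT(f.talk_id)
    FROM talks t
    LEFT JOIN talks_comments m ON m.talk_id = t.talk_id
    LEFT JOIN talks_follow   f ON f.talk_id = t.talk_id
    GROUP BY t.talk_id
""").fetchone()

print(joined_rows, num_comments, num_followers)
```

Both counts come out as 6 rather than 2 and 3, which is the "same and too high" symptom from the question.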

Here is a simpler way to write the query so that each distinct comment, follower, etc. is counted only once:

SELECT t.*, 
  COUNT(DISTINCT b.comment_id) AS comments, 
  COUNT(DISTINCT bt.follow_id) AS followers 
FROM talks t
LEFT JOIN talks_follow bt ON bt.talk_id = t.talk_id
LEFT JOIN talks_comments b ON b.talk_id = t.talk_id
GROUP BY t.talk_id;
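To see COUNT(DISTINCT ...) undo the row multiplication, here is a small sketch on Python's sqlite3, used as a stand-in for MySQL (an assumption). The schema is simplified: talk 1 has 2 comments and 3 followers, talk 2 has none.

```python
import sqlite3

# SQLite in memory stands in for MySQL (assumption); schema is simplified.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE talks          (talk_id    INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE talks_comments (comment_id INTEGER PRIMARY KEY, talk_id INTEGER);
    CREATE TABLE talks_follow   (follow_id  INTEGER PRIMARY KEY, talk_id INTEGER);

    INSERT INTO talks VALUES (1, 'busy talk'), (2, 'quiet talk');
    INSERT INTO talks_comments (talk_id) VALUES (1), (1);
    INSERT INTO talks_follow   (talk_id) VALUES (1), (1), (1);
""")

# Counting distinct primary keys ignores the duplicates the joins create.
# LEFT JOIN keeps talk 2, whose NULL ids are not counted, giving zeros.
counts = conn.execute("""
    SELECT t.talk_id,
           COUNT(DISTINCT b.comment_id) AS comments,
           COUNT(DISTINCT bt.follow_id) AS followers
    FROM talks t
    LEFT JOIN talks_follow   bt ON bt.talk_id = t.talk_id
    LEFT JOIN talks_comments b  ON b.talk_id  = t.talk_id
    GROUP BY t.talk_id
    ORDER BY t.talk_id
""").fetchall()

print(counts)
```

The join still produces comments * followers rows per talk before they are collapsed, so the counts are right but the intermediate work still grows with the product.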

Re your comment: I would not fetch all the followers in the same query. You could do it like this:

SELECT t.*, 
  COUNT(DISTINCT b.comment_id) AS comments, 
  COUNT(DISTINCT bt.follow_id) AS followers, 
  GROUP_CONCAT(DISTINCT bt.follower_name) AS list_of_followers
FROM talks t
LEFT JOIN talks_follow bt ON bt.talk_id = t.talk_id
LEFT JOIN talks_comments b ON b.talk_id = t.talk_id
GROUP BY t.talk_id;

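But what you get back is a single string, with the followers' names separated by commas. Now you have to write application code to split the string on commas, you have to worry about whether some follower names actually contain a comma, and so on.

I would do a second query to fetch the followers for a specific talk. You probably only want to display followers for a specific talk anyway.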

SELECT follower_name
FROM talks_follow
WHERE talk_id = ?
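The comma problem and the second-query alternative can be sketched side by side. This uses Python's sqlite3 as a stand-in for MySQL (an assumption), and `follower_name` is the hypothetical column from the queries above, since the posted schema only stores user_id.

```python
import sqlite3

# SQLite in memory stands in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE talks_follow (
        follow_id     INTEGER PRIMARY KEY,
        talk_id       INTEGER,
        follower_name TEXT  -- hypothetical column, not in the posted schema
    )
""")
conn.executemany(
    "INSERT INTO talks_follow (talk_id, follower_name) VALUES (?, ?)",
    [(1, "Ann"), (1, "Lee, Jr.")],  # the second name contains a comma
)

# Approach 1: one comma-separated string, split in application code.
joined = conn.execute(
    "SELECT GROUP_CONCAT(DISTINCT follower_name) "
    "FROM talks_follow WHERE talk_id = ?",
    (1,),
).fetchone()[0]
naive_names = joined.split(",")

# Approach 2: a separate parameterized query, one row per follower.
follower_rows = [
    row[0]
    for row in conn.execute(
        "SELECT follower_name FROM talks_follow "
        "WHERE talk_id = ? ORDER BY follow_id",
        (1,),
    )
]

print(len(naive_names), follower_rows)
```

Splitting the concatenated string yields three pieces for two followers, while the separate query returns each name intact.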

Answer 2: (score: 0)

I can say this is (at least) one of the coolest SELECT statements I have improved today.

SELECT STRAIGHT_JOIN
  t.* ,
  COUNT( DISTINCT b.comment_id ) AS comments, 
  COUNT( DISTINCT bt.follow_id ) AS followers, 
  COUNT( DISTINCT c.fan_id )     AS fans

FROM
  (
    SELECT * FROM talks
    WHERE privacy = 'public'
    ORDER BY created_date DESC
    LIMIT 0, 30
  ) AS t

LEFT JOIN talks_follow   bt ON (bt.talk_id = t.talk_id)

LEFT JOIN talks_comments b  ON (b.talk_id = t.talk_id)

LEFT JOIN talks_fan      c  ON (c.talk_id = t.talk_id)

GROUP BY t.talk_id
ORDER BY t.created_date DESC ;

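But it seems to me that your problem lies in your tables; the first step toward an efficient query is to index every field involved in the joins you need.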
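As a hedged sketch of what indexing a join column changes, the example below compares the query plan for a lookup by talk_id before and after adding an index. It runs on Python's sqlite3 as a stand-in for MySQL (an assumption); the index name `idx_comments_talk_id` and the helper `query_plan` are made up for the example. In MySQL the analogous tools would be CREATE INDEX and EXPLAIN, whose output looks different.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the EXPLAIN QUERY PLAN detail text for a statement (hypothetical helper)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# SQLite in memory stands in for MySQL (assumption); schema is simplified.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE talks_comments (
        comment_id INTEGER PRIMARY KEY,
        talk_id    INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO talks_comments (talk_id) VALUES (?)",
    [(i % 50,) for i in range(1000)],
)

sql = "SELECT COUNT(*) FROM talks_comments WHERE talk_id = ?"

# Without an index on talk_id, the whole table has to be scanned.
plan_before = query_plan(conn, sql, (1,))

# Index the column used in the join/filter condition.
conn.execute("CREATE INDEX idx_comments_talk_id ON talks_comments (talk_id)")
plan_after = query_plan(conn, sql, (1,))

print(plan_before)
print(plan_after)
```

The same reasoning applies to talk_id in talks_follow and talks_fan, since each is only indexed on its own auto-increment key in the posted schema.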

I made some modifications to your tables above; you can see the code here (updated). Interesting, isn't it? Since we are here, also consider your ERR model:

[Image: Tables]

Try it first on a MySQL test database. Hopefully it will solve your performance problems.

(Forgive my English, it is my second language.)