How to count users grouped by the number of comments they have made?

Date: 2014-08-23 22:47:08

Tags: sql postgresql count group-by

I want to count users, grouped by the number of comments each user has made.

[User]: ID
[Comment]: ID, UserID

So if user A has made 1 comment, user B has made 1 comment and user C has made 2 comments, the output would be:

0 comments => 0 users
1 comment  => 2 users (A+B)
2 comments => 1 user  (C)
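The two-level grouping asked for here can be shown outside SQL first. Below is a minimal Python sketch over hypothetical in-memory data that mirrors the example. It works in two steps:

1. Count comments per user, giving 0 to users without comments.
2. Count users per comment count, filling every count from 0 up to the largest one, with 0 users where no one matches.

The function name `users_per_comment_count` is illustrative only.

```python
from collections import Counter

# Hypothetical sample data mirroring the question: A and B have 1 comment, C has 2
users = ["A", "B", "C"]
comments = [(1, "A"), (2, "B"), (3, "C"), (4, "C")]  # (comment id, user id)


def users_per_comment_count(users, comments):
    """Return {comment count: number of users with that many comments}."""
    # Step 1: comments per user; Counter returns 0 for users without comments
    per_user = Counter(user_id for _, user_id in comments)
    counts = [per_user[u] for u in users]
    # Step 2: users per comment count
    histogram = Counter(counts)
    # Fill every count from 0 to the maximum, using 0 where no user matches
    top = max(histogram, default=0)
    return {n: histogram[n] for n in range(top + 1)}


print(users_per_comment_count(users, comments))
# {0: 0, 1: 2, 2: 1}
```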

How would you write this query?

2 answers:

Answer 0 (score: 3)

It depends on your exact database structure, but let's assume you have a users table and a comments table:

CREATE TABLE users (
   id   serial PRIMARY KEY,
   name text
);

CREATE TABLE comments (
   id      serial PRIMARY KEY,
   user_id integer REFERENCES users (id),  -- foreign key to the users table
   comment text
);

You can count the comments made by each user with this query:

  SELECT users.id, users.name, count(comments.id) as comment_cnt
    FROM users LEFT JOIN
         comments ON users.id = comments.user_id
GROUP BY users.id, users.name
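To try this query without a PostgreSQL server, here is a sketch that runs it through Python's built-in sqlite3 module. Note these assumptions:

- SQLite is only a stand-in here, and INTEGER PRIMARY KEY replaces serial.
- The sample rows are hypothetical: A and B have one comment each, C has two, and user D has none (added to exercise the LEFT JOIN).
- ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# In-memory SQLite stand-in for the PostgreSQL schema (INTEGER PRIMARY KEY instead of serial).
# Hypothetical data: A and B have 1 comment, C has 2, D has none.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY,
                           user_id INTEGER REFERENCES users (id),
                           comment TEXT);
    INSERT INTO users (id, name) VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
    INSERT INTO comments (user_id, comment) VALUES
        (1, 'a1'), (2, 'b1'), (3, 'c1'), (3, 'c2');
""")

# Same query as in the answer, plus ORDER BY for a stable result
rows = con.execute("""
      SELECT users.id, users.name, count(comments.id) AS comment_cnt
        FROM users LEFT JOIN
             comments ON users.id = comments.user_id
    GROUP BY users.id, users.name
    ORDER BY users.id
""").fetchall()
print(rows)
# [(1, 'A', 1), (2, 'B', 1), (3, 'C', 2), (4, 'D', 0)]
```

D gets 0 because count(comments.id) ignores the NULLs produced by the LEFT JOIN. count(*) would report 1 for D instead.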

You can then use the result of this query in a nested query to count how many users have each comment count:

  SELECT comment_cnt, count(*) FROM
  (SELECT users.id, users.name, count(comments.id) as comment_cnt
    FROM users LEFT JOIN
         comments ON users.id = comments.user_id
GROUP BY users.id, users.name) AS comment_cnts
GROUP BY comment_cnt;
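The nested query can be checked the same way. This sketch reuses the hypothetical SQLite stand-in and sample rows: A and B have one comment, C has two, D has none. It only adds a user_cnt alias and an ORDER BY for readable, deterministic output.

```python
import sqlite3

# SQLite stand-in; hypothetical data: A and B have 1 comment, C has 2, D has none.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY,
                           user_id INTEGER REFERENCES users (id),
                           comment TEXT);
    INSERT INTO users (id, name) VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
    INSERT INTO comments (user_id, comment) VALUES
        (1, 'a1'), (2, 'b1'), (3, 'c1'), (3, 'c2');
""")

# Outer query counts users per comment count from the per-user subquery
rows = con.execute("""
    SELECT comment_cnt, count(*) AS user_cnt FROM
      (SELECT users.id, users.name, count(comments.id) AS comment_cnt
         FROM users LEFT JOIN
              comments ON users.id = comments.user_id
     GROUP BY users.id, users.name) AS comment_cnts
    GROUP BY comment_cnt
    ORDER BY comment_cnt
""").fetchall()
print(rows)
# [(0, 1), (1, 2), (2, 1)]
```

A comment count that no user has (3, for example) simply produces no row here.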

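I don't know of an elegant way to fill in the gaps for comment counts that no user has, but a temporary table of whole numbers plus another level of nesting works (the table has to contain every number up to the largest comment count):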

CREATE TEMP TABLE wholenumbers (num integer);

INSERT INTO wholenumbers VALUES (0), (1), (2), (3), (4), (5), (6);

   SELECT num as comment_cnt, COALESCE(user_cnt,0) as user_cnt
     FROM wholenumbers
LEFT JOIN (SELECT comment_cnt, count(*) AS user_cnt
             FROM (  SELECT users.id, users.name, count(comments.id) AS comment_cnt
                       FROM users LEFT JOIN comments ON users.id = comments.user_id
                   GROUP BY users.id, users.name) AS comment_cnts
         GROUP BY comment_cnt) AS user_cnts ON wholenumbers.num = user_cnts.comment_cnt
ORDER BY num;
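Here is a sketch of the gap-filling query run against an in-memory SQLite stand-in, using exactly the question's data (A and B with one comment, C with two). The helper table and the query are taken from the answer unchanged apart from SQL-keyword casing.

```python
import sqlite3

# SQLite stand-in with the question's data: A and B have 1 comment, C has 2.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY,
                           user_id INTEGER REFERENCES users (id),
                           comment TEXT);
    INSERT INTO users (id, name) VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO comments (user_id, comment) VALUES
        (1, 'a1'), (2, 'b1'), (3, 'c1'), (3, 'c2');

    -- Helper table of whole numbers; it must cover the largest comment count
    CREATE TEMP TABLE wholenumbers (num INTEGER);
    INSERT INTO wholenumbers VALUES (0), (1), (2), (3), (4), (5), (6);
""")

rows = con.execute("""
       SELECT num AS comment_cnt, COALESCE(user_cnt, 0) AS user_cnt
         FROM wholenumbers
    LEFT JOIN (SELECT comment_cnt, count(*) AS user_cnt
                 FROM (  SELECT users.id, users.name, count(comments.id) AS comment_cnt
                           FROM users LEFT JOIN comments ON users.id = comments.user_id
                       GROUP BY users.id, users.name) AS comment_cnts
             GROUP BY comment_cnt) AS user_cnts ON wholenumbers.num = user_cnts.comment_cnt
     ORDER BY num
""").fetchall()
print(rows)
# [(0, 0), (1, 2), (2, 1), (3, 0), (4, 0), (5, 0), (6, 0)]
```

The rows for 3 to 6 appear only because the helper table contains those numbers. Conversely, a user with more than 6 comments would be silently dropped, since the LEFT JOIN starts from wholenumbers.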

Answer 1 (score: 2)

Based on the table layout @ClaytonC provided:

WITH cte AS (
   SELECT msg_ct, count(*) AS users
   FROM  (
      SELECT count(*) AS msg_ct
      FROM   comments 
      GROUP  BY user_id
      ) sub
   GROUP  BY 1
   )
SELECT msg_ct, COALESCE(users, 0) AS users
FROM   generate_series(0, (SELECT max(msg_ct) FROM cte)) msg_ct
LEFT   JOIN cte USING (msg_ct)
ORDER  BY 1;
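Here is a sketch of this query in an in-memory SQLite database. SQLite is a stand-in, and the changes from the PostgreSQL version are assumptions of this sketch:

- Python's sqlite3 module cannot be relied on to provide generate_series(), so a recursive CTE named series produces 0..max instead.
- A small CTE named lim holds the maximum.
- GROUP BY uses the column name rather than an ordinal.

The sample data is hypothetical: A and B have one comment, C has two, D has none.

```python
import sqlite3

# SQLite stand-in; hypothetical data: A and B have 1 comment, C has 2, D has none.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY,
                           user_id INTEGER REFERENCES users (id),
                           comment TEXT);
    INSERT INTO users (id, name) VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
    INSERT INTO comments (user_id, comment) VALUES
        (1, 'a1'), (2, 'b1'), (3, 'c1'), (3, 'c2');
""")

rows = con.execute("""
    WITH RECURSIVE cte AS (
       SELECT msg_ct, count(*) AS users         -- users per comment count
       FROM  (
          SELECT count(*) AS msg_ct             -- comments per user
          FROM   comments
          GROUP  BY user_id
          ) sub
       GROUP  BY msg_ct
       ),
    lim AS (SELECT max(msg_ct) AS max_ct FROM cte),
    series(msg_ct) AS (                         -- stands in for generate_series(0, max)
       SELECT 0
       UNION ALL
       SELECT series.msg_ct + 1 FROM series, lim
       WHERE  series.msg_ct < lim.max_ct
       )
    SELECT msg_ct, COALESCE(users, 0) AS users
    FROM   series
    LEFT   JOIN cte USING (msg_ct)
    ORDER  BY msg_ct
""").fetchall()
print(rows)
# [(0, 0), (1, 2), (2, 1)]
```

D has no comments, yet the row for 0 reports 0 users: this query reads only the comments table, so users without comments never appear in it.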

Major points

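  • First, count comments per user (msg_ct). As long as a foreign key enforces referential integrity, there is no need to join the users table to aggregate comments per user: just count rows in comments. Next, count users per comment count (users).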

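  • I do this in a CTE because the derived table is used twice in the final query.
    First, generate_series() dynamically produces every count from 0 up to the maximum, including gaps. Then the CTE is LEFT JOINed to that series to get the final result.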

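  • Counting starts at 0 (since my update). If you want to start at the smallest actual msg_ct instead, see the first draft of this answer in the edit history.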

  • Closely related answer explaining the basics:

Counting users with no comments

As @ClaytonC commented, the answer above does not count users who have no comments.

To fix this (if you actually need it), start from users and LEFT JOIN comments to it:

WITH cte AS (
   SELECT msg_ct, count(*) AS users
   FROM  (
      SELECT count(c.user_id) AS msg_ct
      FROM   users u
      LEFT   JOIN comments c ON c.user_id = u.id
      GROUP  BY u.id
      ) sub
   GROUP  BY 1
   )
SELECT ...
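The answer leaves the final SELECT elided. This sketch assumes it is the same final SELECT as in the first query. It runs on an in-memory SQLite stand-in with the same substitutions as before: a recursive CTE named series replaces generate_series(), and a CTE named lim holds the maximum. The sample data is hypothetical and includes D, who has no comments.

```python
import sqlite3

# SQLite stand-in; hypothetical data: A and B have 1 comment, C has 2, D has none.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY,
                           user_id INTEGER REFERENCES users (id),
                           comment TEXT);
    INSERT INTO users (id, name) VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
    INSERT INTO comments (user_id, comment) VALUES
        (1, 'a1'), (2, 'b1'), (3, 'c1'), (3, 'c2');
""")

rows = con.execute("""
    WITH RECURSIVE cte AS (
       SELECT msg_ct, count(*) AS users
       FROM  (
          SELECT count(c.user_id) AS msg_ct     -- 0 for users without comments
          FROM   users u
          LEFT   JOIN comments c ON c.user_id = u.id
          GROUP  BY u.id
          ) sub
       GROUP  BY msg_ct
       ),
    lim AS (SELECT max(msg_ct) AS max_ct FROM cte),
    series(msg_ct) AS (                         -- stands in for generate_series(0, max)
       SELECT 0
       UNION ALL
       SELECT series.msg_ct + 1 FROM series, lim
       WHERE  series.msg_ct < lim.max_ct
       )
    SELECT msg_ct, COALESCE(users, 0) AS users
    FROM   series
    LEFT   JOIN cte USING (msg_ct)
    ORDER  BY msg_ct
""").fetchall()
print(rows)
# [(0, 1), (1, 2), (2, 1)]
```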

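Or, since the join only serves to find users without comments, it may be cheaper to count all users and subtract the users who have comments (which we have aggregated anyway):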

WITH cte AS (
   SELECT msg_ct, count(*)::int AS users
   FROM  (
      SELECT count(*)::int AS msg_ct
      FROM   comments 
      GROUP  BY user_id
      ) sub
   GROUP  BY 1
   )
, agg AS (
   SELECT max(msg_ct)   AS max_ct      -- maximum for generate_series
         ,((SELECT count(*) FROM users) - sum(users))::int AS users
                                       -- i.e. the remaining users, with 0 comments
   FROM cte
   )
SELECT 0 AS msg_ct, users FROM agg     -- users with 0 comments
UNION  ALL
SELECT msg_ct, COALESCE(users, 0)
FROM  (SELECT generate_series(1, max_ct) AS msg_ct FROM agg) g
LEFT   JOIN cte USING (msg_ct)
ORDER  BY 1;
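The subtraction variant can be checked the same way. This sketch uses the same in-memory SQLite stand-in and hypothetical data (D has no comments). It makes three changes from the PostgreSQL version:

- The `::int` casts are dropped, because SQLite has no `::` syntax.
- `generate_series(1, max_ct)` is replaced by a recursive CTE named series that starts at 1.
- series yields no rows when there are no comments, mirroring generate_series with a NULL bound.

```python
import sqlite3

# SQLite stand-in; hypothetical data: A and B have 1 comment, C has 2, D has none.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY,
                           user_id INTEGER REFERENCES users (id),
                           comment TEXT);
    INSERT INTO users (id, name) VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
    INSERT INTO comments (user_id, comment) VALUES
        (1, 'a1'), (2, 'b1'), (3, 'c1'), (3, 'c2');
""")

rows = con.execute("""
    WITH RECURSIVE cte AS (
       SELECT msg_ct, count(*) AS users
       FROM  (
          SELECT count(*) AS msg_ct
          FROM   comments
          GROUP  BY user_id
          ) sub
       GROUP  BY msg_ct
       ),
    agg AS (
       SELECT max(msg_ct) AS max_ct                          -- upper bound for the series
            , (SELECT count(*) FROM users) - sum(users) AS users  -- users with 0 comments
       FROM   cte
       ),
    series(msg_ct) AS (                                      -- stands in for generate_series(1, max_ct)
       SELECT 1 FROM agg WHERE max_ct >= 1
       UNION ALL
       SELECT series.msg_ct + 1 FROM series, agg
       WHERE  series.msg_ct < agg.max_ct
       )
    SELECT 0 AS msg_ct, users FROM agg
    UNION  ALL
    SELECT msg_ct, COALESCE(users, 0)
    FROM   series
    LEFT   JOIN cte USING (msg_ct)
    ORDER  BY 1
""").fetchall()
print(rows)
# [(0, 1), (1, 2), (2, 1)]
```

On this data the result matches the LEFT JOIN version. Whether it is actually faster can only be judged on a real PostgreSQL table with EXPLAIN ANALYZE, not in SQLite.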

The query gets a bit more complex, but it may be faster for big tables. I'm not sure; test with EXPLAIN ANALYZE. (I'd appreciate a comment with your results.)