SQL count of multiple fields

Posted: 2011-07-23 12:16:04

Tags: mysql sql count aggregate

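This is a follow-up to a previous question: Complicated COUNT query in MySQL. None of the answers worked under all conditions, and I have also had trouble finding a solution myself. I will award a 75-point bounty to the first person who provides a completely correct answer (I will award it as soon as I can; for reference, I have done this before: Improving Python/django view code).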

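I would like to get the video credits a user has, without allowing duplicates (i.e., for each video, a user can be credited in it 0 or 1 times). I want to find three counts: the number of videos the user uploaded (easy): Uploads; the number of videos the user is credited in that the user did not upload: Credited_by_others; and the total number of videos the user has been credited in: Total_credits.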

I have three tables:

CREATE TABLE `userprofile_userprofile` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `full_name` varchar(100) NOT NULL,
  ...
);

CREATE TABLE `videos_video` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `title` varchar(100) NOT NULL,
  `uploaded_by_id` int(11) NOT NULL,
  ...
  KEY `userprofile_video_e43a31e7` (`uploaded_by_id`),
  CONSTRAINT `uploaded_by_id_refs_id_492ba9396be0968c` FOREIGN KEY (`uploaded_by_id`) REFERENCES `userprofile_userprofile` (`id`)
);

CREATE TABLE `videos_videocredit` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `video_id` int(11) NOT NULL,
  `profile_id` int(11) DEFAULT NULL,
  `position` varchar(100) NOT NULL,
  ...
  KEY `videos_videocredit_fa26288c` (`video_id`),
  KEY `videos_videocredit_141c6eec` (`profile_id`),
  CONSTRAINT `profile_id_refs_id_31fc4a6405dffd9f` FOREIGN KEY (`profile_id`) REFERENCES `userprofile_userprofile` (`id`),
  CONSTRAINT `video_id_refs_id_4dcff2eeed362a80` FOREIGN KEY (`video_id`) REFERENCES `videos_video` (`id`)
);

Note that uploaded_by_id is the same as userprofile.id.

Here it is step by step:

1) Create 2 users:

insert into userprofile_userprofile (id, full_name) values (1, 'John Smith');
insert into userprofile_userprofile (id, full_name) values (2, 'Jane Doe');

2) A user uploads a video. He has not yet credited anyone in it, including himself:

insert into videos_video (id, title, uploaded_by_id) values (1, 'Hamlet', 1);

The result should be:

**User**     **Uploads**  **Credited_by_others**  **Total_credits**
John Smith       1                0                      1
Jane Doe         0                0                      0

3) The user who uploaded the video now credits himself in it. Note that this should not change anything, because the user has already received credit for uploading the film, and I do not allow duplicate credits:

insert into videos_videocredit (id, video_id, profile_id, position) values (1, 1, 1, 'director');

The results should now be:

**User**     **Uploads**  **Credited_by_others**  **Total_credits**
John Smith       1                0                      1
Jane Doe         0                0                      0

4) The user now credits himself twice more in the same video (i.e., he has multiple 'positions' in the video). He also credits Jane Doe three times in that video:

insert into videos_videocredit (id, video_id, profile_id, position) values (2, 1, 1, 'writer');
insert into videos_videocredit (id, video_id, profile_id, position) values (3, 1, 1, 'producer');
insert into videos_videocredit (id, video_id, profile_id, position) values (4, 1, 2, 'director');
insert into videos_videocredit (id, video_id, profile_id, position) values (5, 1, 2, 'editor');
insert into videos_videocredit (id, video_id, profile_id, position) values (6, 1, 2, 'decorator');

The results should now be:

**User**     **Uploads**  **Credited_by_others**  **Total_credits**
John Smith       1                0                      1
Jane Doe         0                1                      1

5) Jane Doe now uploads a video. She does not credit herself, but credits John Smith twice in it:

insert into videos_video (id, title, uploaded_by_id) values (2, 'Othello', 2);
insert into videos_videocredit (id, video_id, profile_id, position) values (7, 2, 1, 'writer');
insert into videos_videocredit (id, video_id, profile_id, position) values (8, 2, 1, 'producer');

The results should now be:

**User**     **Uploads**  **Credited_by_others**  **Total_credits**
John Smith       1                1                      2
Jane Doe         1                1                      2

So, for each user I want to find these three fields: Uploads, Credited_by_others, and Total_credits. The data should never be NULL; when a field has no count, it should be 0 instead. Thanks.
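A quick way to check any candidate query is to pin the counting rule down without SQL first. The following is a minimal Python sketch, not taken from the question: the helper `expected_counts` is hypothetical, the tables are modelled as lists of tuples, and set semantics collapse several positions in the same video into a single credit. Under those assumptions it reproduces the expected tables from steps 2, 4 and 5.

```python
# Data after step 5, mirroring the INSERT statements in the question.
users = {1: "John Smith", 2: "Jane Doe"}
videos = [(1, "Hamlet", 1), (2, "Othello", 2)]  # (id, title, uploaded_by_id)
credits = [  # (id, video_id, profile_id, position)
    (1, 1, 1, "director"), (2, 1, 1, "writer"), (3, 1, 1, "producer"),
    (4, 1, 2, "director"), (5, 1, 2, "editor"), (6, 1, 2, "decorator"),
    (7, 2, 1, "writer"), (8, 2, 1, "producer"),
]


def expected_counts(users, videos, credits):
    """Return {full_name: (Uploads, Credited_by_others, Total_credits)}."""
    uploader = {vid: up for vid, _title, up in videos}
    result = {}
    for uid, name in users.items():
        uploaded = {vid for vid, up in uploader.items() if up == uid}
        # A set keeps each video once, however many positions the user holds in it.
        credited_in = {vid for _cid, vid, pid, _pos in credits if pid == uid}
        by_others = credited_in - uploaded  # credited, but not the uploader
        result[name] = (len(uploaded), len(by_others), len(uploaded) + len(by_others))
    return result


print(expected_counts(users, videos, credits))
# {'John Smith': (1, 1, 2), 'Jane Doe': (1, 1, 2)}
```

Slicing the lists gives the earlier states, e.g. `expected_counts(users, videos[:1], credits[:6])` for the end of step 4.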

3 answers:

Answer 0 (score: 1)

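Total credits is simply the sum of upload credits and foreign credits. Since upload credits are easy, here are just the foreign credits. Brace yourself for a doubly nested subquery: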

SELECT profile_id, COUNT(video_id) AS foreign_credit
       FROM (SELECT DISTINCT profile_id, video_id FROM videos_videocredit
             WHERE (profile_id, video_id) NOT IN (SELECT uploaded_by_id, id FROM videos_video)) AS crsq
GROUP BY profile_id;
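To try the foreign-credit query without a MySQL server, here is a sketch that runs it against an in-memory SQLite database through Python's standard `sqlite3` module (SQLite is an assumed stand-in for MySQL). The row-value `NOT IN` is swapped for an equivalent `NOT EXISTS` plus an explicit `profile_id IS NOT NULL` filter, and a second query without `DISTINCT` shows how repeated positions would otherwise inflate the count.

```python
import sqlite3

# SQLite stand-in for the question's tables, loaded with the data after step 5.
SETUP = """
CREATE TABLE userprofile_userprofile (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL);
CREATE TABLE videos_video (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                           uploaded_by_id INTEGER NOT NULL);
CREATE TABLE videos_videocredit (id INTEGER PRIMARY KEY, video_id INTEGER NOT NULL,
                                 profile_id INTEGER, position TEXT NOT NULL);
INSERT INTO userprofile_userprofile VALUES (1, 'John Smith'), (2, 'Jane Doe');
INSERT INTO videos_video VALUES (1, 'Hamlet', 1), (2, 'Othello', 2);
INSERT INTO videos_videocredit VALUES
  (1, 1, 1, 'director'), (2, 1, 1, 'writer'), (3, 1, 1, 'producer'),
  (4, 1, 2, 'director'), (5, 1, 2, 'editor'), (6, 1, 2, 'decorator'),
  (7, 2, 1, 'writer'), (8, 2, 1, 'producer');
"""

# De-duplicate (profile_id, video_id) pairs first, then count them per profile.
FOREIGN_CREDITS = """
SELECT profile_id, COUNT(video_id) AS foreign_credit
FROM (SELECT DISTINCT c.profile_id, c.video_id
      FROM videos_videocredit AS c
      WHERE c.profile_id IS NOT NULL
        AND NOT EXISTS (SELECT 1 FROM videos_video AS v
                        WHERE v.id = c.video_id
                          AND v.uploaded_by_id = c.profile_id)) AS crsq
GROUP BY profile_id
ORDER BY profile_id
"""

# Same filter without DISTINCT: every position row is counted separately.
RAW = """
SELECT c.profile_id, COUNT(c.video_id)
FROM videos_videocredit AS c
WHERE NOT EXISTS (SELECT 1 FROM videos_video AS v
                  WHERE v.id = c.video_id AND v.uploaded_by_id = c.profile_id)
GROUP BY c.profile_id
ORDER BY c.profile_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(FOREIGN_CREDITS).fetchall()
raw_rows = conn.execute(RAW).fetchall()
print(rows)      # [(1, 1), (2, 1)]
print(raw_rows)  # [(1, 2), (2, 3)]
```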

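This becomes more readable with a view. We create a view that selects only the (profile_id, video_id) pairs of people credited in videos they did not upload themselves. Let's call the view vfcredits: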

CREATE VIEW vfcredits AS
  SELECT DISTINCT profile_id, video_id FROM videos_videocredit
  WHERE (profile_id, video_id) NOT IN (SELECT uploaded_by_id, id FROM videos_video);

Now we can happily plug that into the main query that aggregates the foreign credits:

SELECT profile_id, COUNT(video_id) AS foreign_credit
FROM vfcredits
GROUP BY profile_id;

Now let's put it all together. We create two more views, one counting own credits and one counting foreign credits:

CREATE VIEW vowncount AS
  SELECT uploaded_by_id AS profile_id, COUNT(*) AS own_credits
  FROM videos_video
  GROUP BY uploaded_by_id;

CREATE VIEW vforeigncount AS
  SELECT profile_id, COUNT(video_id) AS foreign_credits
  FROM vfcredits
  GROUP BY profile_id;

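Finally, the full select. LEFT JOINs with COALESCE keep users who have no uploads or no foreign credits, reporting 0 for them instead of dropping the row:

SELECT full_name,
       COALESCE(own_credits, 0) AS own_credits,
       COALESCE(foreign_credits, 0) AS foreign_credits,
       COALESCE(own_credits, 0) + COALESCE(foreign_credits, 0) AS total_credits
FROM userprofile_userprofile
LEFT JOIN vowncount ON (userprofile_userprofile.id = vowncount.profile_id)
LEFT JOIN vforeigncount ON (userprofile_userprofile.id = vforeigncount.profile_id);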
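Whether the final select uses inner JOINs or LEFT JOINs matters for any user missing from one of the views. A sketch in SQLite (via Python's `sqlite3`, an assumed stand-in for MySQL) loads the data as it stands after step 4, where Jane has no uploads and John has no foreign credits, and compares both variants. The `vfcredits` view here uses `NOT EXISTS` in place of the row-value `NOT IN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE userprofile_userprofile (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL);
CREATE TABLE videos_video (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                           uploaded_by_id INTEGER NOT NULL);
CREATE TABLE videos_videocredit (id INTEGER PRIMARY KEY, video_id INTEGER NOT NULL,
                                 profile_id INTEGER, position TEXT NOT NULL);
-- Data at the end of step 4.
INSERT INTO userprofile_userprofile VALUES (1, 'John Smith'), (2, 'Jane Doe');
INSERT INTO videos_video VALUES (1, 'Hamlet', 1);
INSERT INTO videos_videocredit VALUES
  (1, 1, 1, 'director'), (2, 1, 1, 'writer'), (3, 1, 1, 'producer'),
  (4, 1, 2, 'director'), (5, 1, 2, 'editor'), (6, 1, 2, 'decorator');

-- The answer's three views.
CREATE VIEW vfcredits AS
  SELECT DISTINCT profile_id, video_id FROM videos_videocredit AS c
  WHERE profile_id IS NOT NULL AND NOT EXISTS (
    SELECT 1 FROM videos_video AS v
    WHERE v.id = c.video_id AND v.uploaded_by_id = c.profile_id);
CREATE VIEW vowncount AS
  SELECT uploaded_by_id AS profile_id, COUNT(*) AS own_credits
  FROM videos_video GROUP BY uploaded_by_id;
CREATE VIEW vforeigncount AS
  SELECT profile_id, COUNT(video_id) AS foreign_credits
  FROM vfcredits GROUP BY profile_id;
""")

INNER = """
SELECT full_name, own_credits, foreign_credits, own_credits + foreign_credits
FROM userprofile_userprofile AS u
JOIN vowncount AS o ON u.id = o.profile_id
JOIN vforeigncount AS f ON u.id = f.profile_id
ORDER BY u.id"""

LEFT = """
SELECT full_name,
       COALESCE(o.own_credits, 0),
       COALESCE(f.foreign_credits, 0),
       COALESCE(o.own_credits, 0) + COALESCE(f.foreign_credits, 0)
FROM userprofile_userprofile AS u
LEFT JOIN vowncount AS o ON u.id = o.profile_id
LEFT JOIN vforeigncount AS f ON u.id = f.profile_id
ORDER BY u.id"""

inner_rows = conn.execute(INNER).fetchall()
left_rows = conn.execute(LEFT).fetchall()
print(inner_rows)  # []
print(left_rows)   # [('John Smith', 1, 0, 1), ('Jane Doe', 0, 1, 1)]
```

With inner joins each user is missing from one view, so both rows vanish; the LEFT JOIN variant matches the step-4 table from the question.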

Answer 1 (score: 1)

I rewrote the query using joins, which makes it easier for the server to optimize.

First, two views that simplify the query:

CREATE OR REPLACE VIEW vperson_videos AS
    SELECT
        v.uploaded_by_id AS id,
        COUNT(*) AS uploads
    FROM videos_video v
    GROUP BY v.uploaded_by_id;

The view above simply counts the number of videos each user has uploaded.

CREATE VIEW vperson_credits AS
    SELECT
        c.profile_id AS id,
        COUNT(DISTINCT c.video_id) AS credits
    FROM videos_videocredit c
    INNER JOIN videos_video cv ON cv.id = c.video_id
    WHERE cv.uploaded_by_id <> c.profile_id
    GROUP BY c.profile_id;

The view above counts the number of (distinct) videos in which a user is credited, ignoring the videos that user uploaded.

Then the query itself:

SELECT
    p.id,
    p.full_name,
    IFNULL(pv.uploads,0) AS uploads,
    IFNULL(pc.credits,0) AS credits,
    IFNULL(pv.uploads,0) + IFNULL(pc.credits,0) AS total_credits
FROM userprofile_userprofile p
LEFT OUTER JOIN vperson_videos pv ON pv.id = p.id
LEFT OUTER JOIN vperson_credits pc ON pc.id = p.id;

I use LEFT OUTER JOIN to include users who have not uploaded any videos or have not been credited in any video. IFNULL() is needed because otherwise I would get NULL instead of 0.

The final result is:

+----+------------+---------+---------+---------------+
| id | full_name  | uploads | credits | total_credits |
+----+------------+---------+---------+---------------+
|  1 | John Smith |       1 |       1 |             2 | 
|  2 | Jane Doe   |       1 |       1 |             2 | 
+----+------------+---------+---------+---------------+

Answer 2 (score: 1)

First, I think there are some errors in your problem description.

  • In step 5, you describe Jane crediting John twice in video 2. I think you just mis-ordered some columns in the values clause. It should be:

    insert into videos_videocredit (id, video_id, profile_id, position) values (7, 2, 1, 'writer');
    insert into videos_videocredit (id, video_id, profile_id, position) values (8, 2, 1, 'producer');
    
  • Your results should show John credited in 2 videos and Jane credited in 1 video:

    +------------+---------+--------------------+---------------+
    | full_name  | Uploads | Credited_by_others | Total_credits |
    +------------+---------+--------------------+---------------+
    | John Smith |       1 |                  1 |             2 | 
    | Jane Doe   |       1 |                  1 |             1 | 
    +------------+---------+--------------------+---------------+
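The disagreement over Jane's Total_credits comes down to whether an upload by itself counts as a credit. A small Python sketch (illustrative only; `totals` is a hypothetical helper, and the tuples mirror the step-5 inserts) computes both readings side by side.

```python
users = {1: "John Smith", 2: "Jane Doe"}
videos = {1: 1, 2: 2}  # video_id -> uploaded_by_id (Hamlet by John, Othello by Jane)
credits = [(1, 1), (1, 1), (1, 1), (1, 2), (1, 2), (1, 2), (2, 1), (2, 1)]  # (video_id, profile_id)


def totals(uid):
    uploaded = {v for v, up in videos.items() if up == uid}
    credited_in = {v for v, pid in credits if pid == uid}
    # Question's reading: uploading a video counts as a credit, so take the union.
    question_total = len(uploaded | credited_in)
    # This answer's reading: only explicit videos_videocredit rows count.
    answer_total = len(credited_in)
    return question_total, answer_total


result = {users[uid]: totals(uid) for uid in users}
print(result)  # {'John Smith': (2, 2), 'Jane Doe': (2, 1)}
```

Jane is the only user whose two totals differ, because she uploaded Othello without crediting herself in it.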
    

I tested the following query on MySQL 5.1.57, and it gives the results above.

SELECT
  u.full_name,
  COUNT(DISTINCT myvideos.id) AS Uploads,
  COUNT(DISTINCT byothers.id) AS Credited_by_others,
  COUNT(DISTINCT credited.id) AS Total_credits
FROM userprofile_userprofile AS u
LEFT OUTER JOIN videos_video AS myvideos ON myvideos.uploaded_by_id = u.id
LEFT OUTER JOIN (
  videos_videocredit AS c USE INDEX (videocredit_profileid_videoid)
  INNER JOIN videos_video AS credited
    ON c.video_id = credited.id
) ON c.profile_id = u.id
LEFT OUTER JOIN videos_video AS byothers USE INDEX (video_up_id)
  ON c.video_id = byothers.id
  AND byothers.uploaded_by_id <> u.id
GROUP BY u.id;
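For a quick check without MySQL, this sketch runs the combined query in an in-memory SQLite database via Python's `sqlite3` module. Two changes are assumptions of the sketch: the MySQL-only USE INDEX hints are dropped, and the parenthesized nested join is flattened into two LEFT JOINs, which yields the same counts as long as every credit row points at an existing video (the `video_id` foreign key guarantees that).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE userprofile_userprofile (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL);
CREATE TABLE videos_video (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                           uploaded_by_id INTEGER NOT NULL);
CREATE TABLE videos_videocredit (id INTEGER PRIMARY KEY, video_id INTEGER NOT NULL,
                                 profile_id INTEGER, position TEXT NOT NULL);
INSERT INTO userprofile_userprofile VALUES (1, 'John Smith'), (2, 'Jane Doe');
INSERT INTO videos_video VALUES (1, 'Hamlet', 1), (2, 'Othello', 2);
INSERT INTO videos_videocredit VALUES
  (1, 1, 1, 'director'), (2, 1, 1, 'writer'), (3, 1, 1, 'producer'),
  (4, 1, 2, 'director'), (5, 1, 2, 'editor'), (6, 1, 2, 'decorator'),
  (7, 2, 1, 'writer'), (8, 2, 1, 'producer');
""")

# COUNT(DISTINCT ...) undoes the row multiplication caused by joining
# uploads and credits against the same user.
COMBINED = """
SELECT
  u.full_name,
  COUNT(DISTINCT myvideos.id) AS Uploads,
  COUNT(DISTINCT byothers.id) AS Credited_by_others,
  COUNT(DISTINCT credited.id) AS Total_credits
FROM userprofile_userprofile AS u
LEFT OUTER JOIN videos_video AS myvideos ON myvideos.uploaded_by_id = u.id
LEFT OUTER JOIN videos_videocredit AS c ON c.profile_id = u.id
LEFT OUTER JOIN videos_video AS credited ON credited.id = c.video_id
LEFT OUTER JOIN videos_video AS byothers
  ON byothers.id = c.video_id AND byothers.uploaded_by_id <> u.id
GROUP BY u.id
ORDER BY u.id
"""

rows = conn.execute(COMBINED).fetchall()
print(rows)  # [('John Smith', 1, 1, 2), ('Jane Doe', 1, 1, 1)]
```

The output matches the table in this answer, including Jane's Total_credits of 1.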

I created a couple of extra indexes and gave the query hints to use them:

CREATE INDEX video_up_id ON videos_video (id,uploaded_by_id);

CREATE INDEX videocredit_profileid_videoid ON videos_videocredit (profile_id,video_id);

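This ensures that every table except userprofile is accessed in "Using index" mode, which means the query can be satisfied by reading only the index B-trees, without reading the table data. Here is the EXPLAIN report: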

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: u
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 2
        Extra: 
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: myvideos
         type: ref
possible_keys: userprofile_video_e43a31e7
          key: userprofile_video_e43a31e7
      key_len: 4
          ref: test.u.id
         rows: 1
        Extra: Using index
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: c
         type: ref
possible_keys: videocredit_profileid_videoid
          key: videocredit_profileid_videoid
      key_len: 5
          ref: test.u.id
         rows: 1
        Extra: Using index
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: credited
         type: eq_ref
possible_keys: PRIMARY,video_up_id
          key: PRIMARY
      key_len: 4
          ref: test.c.video_id
         rows: 1
        Extra: Using index
*************************** 5. row ***************************
           id: 1
  select_type: SIMPLE
        table: byothers
         type: ref
possible_keys: video_up_id
          key: video_up_id
      key_len: 4
          ref: test.c.video_id
         rows: 1
        Extra: Using index
5 rows in set (0.00 sec)
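SQLite has a rough analogue of MySQL's "Using index": its EXPLAIN QUERY PLAN output says USING COVERING INDEX when a lookup can be answered from the index alone. This small sketch uses Python's `sqlite3` module (SQLite is only a stand-in; its planner and its output wording differ from MySQL's and vary between versions) and compares the plan for fetching one user's credited videos before and after creating the answer's `videocredit_profileid_videoid` index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE videos_videocredit (id INTEGER PRIMARY KEY, video_id INTEGER NOT NULL,
                                 profile_id INTEGER, position TEXT NOT NULL);
INSERT INTO videos_videocredit VALUES
  (1, 1, 1, 'director'), (2, 1, 1, 'writer'), (3, 1, 1, 'producer'),
  (4, 1, 2, 'director'), (5, 1, 2, 'editor'), (6, 1, 2, 'decorator'),
  (7, 2, 1, 'writer'), (8, 2, 1, 'producer');
""")

# Only profile_id and video_id are touched, so a (profile_id, video_id) index covers it.
QUERY = "SELECT video_id FROM videos_videocredit WHERE profile_id = ?"


def plan(sql, params):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


before = plan(QUERY, (1,))
conn.execute("CREATE INDEX videocredit_profileid_videoid "
             "ON videos_videocredit (profile_id, video_id)")
after = plan(QUERY, (1,))
print(before)  # a full scan of videos_videocredit
print(after)   # a search using the covering index
```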

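The optimizer can report unrepresentative plans when you test against only a handful of rows. So when testing against a realistic data set, we might see different results, and the USE INDEX hints might then be unnecessary.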


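However, despite the solution above, I would prefer to do each of these tasks in a separate query. Doing everything in one query is complex to develop and test, and it is often expensive for the RDBMS to execute. If you need to add another count, it becomes even more complex.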

SELECT
  u.full_name,
  COUNT(DISTINCT myvideos.id) AS Uploads
FROM userprofile_userprofile AS u
LEFT OUTER JOIN videos_video AS myvideos ON myvideos.uploaded_by_id = u.id
GROUP BY u.id;

SELECT
  u.full_name,
  COUNT(DISTINCT byothers.id) AS Credited_by_others
FROM userprofile_userprofile AS u
LEFT OUTER JOIN videos_videocredit AS c
  USE INDEX (videocredit_profileid_videoid)
  ON c.profile_id = u.id
LEFT OUTER JOIN videos_video AS byothers
  USE INDEX (video_up_id)
  ON c.video_id = byothers.id AND byothers.uploaded_by_id <> u.id
GROUP BY u.id;

SELECT
  u.full_name,
  COUNT(DISTINCT credited.id) AS Total_credits
FROM userprofile_userprofile AS u
LEFT OUTER JOIN (
  videos_videocredit AS c
  USE INDEX (videocredit_profileid_videoid)
  INNER JOIN videos_video AS credited
    ON c.video_id = credited.id
) ON c.profile_id = u.id
GROUP BY u.id;
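Since the question comes from a Django project, the three separate queries could also be merged per user in application code. This is a minimal sketch under stated assumptions: Python's `sqlite3` module with an in-memory SQLite database stands in for MySQL, the USE INDEX hints are dropped, and the nested join in the third query is flattened into two LEFT JOINs. The counts follow this answer's definition of Total_credits, so Jane gets 1, and the `report` dictionary layout is an illustrative choice rather than anything Django provides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE userprofile_userprofile (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL);
CREATE TABLE videos_video (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                           uploaded_by_id INTEGER NOT NULL);
CREATE TABLE videos_videocredit (id INTEGER PRIMARY KEY, video_id INTEGER NOT NULL,
                                 profile_id INTEGER, position TEXT NOT NULL);
INSERT INTO userprofile_userprofile VALUES (1, 'John Smith'), (2, 'Jane Doe');
INSERT INTO videos_video VALUES (1, 'Hamlet', 1), (2, 'Othello', 2);
INSERT INTO videos_videocredit VALUES
  (1, 1, 1, 'director'), (2, 1, 1, 'writer'), (3, 1, 1, 'producer'),
  (4, 1, 2, 'director'), (5, 1, 2, 'editor'), (6, 1, 2, 'decorator'),
  (7, 2, 1, 'writer'), (8, 2, 1, 'producer');
""")

# One small query per field; each returns (full_name, count) for every user.
QUERIES = {
    "Uploads": """
        SELECT u.full_name, COUNT(DISTINCT myvideos.id)
        FROM userprofile_userprofile AS u
        LEFT OUTER JOIN videos_video AS myvideos ON myvideos.uploaded_by_id = u.id
        GROUP BY u.id""",
    "Credited_by_others": """
        SELECT u.full_name, COUNT(DISTINCT byothers.id)
        FROM userprofile_userprofile AS u
        LEFT OUTER JOIN videos_videocredit AS c ON c.profile_id = u.id
        LEFT OUTER JOIN videos_video AS byothers
          ON c.video_id = byothers.id AND byothers.uploaded_by_id <> u.id
        GROUP BY u.id""",
    "Total_credits": """
        SELECT u.full_name, COUNT(DISTINCT credited.id)
        FROM userprofile_userprofile AS u
        LEFT OUTER JOIN videos_videocredit AS c ON c.profile_id = u.id
        LEFT OUTER JOIN videos_video AS credited ON c.video_id = credited.id
        GROUP BY u.id""",
}

# Merge the per-field results into one row per user.
report = {}
for field, sql in QUERIES.items():
    for full_name, count in conn.execute(sql):
        report.setdefault(full_name, {})[field] = count

for name, counts in report.items():
    print(name, counts)
```

Adding another count then means adding one more entry to `QUERIES` rather than another join to a single large statement.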