Why do I need COUNT(DISTINCT ...) in this MySQL query?

Time: 2017-04-01 11:12:59

Tags: mysql sql join distinct

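I have a system that lets users create schedules for watching videos. The MySQL query below fetches the active schedule along with the number of videos in the schedule, the number already watched, and the number that should be watched today. It does this by joining several times to the same table, which tracks the schedule-to-video associations.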

SELECT
    schedules.*,
    COUNT(DISTINCT sv1.vid_id) AS total_vids, #<-- the problem
    GROUP_CONCAT(DISTINCT sv1.context_node_id) AS topics,
    COUNT(sv2.vid_id) AS vids_watched,
    COUNT(sv3.vid_id) AS today
FROM schedules
JOIN schedule_vids sv1 ON schedules.id = sv1.schedule_id
LEFT JOIN schedule_vids sv2 ON schedules.id = sv2.schedule_id AND sv2.watched IS NOT NULL
LEFT JOIN schedule_vids sv3 ON schedules.id = sv3.schedule_id AND sv3.date = CURDATE()
WHERE user_id = ? AND schedules.id = ?
GROUP BY schedules.id
ORDER BY created DESC

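The problem: if I don't use COUNT(DISTINCT sv1.vid_id) (i.e. just COUNT(sv1.vid_id)), I get a number far larger than the real one. I have verified this against the database. Can anyone see where I am going wrong?

Interestingly, if I remove the join to sv3 (and the corresponding part of the SELECT list, of course), the problem goes away.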

[UPDATE]

Here are the table structures for the two tables involved:

CREATE TABLE `schedules` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `name` varchar(50) NOT NULL,
 `user_id` varchar(11) NOT NULL,
 `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
 `start` date NOT NULL,
 `end` date NOT NULL,
 `inc_weekends` enum('y') DEFAULT NULL,
 `type` enum('ls','c') NOT NULL DEFAULT 'ls' COMMENT 'ls = learning schedule; c = course',
 `subj_id` varchar(30) NOT NULL,
 PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=51 DEFAULT CHARSET=latin1

CREATE TABLE `schedule_vids` (
 `schedule_id` int(11) NOT NULL,
 `vid_id` varchar(11) NOT NULL,
 `context_node_id` varchar(11) NOT NULL,
 `date` date NOT NULL,
 `watched` date DEFAULT NULL,
 PRIMARY KEY (`schedule_id`,`vid_id`,`context_node_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

Sample output:

id              50
name            some-schedule
user_id         yd8i0i63bd8
created         2017-04-01 11:58:22
start           2017-04-01
end             2017-04-03
inc_weekends    y
type            ls
total_vids      91
topics          maths
vids_watched    0
today           91

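1 Answer:

Answer 0: (score: 2)

Most likely, the DISTINCT really is needed with the query as written. The problem is your joins. Use conditional aggregation instead: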

SELECT s.*,
       COUNT(*) AS total_vids,  -- one row per schedule_vids entry; use COUNT(DISTINCT sv.vid_id) if a video can appear under several context nodes
       GROUP_CONCAT(DISTINCT sv.context_node_id) AS topics,  -- distinct is probably still needed here
       COUNT(sv.watched) AS vids_watched,  -- COUNT skips NULLs, so only watched videos are counted
       SUM(sv.date = CURDATE()) AS today   -- the comparison evaluates to 1 or 0 in MySQL
FROM schedules s JOIN
     schedule_vids sv
     ON s.id = sv.schedule_id LEFT JOIN
     school_users su
     ON s.user_id = su.uid  -- I'm guessing `user_id` comes from s
WHERE s.user_id = ? AND s.id = ?
GROUP BY s.id
ORDER BY s.created DESC;

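If you run your original query without the aggregation (no GROUP BY and no aggregate functions), you will see what is happening. You are getting a Cartesian product of the videos: every sv1 row is paired with every matching sv2 row and every matching sv3 row. A schedule with N videos, W of them watched and T due today therefore produces N × max(W, 1) × max(T, 1) rows, and COUNT(sv1.vid_id) counts all of them, which is why the counts come out too high. With no watched videos, as in your sample output, the sv2 join adds no extra rows, which is why removing the sv3 join appeared to fix the count.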
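To see the effect on a small data set, here is a minimal sketch that reproduces both queries. It is not run against MySQL: it uses Python's built-in sqlite3 module as a stand-in, a trimmed-down version of the two tables, a fixed date in place of CURDATE(), and four made-up video rows (v1 to v4), all of which are assumptions for illustration only.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schedules (id INTEGER PRIMARY KEY, name TEXT, user_id TEXT);
    CREATE TABLE schedule_vids (
        schedule_id     INTEGER NOT NULL,
        vid_id          TEXT NOT NULL,
        context_node_id TEXT NOT NULL,
        date            TEXT NOT NULL,
        watched         TEXT,
        PRIMARY KEY (schedule_id, vid_id, context_node_id)
    );
    INSERT INTO schedules VALUES (50, 'some-schedule', 'yd8i0i63bd8');
    -- Hypothetical rows: 4 videos, 2 watched, 3 scheduled for "today".
    INSERT INTO schedule_vids VALUES
        (50, 'v1', 'maths', '2017-04-01', '2017-04-01'),
        (50, 'v2', 'maths', '2017-04-01', '2017-04-01'),
        (50, 'v3', 'maths', '2017-04-01', NULL),
        (50, 'v4', 'maths', '2017-04-02', NULL);
""")

TODAY = "2017-04-01"  # fixed stand-in for CURDATE()

# Shape of the original query: three joins to the same table multiply the rows.
fan_out_sql = """
    SELECT COUNT(sv1.vid_id),
           COUNT(DISTINCT sv1.vid_id),
           COUNT(sv2.vid_id),
           COUNT(sv3.vid_id)
    FROM schedules s
    JOIN schedule_vids sv1 ON s.id = sv1.schedule_id
    LEFT JOIN schedule_vids sv2 ON s.id = sv2.schedule_id AND sv2.watched IS NOT NULL
    LEFT JOIN schedule_vids sv3 ON s.id = sv3.schedule_id AND sv3.date = ?
    WHERE s.id = 50
    GROUP BY s.id
"""
joined = conn.execute(fan_out_sql, (TODAY,)).fetchone()

# Conditional aggregation: a single join, so each video is one row.
conditional_sql = """
    SELECT COUNT(*),
           COUNT(sv.watched),
           SUM(sv.date = ?)
    FROM schedules s
    JOIN schedule_vids sv ON s.id = sv.schedule_id
    WHERE s.id = 50
    GROUP BY s.id
"""
single = conn.execute(conditional_sql, (TODAY,)).fetchone()

print(joined)  # (24, 4, 24, 24): 4 * 2 * 3 = 24 joined rows
print(single)  # (4, 2, 3): total, watched, today
```

With the triple join, every count except the DISTINCT one comes back as 24, the size of the Cartesian product. With the single join, the three counts are the real ones.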