How to flatten or correctly group across multiple joins

Time: 2018-12-09 14:04:15

Tags: mysql

My structure

I have three (hypothetical) tables: users, movies, and sessions.

> SELECT * FROM users
+----+---------------------------+
| id | email                     |
+----+---------------------------+
|  5 | abcdefghijklmno@gmail.com |
+----+---------------------------+

> SELECT * FROM movies
+----+---------+---------+--------------+
| id |  title  | user_id | total_watches|
+----+---------+---------+--------------+
|  1 |  X-men  |    1    |     1        |
|  2 |  Blade  |    1    |     1        |
|  3 | Goonies |    1    |     1        |
+----+---------+---------+--------------+

> SELECT * FROM sessions
+----+---------+---------+------------+ 
| id | user_id | show_id | total_time |
+----+---------+---------+------------+
|  1 |       1 |       1 |          5 |
|  2 |       1 |       1 |         30 |
|  3 |       1 |       1 |          5 |
+----+---------+---------+------------+

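What I want

I want an overview of a user's movie activity from a single query, so I would like to retrieve the data in the following format: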

+----+---------------------------+---------------+----------------+
| id | email                     |  total_time   |  total_watches |
+----+---------------------------+---------------+----------------+
|  5 | abcdefghijklmno@gmail.com |       40      |        3       |
+----+---------------------------+---------------+----------------+

What I have tried

SELECT users.id,
       users.email,
       SUM(movies.total_watches) AS total_watches,
       SUM(sessions.total_time) AS total_time
FROM users
JOIN movies ON users.id = movies.user_id
JOIN sessions ON users.id = sessions.user_id
GROUP BY users.id

This returns (with a couple of columns omitted):

+------------------+---------------+---------------+
| email            | total_watches |   total_time  |
+------------------+---------------+---------------+
| abcdef@gmail.com |       9       |        120    | 
+------------------+---------------+---------------+

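Summary

I understand that the extra join on sessions creates three rows for each movie, which triples the SUM results. How can I get the "flattened" data instead? I have tried other combinations without any luck.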
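
The row multiplication described above can be made visible by running the same joins without GROUP BY or SUM. This is only a minimal sketch: the column types are guesses, and the user id is set to 1 (an assumption) so that it matches the user_id values in the movies and sessions samples.

```sql
-- Hypothetical setup mirroring the sample tables; column types are assumed.
CREATE TABLE users (id INT PRIMARY KEY, email VARCHAR(255));
CREATE TABLE movies (id INT PRIMARY KEY, title VARCHAR(255), user_id INT, total_watches INT);
CREATE TABLE sessions (id INT PRIMARY KEY, user_id INT, show_id INT, total_time INT);

-- User id 1 is an assumption so that the joins below find matching rows.
INSERT INTO users VALUES (1, 'abcdefghijklmno@gmail.com');
INSERT INTO movies VALUES (1, 'X-men', 1, 1), (2, 'Blade', 1, 1), (3, 'Goonies', 1, 1);
INSERT INTO sessions VALUES (1, 1, 1, 5), (2, 1, 1, 30), (3, 1, 1, 5);

-- Same joins as the attempted query, but without aggregation,
-- so every joined row is listed individually.
SELECT users.id    AS user_id,
       movies.id   AS movie_id,
       movies.total_watches,
       sessions.id AS session_id,
       sessions.total_time
FROM users
JOIN movies ON users.id = movies.user_id
JOIN sessions ON users.id = sessions.user_id
ORDER BY movies.id, sessions.id;
```

With this data the query lists 9 rows (3 movies x 3 sessions). Each movie's total_watches appears three times and each session's total_time appears three times, which is where the sums of 9 and 120 come from.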

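1 answer:

Answer 0 (score: 0)

As suggested in the comments above, your current table structure needs further normalization.

For the table structure as it stands, one hacky approach is to divide each SUM by the row count of the other table (the one whose JOIN causes the duplication), which cancels out the effect of the duplication.

So the SUM of total_watches can be divided by the number of rows the user has in the sessions table. Likewise, the SUM of total_time can be divided by the number of rows the user has in the movies table. With the sample data, that gives 9 / 3 = 3 and 120 / 3 = 40.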

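SELECT users.id,
       -- each movies row repeats once per session row, so divide by the session count
       SUM(movies.total_watches)/COUNT(DISTINCT sessions.id) AS total_watches,
       -- each sessions row repeats once per movie row, so divide by the movie count
       SUM(sessions.total_time)/COUNT(DISTINCT movies.id) AS total_time
FROM users
JOIN movies ON users.id = movies.user_id
JOIN sessions ON users.id = sessions.user_id
GROUP BY users.id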

Result

| id  | total_watches | total_time |
| --- | ------------- | ---------- |
| 1   | 3             | 40         |

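
A different way to handle this is to avoid the duplication in the first place, by aggregating each child table on its own before joining. The sketch below is not taken from the answer above. It assumes the same three tables, with guessed column types and user id 1 as in the result shown, and the aliases m and s are made up for illustration.

```sql
-- Hypothetical setup mirroring the sample tables; column types are assumed.
CREATE TABLE users (id INT PRIMARY KEY, email VARCHAR(255));
CREATE TABLE movies (id INT PRIMARY KEY, title VARCHAR(255), user_id INT, total_watches INT);
CREATE TABLE sessions (id INT PRIMARY KEY, user_id INT, show_id INT, total_time INT);

INSERT INTO users VALUES (1, 'abcdefghijklmno@gmail.com');
INSERT INTO movies VALUES (1, 'X-men', 1, 1), (2, 'Blade', 1, 1), (3, 'Goonies', 1, 1);
INSERT INTO sessions VALUES (1, 1, 1, 5), (2, 1, 1, 30), (3, 1, 1, 5);

-- Each derived table collapses to one row per user before the join,
-- so neither sum is multiplied by the other table's row count.
SELECT users.id,
       users.email,
       s.total_time,
       m.total_watches
FROM users
JOIN (
    SELECT user_id, SUM(total_watches) AS total_watches
    FROM movies
    GROUP BY user_id
) AS m ON m.user_id = users.id
JOIN (
    SELECT user_id, SUM(total_time) AS total_time
    FROM sessions
    GROUP BY user_id
) AS s ON s.user_id = users.id;
```

With the sample rows this yields a single row with total_time 40 and total_watches 3. Because these are inner joins, a user with no movies or no sessions is dropped from the output. Switching to LEFT JOIN and wrapping the sums in COALESCE(..., 0) would keep such users.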