I am using PostgreSQL 9.3.9 and have a table users that looks like this:
user_id | email
----------------------------
1001 | hello@world.com
1030 | mel@hotmail.com
2333 | jess@gmail.com
2502 | peter@gmail.com
3000 | olivia@hotmail.com
4000 | sharon@gmail.com
4900 | lisa@gmail.com
Then I have several tables that record which platforms each user has connected to and when, namely platform_a, platform_b, platform_c, and so on.
platform_a might look like this:
user_id | created_at
----------------------------
1001 | 2015-04-30
2333 | 2015-05-15
3000 | 2014-02-15
platform_b might look like this:
user_id | created_at
----------------------------
1001 | 2015-06-30
2333 | 2015-07-02
4900 | 2015-07-03
platform_c might look like this:
user_id | created_at
----------------------------
1001 | 2015-08-16
1030 | 2015-07-03
3000 | 2015-09-01
4000 | 2015-09-01
I would like the final result to look like this:
user_id | # of connections | latest created_at | connected to a | connected to b | connected to c
--------------------------------------------------------------------------------------------------
1001 | 3 | 2015-08-16 | yes | yes | yes
1030 | 1 | 2015-07-03 | no | no | yes
2333 | 2 | 2015-07-02 | yes | yes | no
2502 | 0 | | no | no | no
3000 | 2 | 2015-09-01 | yes | no | yes
4000 | 1 | 2015-09-01 | no | no | yes
4900 | 1 | 2015-07-03 | no | yes | no
How can I do this?
Answer 0 (score: 4)
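First, combine all the platform tables with a UNION:
-- UNION ALL keeps every row; a plain UNION would drop exact duplicates and skew the count
SELECT user_id, created_at, 1 AS a, 0 AS b, 0 AS c FROM platform_a
UNION ALL
SELECT user_id, created_at, 0 AS a, 1 AS b, 0 AS c FROM platform_b
UNION ALL
SELECT user_id, created_at, 0 AS a, 0 AS b, 1 AS c FROM platform_c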
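Then group the result of this subquery:
-- subquery_above is a placeholder for the combined query from the first step
SELECT user_id, COUNT(user_id) AS connections, MAX(created_at) AS latest_created_at,
       MAX(a) AS connected_to_a, MAX(b) AS connected_to_b, MAX(c) AS connected_to_c
FROM subquery_above
GROUP BY user_id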
This will not return users with zero connections, but you can get those rows with a LEFT JOIN from the users table.
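The two steps above, together with the LEFT JOIN just mentioned, can be combined into one statement. The following is a minimal sketch rather than the answerer's own query: the question's sample rows are inlined as CTEs so the statement runs on its own, the column aliases and the yes/no labels are chosen for illustration, and UNION ALL is used on the assumption that every link row should be counted.

```sql
-- Sample rows from the question, inlined so the statement is self-contained.
WITH users(user_id) AS (
    VALUES (1001), (1030), (2333), (2502), (3000), (4000), (4900)
), platform_a(user_id, created_at) AS (
    VALUES (1001, date '2015-04-30'), (2333, date '2015-05-15'), (3000, date '2014-02-15')
), platform_b(user_id, created_at) AS (
    VALUES (1001, date '2015-06-30'), (2333, date '2015-07-02'), (4900, date '2015-07-03')
), platform_c(user_id, created_at) AS (
    VALUES (1001, date '2015-08-16'), (1030, date '2015-07-03'),
           (3000, date '2015-09-01'), (4000, date '2015-09-01')
)
SELECT u.user_id,
       COUNT(p.user_id)  AS connections,        -- counts matched rows only, so 0 for user 2502
       MAX(p.created_at) AS latest_created_at,  -- NULL when the user has no links
       CASE WHEN MAX(p.a) = 1 THEN 'yes' ELSE 'no' END AS connected_to_a,
       CASE WHEN MAX(p.b) = 1 THEN 'yes' ELSE 'no' END AS connected_to_b,
       CASE WHEN MAX(p.c) = 1 THEN 'yes' ELSE 'no' END AS connected_to_c
FROM users u
LEFT JOIN (
    -- Step 1: stack the platform tables, flagging where each row came from.
    SELECT user_id, created_at, 1 AS a, 0 AS b, 0 AS c FROM platform_a
    UNION ALL
    SELECT user_id, created_at, 0 AS a, 1 AS b, 0 AS c FROM platform_b
    UNION ALL
    SELECT user_id, created_at, 0 AS a, 0 AS b, 1 AS c FROM platform_c
) p ON p.user_id = u.user_id
-- Step 2: collapse to one row per user, including users without any match.
GROUP BY u.user_id
ORDER BY u.user_id;
```

Starting from users rather than from the combined subquery is what keeps user 2502 in the output. For the sample rows, this should reproduce the table requested in the question.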
Answer 1 (score: 3)
select
user_id,
count(p),
max(created_at),
coalesce(sum((pl = 'a')::int), 0) connected_to_a,
coalesce(sum((pl = 'b')::int), 0) connected_to_b,
coalesce(sum((pl = 'c')::int), 0) connected_to_c
from users u
left join (
select *, 'a' pl from platform_a
union all
select *, 'b' pl from platform_b
union all
select *, 'c' pl from platform_c
) p
using (user_id)
group by 1;
user_id | count | max | connected_to_a | connected_to_b | connected_to_c
---------+-------+------------+----------------+----------------+----------------
1001 | 3 | 2015-08-16 | 1 | 1 | 1
1030 | 1 | 2015-07-03 | 0 | 0 | 1
2333 | 2 | 2015-07-02 | 1 | 1 | 0
2502 | 0 | | 0 | 0 | 0
3000 | 2 | 2015-09-01 | 1 | 0 | 1
4000 | 1 | 2015-09-01 | 0 | 0 | 1
4900 | 1 | 2015-07-03 | 0 | 1 | 0
(7 rows)
Answer 2 (score: 1)
When you check all users, it is typically fastest to aggregate before you join.
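The aggregate-before-join idea can be sketched as follows. This is an illustrative reconstruction under assumptions, not necessarily the query originally posted: the aliases pl, links, connections, latest, a, b and c are made up, the inner subquery on users is one guess at how to trim extra columns, and the question's sample rows are inlined as CTEs so the statement runs on its own.

```sql
-- Sample rows from the question, inlined so the statement is self-contained.
WITH users(user_id, email) AS (
    VALUES (1001, 'hello@world.com'),    (1030, 'mel@hotmail.com'),
           (2333, 'jess@gmail.com'),     (2502, 'peter@gmail.com'),
           (3000, 'olivia@hotmail.com'), (4000, 'sharon@gmail.com'),
           (4900, 'lisa@gmail.com')
), platform_a(user_id, created_at) AS (
    VALUES (1001, date '2015-04-30'), (2333, date '2015-05-15'), (3000, date '2014-02-15')
), platform_b(user_id, created_at) AS (
    VALUES (1001, date '2015-06-30'), (2333, date '2015-07-02'), (4900, date '2015-07-03')
), platform_c(user_id, created_at) AS (
    VALUES (1001, date '2015-08-16'), (1030, date '2015-07-03'),
           (3000, date '2015-09-01'), (4000, date '2015-09-01')
)
SELECT *
FROM  (SELECT user_id FROM users) u   -- assumption: keep only user_id so SELECT * stays narrow
LEFT JOIN (
    -- Aggregate first: at most one row per user comes out of this subquery.
    SELECT user_id,
           count(*)          AS connections,
           max(created_at)   AS latest,
           bool_or(pl = 'a') AS a,
           bool_or(pl = 'b') AS b,
           bool_or(pl = 'c') AS c
    FROM (
        SELECT user_id, created_at, 'a'::text AS pl FROM platform_a
        UNION ALL
        SELECT user_id, created_at, 'b' FROM platform_b
        UNION ALL
        SELECT user_id, created_at, 'c' FROM platform_c
    ) links
    GROUP BY user_id
) p USING (user_id)                   -- join afterwards; unmatched users get NULLs
ORDER BY user_id;
```

Because the join happens after aggregation, a user without links (2502 in the sample data) gets NULL in every column that comes from p.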
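The result is exactly as requested, except that connections is NULL instead of 0 as in your example. If you need to convert it, use COALESCE() in the outer SELECT. I didn't, because SELECT * is so convenient.
If you are going to list all columns in the outer SELECT anyway, you can also join to users u directly instead of using a subquery to cut the other columns.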
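bool_or() is the perfect aggregate function for this job.
There may be multiple links for a single platform. This query still returns one row per user.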
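To illustrate the point about multiple links, here is a small sketch with invented data: a hypothetical user 1001 has two rows for platform a and one for platform b. The CTE name links and all column aliases are assumptions for illustration only.

```sql
WITH links(user_id, created_at, pl) AS (
    VALUES (1001, date '2015-04-30', 'a'),
           (1001, date '2015-05-02', 'a'),   -- second link on the same platform (hypothetical)
           (1001, date '2015-06-30', 'b')
)
SELECT user_id,
       count(*)           AS link_count,      -- 3: every link row is counted
       count(DISTINCT pl) AS platform_count,  -- 2: each platform is counted once
       max(created_at)    AS latest,          -- 2015-06-30
       bool_or(pl = 'a')  AS connected_to_a,  -- true, however many 'a' rows exist
       bool_or(pl = 'c')  AS connected_to_c   -- false: no 'c' row for this user
FROM links
GROUP BY user_id;                             -- still a single row for user 1001
```

Whether "# of connections" should mean links or distinct platforms is not stated in the question; count(DISTINCT pl) is the variant to reach for if it is the latter.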