I'm trying to build a query for a report. I'm using Postgres 9.6.1. Below I describe my schema, some sample data, and the result I'm trying to achieve.
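Apologies for the odd table schema. I started from the AlertPost join table; basically, for each alert (alert_id) I need the sum of followers over the distinct users. For speed reasons elsewhere in the application, user_follow_count is denormalized into the Post table, which is why it appears here on Post rather than on a User table.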
I've tried lots of queries with grouping, window functions and DISTINCT, but I'm not getting the right answer.
Assume both tables are fairly large (10M+ rows) and all foreign keys are indexed.
Table 1: Post
- id
- user_id
- user_follow_count
Table 2: AlertPost
- id
- alert_id (different from id, this is a join table)
- post_id
Goal: for each alert_id, what is the sum of user_follow_count over the distinct users?
AlertPosts
id: 1, alert_id: 1, post_id: 1 # Same alert_id, two different post_ids
id: 2, alert_id: 1, post_id: 2
id: 3, alert_id: 2, post_id: 3
id: 4, alert_id: 2, post_id: 4
Post
id: 1, user_id: 1, user_follow_count: 3 # Same user between several posts
id: 2, user_id: 2, user_follow_count: 5
id: 3, user_id: 1, user_follow_count: 3
id: 4, user_id: 1, user_follow_count: 3
Desired result:
alert_id: 1, unique_followers: 8 # (sum of user_follow_count from user_id 1, 2)
alert_id: 2, unique_followers: 3 # (there are only posts from user_id 1)
Answer (score: 1)
You can solve it in two steps. First, take the distinct combinations of alert_id, user_id and user_follow_count; only then sum the result.
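If user_follow_count is guaranteed to be identical on every post by a given user (the denormalization suggests this, but the question does not state it), the two steps reduce to a plain DISTINCT followed by a GROUP BY, with no window function. A minimal sketch under that assumption; the temporary sample tables and the view name alert_followers_distinct are hypothetical, and you would drop the two CREATE TEMP TABLE statements to run it against real tables:

```sql
--Hypothetical sample data in temporary tables (same rows as the question)
CREATE TEMP TABLE IF NOT EXISTS alert_posts (id, alert_id, post_id) AS
VALUES (1,1,1), (2,1,2), (3,2,3), (4,2,4);
CREATE TEMP TABLE IF NOT EXISTS post (id, user_id, user_follow_count) AS
VALUES (1,1,3), (2,2,5), (3,1,3), (4,1,3);

--Assumes user_follow_count is identical on all posts of a user;
--a user with differing values would otherwise be counted more than once.
CREATE OR REPLACE TEMP VIEW alert_followers_distinct AS
SELECT alert_id, SUM(user_follow_count) AS unique_followers
FROM (
  --Step 1: one row per distinct (alert_id, user_id, user_follow_count)
  SELECT DISTINCT a.alert_id, p.user_id, p.user_follow_count
  FROM alert_posts a
  JOIN post p ON p.id = a.post_id
) per_user
--Step 2: sum the per-user counts for each alert
GROUP BY alert_id;

SELECT * FROM alert_followers_distinct ORDER BY alert_id;
```

Keeping user_id in the DISTINCT list matters: without it, two different users in the same alert who happen to have the same follower count would collapse into one row.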
--Creating samples...
CREATE TABLE alert_posts (id, alert_id, post_id) AS
VALUES
(1,1,1),
(2,1,2),
(3,2,3),
(4,2,4);
CREATE TABLE post (id, user_id, user_follow_count) AS
VALUES
(1,1,3),
(2,2,5),
(3,1,3),
(4,1,3);
--First step: flattening result
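WITH tmp AS (
  SELECT DISTINCT
    a.alert_id,
    --user_id must be part of the DISTINCT, otherwise two users with the
    --same follower count in one alert would be collapsed into one row
    p.user_id,
    --One count per (alert_id, user_id), taken from the user's most recent
    --post (highest id). first_value is used because last_value with the
    --default window frame just returns the current row's value.
    first_value(p.user_follow_count) OVER (
      PARTITION BY
        a.alert_id,
        p.user_id
      ORDER BY p.id DESC) AS user_follow_count
  FROM
    alert_posts a
    JOIN post p ON p.id = a.post_id
)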
--Now you can do a regular sum
SELECT alert_id, SUM(user_follow_count) AS unique_followers FROM tmp GROUP BY alert_id;
Test it here.
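A Postgres-specific alternative to a window function plus DISTINCT is DISTINCT ON, which keeps exactly one row per (alert_id, user_id): the first one in the ORDER BY order. The sketch below keeps the count from each user's most recent post (highest post id); treating that as the right tie-break is an assumption about your data. The temporary sample tables and the view name alert_followers_latest are hypothetical; drop the CREATE TEMP TABLE statements to run it against real tables.

```sql
--Hypothetical sample data in temporary tables (same rows as the question)
CREATE TEMP TABLE IF NOT EXISTS alert_posts (id, alert_id, post_id) AS
VALUES (1,1,1), (2,1,2), (3,2,3), (4,2,4);
CREATE TEMP TABLE IF NOT EXISTS post (id, user_id, user_follow_count) AS
VALUES (1,1,3), (2,2,5), (3,1,3), (4,1,3);

CREATE OR REPLACE TEMP VIEW alert_followers_latest AS
SELECT alert_id, SUM(user_follow_count) AS unique_followers
FROM (
  --Step 1: DISTINCT ON keeps the first row of each (alert_id, user_id) group;
  --ORDER BY p.id DESC makes that the user's most recent post in the alert
  SELECT DISTINCT ON (a.alert_id, p.user_id)
         a.alert_id, p.user_id, p.user_follow_count
  FROM alert_posts a
  JOIN post p ON p.id = a.post_id
  ORDER BY a.alert_id, p.user_id, p.id DESC
) per_user
--Step 2: sum one count per user for each alert
GROUP BY alert_id;

SELECT * FROM alert_followers_latest ORDER BY alert_id;
```

With 10M+ rows it is worth comparing this and the window-function version with EXPLAIN ANALYZE on your own data rather than assuming one plan is faster.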