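I'm working with a very large database in PostgreSQL. (Sorry if this isn't formatted properly; I've been at it for hours and am still struggling.)
Here is part of the table structure used in my query (table user_activities), with some sample data: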
+---------------------+---------------------+---------------------+
| user_id | activity | operation |
+---------------------+---------------------+---------------------+
| 1 | 1 | 1 |
| 1 | 1 | 1 |
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 2 | 1 | 3 |
| 3 | 1 | 3 |
| 4 | 1 | 4 |
| 4 | 1 | 4 |
| 5 | 1 | 4 |
| 5 | 1 | 5 |
| 6 | 3 | 1 |
| 6 | 3 | 1 |
| 6 | 3 | 2 |
| 7 | 3 | 3 |
| 8 | 3 | 4 |
| 8 | 3 | 5 |
+---------------------+---------------------+---------------------+
This is the output I want:
+---------------------+---------------------+---------------------+
| count(user_id) | activity | operation |
+---------------------+---------------------+---------------------+
| 4 | 1 | 1,2 |
| 6 | 1 | 3,4,5 |
| 6 | 3 | 1,2,3,4,5 |
+---------------------+---------------------+---------------------+
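I need to count user_id for each activity and for a set of operation values. So I need to group by activity when the activity is 1 or 3 (already done with WHERE activity IN (1,3)). But I also need to group by operation, and the problem is that each operation group contains more than one value. The operation can be 1, 2, 3, 4, or 5, and I want to combine them into the groups 1,2 and 3,4,5. But that's not all...
If I simply group by operation, I get five groups per activity. I need two groups for activity 1 (the ones specified above), and for activity 3 just one group containing all operation values.
Is this possible?
Edit: I can't check the answers right now; I hope to be able to tomorrow. I'll vote and reply then. Thanks for your help.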
Answer 0 (score: 2)
Updated based on your detailed specification:
SELECT COUNT(*) as cnt, ua.activity, array_agg(distinct ua.operation)
FROM user_activities ua
JOIN (
SELECT 1 AS activity, 1 as operation, 1 as GROUP_CODE
UNION ALL
SELECT 1 AS activity, 2 as operation, 1 as GROUP_CODE
UNION ALL
SELECT 1 AS activity, 3 as operation, 2 as GROUP_CODE
UNION ALL
SELECT 1 AS activity, 4 as operation, 2 as GROUP_CODE
UNION ALL
SELECT 1 AS activity, 5 as operation, 2 as GROUP_CODE
UNION ALL
SELECT 3 AS activity, 1 as operation, 3 as GROUP_CODE
UNION ALL
SELECT 3 AS activity, 2 as operation, 3 as GROUP_CODE
UNION ALL
SELECT 3 AS activity, 3 as operation, 3 as GROUP_CODE
UNION ALL
SELECT 3 AS activity, 4 as operation, 3 as GROUP_CODE
UNION ALL
SELECT 3 AS activity, 5 as operation, 3 as GROUP_CODE
) c
ON ua.activity = c.activity and ua.operation = c.operation
GROUP BY c.GROUP_CODE, ua.activity
http://sqlfiddle.com/#!15/46e1f/15
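The same mapping can probably be written more compactly with a VALUES list instead of a chain of UNION ALL selects. This is only a sketch: it assumes the table is named user_activities as in the question, and the column alias operations is my own addition.

```sql
-- Sketch: the (activity, operation) -> group_code mapping as a VALUES list.
-- Assumes the table name user_activities from the question.
SELECT COUNT(*) AS cnt,
       ua.activity,
       array_agg(DISTINCT ua.operation) AS operations
FROM user_activities ua
JOIN (
    VALUES (1, 1, 1), (1, 2, 1),                                   -- activity 1, operations 1,2   -> group 1
           (1, 3, 2), (1, 4, 2), (1, 5, 2),                        -- activity 1, operations 3,4,5 -> group 2
           (3, 1, 3), (3, 2, 3), (3, 3, 3), (3, 4, 3), (3, 5, 3)   -- activity 3, all operations   -> group 3
) AS c (activity, operation, group_code)
  ON ua.activity = c.activity AND ua.operation = c.operation
GROUP BY c.group_code, ua.activity
ORDER BY c.group_code;
```

Because the join is an inner join, rows whose (activity, operation) pair is not listed in the mapping drop out, so no separate WHERE activity IN (1,3) filter is needed.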
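Original answer
This is how I would do it. Below I create the logic table on the fly, but you could also keep that table in the database and join to it.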
SELECT GROUP_CODE, COUNT(*) as cnt
FROM user_activities ua
JOIN (
SELECT 1 AS activity, 1 as operation, 1 as GROUP_CODE
UNION ALL
SELECT 1 AS activity, 2 as operation, 1 as GROUP_CODE
UNION ALL
SELECT 1 AS activity, 3 as operation, 2 as GROUP_CODE
UNION ALL
SELECT 1 AS activity, 4 as operation, 2 as GROUP_CODE
UNION ALL
SELECT 1 AS activity, 5 as operation, 2 as GROUP_CODE
UNION ALL
SELECT 3 AS activity, 1 as operation, 3 as GROUP_CODE
UNION ALL
SELECT 3 AS activity, 2 as operation, 3 as GROUP_CODE
UNION ALL
SELECT 3 AS activity, 3 as operation, 3 as GROUP_CODE
UNION ALL
SELECT 3 AS activity, 4 as operation, 3 as GROUP_CODE
UNION ALL
SELECT 3 AS activity, 5 as operation, 3 as GROUP_CODE
) c
ON ua.activity = c.activity and ua.operation = c.operation
GROUP BY GROUP_CODE
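This should be very fast. Remember that SQL is designed to work with sets (tables) and joins, and this uses a join to perform the logic. It is also nice because, if you turn the mapping into a real table, you can change the logic just by changing the table. You can even store multiple "logics" in the same table by adding another column to select on, and then choose which one to use at query run time.
I have used a similar approach for weighting and personalized sorting in dynamic user interfaces.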
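The stored-table idea described above might look roughly like the following. All names here (operation_groups, logic_name, 'split_activity_1') are hypothetical and chosen only for illustration; only user_activities comes from the question.

```sql
-- Hypothetical mapping table; every name except user_activities is illustrative.
CREATE TABLE operation_groups (
    logic_name text    NOT NULL,  -- selects which grouping scheme a row belongs to
    activity   integer NOT NULL,
    operation  integer NOT NULL,
    group_code integer NOT NULL,
    PRIMARY KEY (logic_name, activity, operation)
);

-- One grouping scheme: activity 1 split into {1,2} and {3,4,5}, activity 3 kept whole.
INSERT INTO operation_groups (logic_name, activity, operation, group_code) VALUES
    ('split_activity_1', 1, 1, 1), ('split_activity_1', 1, 2, 1),
    ('split_activity_1', 1, 3, 2), ('split_activity_1', 1, 4, 2), ('split_activity_1', 1, 5, 2),
    ('split_activity_1', 3, 1, 3), ('split_activity_1', 3, 2, 3), ('split_activity_1', 3, 3, 3),
    ('split_activity_1', 3, 4, 3), ('split_activity_1', 3, 5, 3);

-- Pick the scheme at query time by filtering on logic_name.
SELECT g.group_code, ua.activity, COUNT(*) AS cnt
FROM user_activities ua
JOIN operation_groups g
  ON ua.activity = g.activity AND ua.operation = g.operation
WHERE g.logic_name = 'split_activity_1'
GROUP BY g.group_code, ua.activity
ORDER BY g.group_code;
```

Adding a second scheme would then only require inserting more rows under a different logic_name; the query text stays the same apart from the filter value.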
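Answer 1 (score: 2)
As I understand it, a query like this may help you. The information in the question and comments confused me, so I used my best judgment to provide a solution.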
create table test (user_id int, activity int, operation int);
insert into test values (1,1,1), (1,1,1), (1,1,2), (2,1,3), (2,1,4), (3,3,1), (4,3,3), (4,3,5);
select count(*), activity, array_agg(operation)
from test
group by activity, user_id
Result:
| count | activity | array_agg |
| 3 | 1 | {1,1,2} |
| 2 | 1 | {3,4} |
| 1 | 3 | {1} |
| 2 | 3 | {3,5} |
Based on the edited question, this is how I would solve it:
Table:
create table test (user_id int, activity int, operation int);
insert into test values
(1,1,1),(1,1,1),(1,1,1),
(2,1,2),(2,1,3),
(3,1,3),
(4,1,4),(4,1,4),
(5,1,4),(5,1,5),
(6,3,1),(6,3,1),(6,3,2),
(7,3,3),
(8,3,4),(8,3,5);
Query:
select count(*), activity, string_agg(distinct operation::VARCHAR, ',')
from test
where operation in (1,2) and activity = 1
group by activity
UNION ALL
select count(*), activity, string_agg(distinct operation::VARCHAR, ',')
from test
where operation in (3,4,5) and activity = 1
group by activity
UNION ALL
select count(*), activity, string_agg(distinct operation::VARCHAR, ',')
from test
where activity = 3
group by activity
Result:
count | activity | string_agg
4 | 1 | 1,2
6 | 1 | 3,4,5
6 | 3 | 1,2,3,4,5
Answer 2 (score: 1)
Just use CASE to put together the groups you want.
WITH cte as (
SELECT "user_id", "activity", "operation",
CASE
WHEN "activity" = 1 THEN
CASE
WHEN "operation" IN (1,2) THEN '1_first'
ELSE '1_second'
END
WHEN "activity" = 3 THEN '3_first'
END as "op_group"
FROM user_activities
)
SELECT "activity",
"op_group",
count("user_id"),
array_agg(distinct "operation") as "operation"
FROM cte
GROUP BY "activity", "op_group"
Output:
| activity | op_group | count | operation |
|----------|----------|-------|-----------|
| 1 | 1_first | 4 | 1,2 |
| 1 | 1_second | 6 | 3,4,5 |
| 3 | 3_first | 6 | 1,2,3,4,5 |
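A variation on the CASE approach, offered as a sketch rather than a definitive version: it returns the operations as a comma-separated string (as in the question's desired output) and shows the row count next to a distinct-user count, since the two differ on the sample data. The aliases row_count, distinct_users, operations, and labeled are my own; the WHERE filter is added on the assumption that the real table contains activities other than 1 and 3.

```sql
-- Sketch: label each row with its group via CASE, then aggregate per label.
SELECT count(user_id)          AS row_count,       -- counts rows, matching the desired output (4, 6, 6)
       count(DISTINCT user_id) AS distinct_users,  -- counts each user once per group
       activity,
       -- With DISTINCT, the ORDER BY expression must match the aggregated argument.
       -- Text ordering is fine for single-digit operations; it would sort '10' before '2'.
       string_agg(DISTINCT operation::text, ',' ORDER BY operation::text) AS operations
FROM (
    SELECT user_id, activity, operation,
           CASE
               WHEN activity = 1 AND operation IN (1, 2) THEN '1_first'
               WHEN activity = 1                         THEN '1_second'
               WHEN activity = 3                         THEN '3_first'
           END AS op_group
    FROM user_activities
    WHERE activity IN (1, 3)   -- keeps unlabeled activities out of a NULL group
) AS labeled
GROUP BY activity, op_group
ORDER BY activity, op_group;
```

On the question's sample data, row_count comes out as 4, 6, 6, while distinct_users comes out as 2, 4, 3, so it is worth deciding which of the two "count user_id" is meant to be.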