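I'm using PostgreSQL. I have two tables, and for the sake of this question assume that me has multiple IDs. The first table, Table1, deals with messages sent: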
me | friends | messages_sent
----------------------------
0 1 10
0 2 7
0 3 7
0 4 6
1 1 5
1 2 12
...
The second table, Table2, deals with messages received:
me | friends | messages_received
----------------------------
0 4 17
0 2 7
0 1 9
0 3 0
...
How can I get a table like this (the order of friends doesn't matter, though):
me | friends | messages_total
----------------------------
0 1 19
0 2 14
0 3 7
0 4 23
...
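The part I'm really stuck on is joining the two tables on me and then adding up the values for the same friend, for matching values of me... Any ideas?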
Answer 0 (score: 1)
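You can simply take the union of the two tables, then use GROUP BY to group on the combination of me and friends, and add up the message counts with an aggregate function: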
SELECT me, friends, sum(count) AS messages_total
FROM (
SELECT me, friends, messages_sent AS count FROM Table1
UNION ALL
SELECT me, friends, messages_received FROM Table2
) AS t
GROUP BY me, friends;
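To check the query above against the sample data from the question, here is a small Python sketch. Using SQLite as a stand-in for PostgreSQL is an assumption made only so the sketch runs without a server: it loads just the rows shown in the question (the "..." rows are unknown) into an in-memory sqlite3 database and runs the query unchanged. The helper name messages_total is hypothetical.

```python
import sqlite3

# Sample rows copied from the question; the elided "..." rows are left out.
TABLE1 = [(0, 1, 10), (0, 2, 7), (0, 3, 7), (0, 4, 6), (1, 1, 5), (1, 2, 12)]
TABLE2 = [(0, 4, 17), (0, 2, 7), (0, 1, 9), (0, 3, 0)]

# The query from the answer, unchanged.
QUERY = """
SELECT me, friends, sum(count) AS messages_total
FROM (
    SELECT me, friends, messages_sent AS count FROM Table1
    UNION ALL
    SELECT me, friends, messages_received FROM Table2
) AS t
GROUP BY me, friends
"""


def messages_total(table1, table2):
    """Load both tables into an in-memory SQLite database and run QUERY.

    SQLite stands in for PostgreSQL here (hypothetical setup); UNION ALL
    and GROUP BY behave the same way for this query.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Table1 (me INTEGER, friends INTEGER, messages_sent INTEGER)")
        conn.execute("CREATE TABLE Table2 (me INTEGER, friends INTEGER, messages_received INTEGER)")
        conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", table1)
        conn.executemany("INSERT INTO Table2 VALUES (?, ?, ?)", table2)
        # Sort in Python only for a stable result; the question says order doesn't matter.
        return sorted(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    for row in messages_total(TABLE1, TABLE2):
        print(row)
```

For me = 0 this reproduces the totals the question asks for (19, 14, 7 and 23). The two me = 1 rows have no counterpart in Table2, so UNION ALL plus GROUP BY simply passes their sent counts (5 and 12) through.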
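Edit: I was about to edit my answer to add a note saying that Patrick's answer is better, but I decided to run a quick benchmark first. So, with the following setup (1 million rows per table):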
CREATE TABLE table1 (
me integer not null,
friends integer not null,
messages_sent integer not null
);
CREATE TABLE table2 (
me integer not null,
friends integer not null,
messages_received integer not null
);
INSERT INTO table1 SELECT n1, n2, floor(random()*10)::integer FROM generate_series(1, 1000) t1(n1), generate_series(1, 1000) t2(n2);
INSERT INTO table2 SELECT n1, n2, floor(random()*10)::integer FROM generate_series(1, 1000) t1(n1), generate_series(1, 1000) t2(n2);
CREATE INDEX ON table1(me, friends);
CREATE INDEX ON table2(me, friends);
ANALYZE;
The first solution:
$ EXPLAIN ANALYZE
SELECT me, friends, sum(count) AS messages_total
FROM (
SELECT me, friends, messages_sent AS count FROM Table1
UNION ALL
SELECT me, friends, messages_received FROM Table2
) AS t
GROUP BY me, friends;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=45812.00..46212.00 rows=40000 width=12) (actual time=1201.602..1499.285 rows=1000000 loops=1)
Group Key: table1.me, table1.friends
-> Append (cost=0.00..30812.00 rows=2000000 width=12) (actual time=0.022..299.260 rows=2000000 loops=1)
-> Seq Scan on table1 (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.020..91.357 rows=1000000 loops=1)
-> Seq Scan on table2 (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.004..77.672 rows=1000000 loops=1)
Planning time: 0.255 ms
Execution time: 1529.642 ms
The second solution:
$ EXPLAIN ANALYZE
SELECT me, friends,
coalesce(messages_sent, 0) + coalesce(messages_received, 0) AS messages_total
FROM Table1
FULL JOIN Table2 USING (me, friends)
ORDER BY me;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=219582.13..222082.13 rows=1000000 width=24) (actual time=1501.873..1583.915 rows=1000000 loops=1)
Sort Key: (COALESCE(table1.me, table2.me))
Sort Method: external sort Disk: 21512kB
-> Merge Full Join (cost=0.85..99414.29 rows=1000000 width=24) (actual time=0.074..912.598 rows=1000000 loops=1)
Merge Cond: ((table1.me = table2.me) AND (table1.friends = table2.friends))
-> Index Scan using table1_me_friends_idx on table1 (cost=0.42..38483.49 rows=1000000 width=12) (actual time=0.039..165.772 rows=1000000 loops=1)
-> Index Scan using table2_me_friends_idx on table2 (cost=0.42..38483.49 rows=1000000 width=12) (actual time=0.018..194.177 rows=1000000 loops=1)
Planning time: 1.091 ms
Execution time: 1615.011 ms
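Surprisingly, the FULL JOIN solution performs slightly worse, even though it can make use of the indexes. I suspect this has to do with the full join itself; with other kinds of join it would probably be much faster.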
Answer 1 (score: 1)
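You should join the two tables on both fields, me and friends, and then simply add up the messages received and sent. Using a FULL JOIN ensures that every case is included, for example when I sent messages to a friend but received none back, or vice versa.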
SELECT me, friends,
coalesce(messages_sent, 0) + coalesce(messages_received, 0) AS messages_total
FROM Table1
FULL JOIN Table2 USING (me, friends)
ORDER BY me;
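The FULL JOIN ... USING (me, friends) plus coalesce pattern can be illustrated without a database. The sketch below is plain Python, not PostgreSQL: each table is modeled as a dict keyed by (me, friends), which assumes each pair occurs at most once per table (with duplicate pairs, a real FULL JOIN would return one row per matching combination instead of one summed row). The data, in particular friends 5 and 6, and the name full_join_totals are hypothetical, chosen so that each table has a friend the other lacks.

```python
# Hypothetical sample data: friend 5 appears only on the "sent" side and
# friend 6 only on the "received" side, to exercise both unmatched cases.
SENT = {(0, 1): 10, (0, 2): 7, (0, 5): 3}
RECEIVED = {(0, 1): 9, (0, 2): 7, (0, 6): 4}


def full_join_totals(sent, received):
    """Emulate FULL JOIN ... USING (me, friends) with coalesce(x, 0) + coalesce(y, 0).

    A FULL JOIN keeps every (me, friends) key found on either side; a key
    missing on one side yields NULL there, which coalesce turns into 0.
    """
    totals = {}
    # Union of the join keys from both sides, as a FULL JOIN would keep.
    for key in sent.keys() | received.keys():
        # dict.get returns None for a missing side, playing the role of NULL.
        s = sent.get(key)
        r = received.get(key)
        totals[key] = (s if s is not None else 0) + (r if r is not None else 0)
    # Like ORDER BY me, plus friends as a tiebreaker for a deterministic result.
    return sorted((me, friend, total) for (me, friend), total in totals.items())


if __name__ == "__main__":
    for row in full_join_totals(SENT, RECEIVED):
        print(row)
```

Friend 5 has only a sent count and friend 6 only a received count, yet both still appear in the result with the missing side counted as 0, just as coalesce(..., 0) does in the SQL. An inner join would drop both of those rows.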