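Merging tables in PostgreSQL

Time: 2016-09-14 23:49:58

Tags: postgresql

I am using PostgreSQL. I have two tables; assume for the purposes of this question that me has multiple IDs. The first table, Table1, tracks messages sent: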

me | friends | messages_sent
---+---------+--------------
 0 |       1 |            10
 0 |       2 |             7
 0 |       3 |             7
 0 |       4 |             6
 1 |       1 |             5
 1 |       2 |            12
...

The second table, Table2, tracks messages received:

me | friends | messages_received
---+---------+------------------
 0 |       4 |                17
 0 |       2 |                 7
 0 |       1 |                 9
 0 |       3 |                 0
...

How can I get a single table like the following (the order of the friends does not matter, though):

    me | friends | messages_total
    ---+---------+---------------
     0 |       1 |             19
     0 |       2 |             14
     0 |       3 |              7
     0 |       4 |             23
    ...

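The part I am really struggling with is joining the two tables on me and then adding up the values for each friend, given equal values of me. Any ideas?

2 answers:

Answer 0: (score: 1)

You can simply take the union of the two tables with UNION ALL, then use GROUP BY to group on the combination of me and friends, and use an aggregate function to add up the message counts: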

SELECT me, friends, sum(count) AS messages_total
FROM (
    SELECT me, friends, messages_sent AS count FROM Table1
    UNION ALL
    SELECT me, friends, messages_received FROM Table2
) AS t
GROUP BY me, friends;
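
To make the behaviour concrete, here is a minimal, self-contained sketch that runs the same query against the me = 0 rows from the question. The CTEs stand in for the real tables purely for illustration, the inner alias is renamed to cnt, and ORDER BY is added only to make the output deterministic.

```sql
-- Sample rows for me = 0, copied from the question.
-- The CTEs shadow the real tables only for the duration of this statement.
WITH table1(me, friends, messages_sent) AS (
    VALUES (0, 1, 10), (0, 2, 7), (0, 3, 7), (0, 4, 6)
),
table2(me, friends, messages_received) AS (
    VALUES (0, 4, 17), (0, 2, 7), (0, 1, 9), (0, 3, 0)
)
SELECT me, friends, sum(cnt) AS messages_total
FROM (
    -- UNION ALL keeps every row, including identical counts from both tables
    SELECT me, friends, messages_sent AS cnt FROM table1
    UNION ALL
    SELECT me, friends, messages_received FROM table2
) AS t
GROUP BY me, friends
ORDER BY me, friends;
-- Returns (0, 1, 19), (0, 2, 14), (0, 3, 7), (0, 4, 23)
```

Note that plain UNION would not work here: it removes duplicate rows, so the two identical (0, 2, 7) rows would collapse into one and the total for friend 2 would come out as 7 instead of 14.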

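Edit: I was about to edit my answer to add a note saying that Patrick's answer is better, but I decided to run a simple benchmark first. So, given the following setup (1 million rows in each table):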

CREATE TABLE table1 (
    me integer not null,
    friends integer not null,
    messages_sent integer not null
);
CREATE TABLE table2 (
    me integer not null,
    friends integer not null,
    messages_received integer not null
);
INSERT INTO table1 SELECT n1, n2, floor(random()*10)::integer FROM generate_series(1, 1000) t1(n1), generate_series(1, 1000) t2(n2);
INSERT INTO table2 SELECT n1, n2, floor(random()*10)::integer FROM generate_series(1, 1000) t1(n1), generate_series(1, 1000) t2(n2);
CREATE INDEX ON table1(me, friends);
CREATE INDEX ON table2(me, friends);
ANALYZE;

Then the first solution:

$ EXPLAIN ANALYZE
      SELECT me, friends, sum(count) AS messages_total
      FROM (
          SELECT me, friends, messages_sent AS count FROM Table1
          UNION ALL
          SELECT me, friends, messages_received FROM Table2
      ) AS t
      GROUP BY me, friends;
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=45812.00..46212.00 rows=40000 width=12) (actual time=1201.602..1499.285 rows=1000000 loops=1)
   Group Key: table1.me, table1.friends
   ->  Append  (cost=0.00..30812.00 rows=2000000 width=12) (actual time=0.022..299.260 rows=2000000 loops=1)
         ->  Seq Scan on table1  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.020..91.357 rows=1000000 loops=1)
         ->  Seq Scan on table2  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.004..77.672 rows=1000000 loops=1)
 Planning time: 0.255 ms
 Execution time: 1529.642 ms

The second solution:

$ EXPLAIN ANALYZE
    SELECT me, friends,
           coalesce(messages_sent, 0) + coalesce(messages_received, 0) AS messages_total
    FROM Table1
    FULL JOIN Table2 USING (me, friends)
    ORDER BY me;
                                                                     QUERY PLAN                                                                          
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=219582.13..222082.13 rows=1000000 width=24) (actual time=1501.873..1583.915 rows=1000000 loops=1)
   Sort Key: (COALESCE(table1.me, table2.me))
   Sort Method: external sort  Disk: 21512kB
   ->  Merge Full Join  (cost=0.85..99414.29 rows=1000000 width=24) (actual time=0.074..912.598 rows=1000000 loops=1)
         Merge Cond: ((table1.me = table2.me) AND (table1.friends = table2.friends))
         ->  Index Scan using table1_me_friends_idx on table1  (cost=0.42..38483.49 rows=1000000 width=12) (actual time=0.039..165.772 rows=1000000 loops=1)
         ->  Index Scan using table2_me_friends_idx on table2  (cost=0.42..38483.49 rows=1000000 width=12) (actual time=0.018..194.177 rows=1000000 loops=1)
 Planning time: 1.091 ms
 Execution time: 1615.011 ms

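Surprisingly, the FULL JOIN solution performs slightly worse (about 1615 ms versus 1530 ms), even though it can take advantage of the indexes. I suppose this has to do with it being a full join; for other join types it would be much better.

Answer 1: (score: 1)

You should join the two tables on both fields, me and friends, and then simply add the received and sent messages. Using FULL JOIN ensures that all cases are included, such as when I sent messages to a friend but received none, or vice versa; coalesce then turns the NULL from the missing side into 0.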
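
One caveat when reading the two plans side by side: the second query carries an ORDER BY me that the first one does not. In the plan above, the Merge Full Join node finishes at roughly 913 ms, and the Sort node on top of it (an external sort spilling to disk) accounts for most of the remaining time. A like-for-like comparison could be sketched as follows, assuming the same table1/table2 setup. No timings are given here, because the planner may choose a different join strategy once the sort is gone, so the result has to be measured.

```sql
-- Same query as the second benchmark, minus ORDER BY, so no Sort node is forced.
-- Assumes the table1/table2 benchmark setup shown earlier in this answer.
EXPLAIN ANALYZE
SELECT me, friends,
       coalesce(messages_sent, 0) + coalesce(messages_received, 0) AS messages_total
FROM table1
FULL JOIN table2 USING (me, friends);
```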

SELECT me, friends,
       coalesce(messages_sent, 0) + coalesce(messages_received, 0) AS messages_total
FROM Table1
FULL JOIN Table2 USING (me, friends)
ORDER BY me;
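
As an illustration of why both FULL JOIN and coalesce matter, here is a small sketch with hypothetical rows (not taken from the question) in which each table has a friend the other lacks. It assumes that each (me, friends) pair appears at most once per table; with duplicates, the join would multiply rows, and the UNION ALL approach would be the safer choice.

```sql
-- Hypothetical rows: friend 1 appears only in table1, friend 3 only in table2.
WITH table1(me, friends, messages_sent) AS (
    VALUES (0, 1, 10), (0, 2, 7)
),
table2(me, friends, messages_received) AS (
    VALUES (0, 2, 7), (0, 3, 4)
)
SELECT me, friends,
       -- the side with no matching row yields NULL; coalesce maps it to 0
       coalesce(messages_sent, 0) + coalesce(messages_received, 0) AS messages_total
FROM table1
FULL JOIN table2 USING (me, friends)
ORDER BY me, friends;
-- Returns (0, 1, 10), (0, 2, 14), (0, 3, 4)
```

With an inner join, only friend 2 would survive. Without coalesce, the totals for friends 1 and 3 would be NULL, because adding NULL to an integer yields NULL.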