How to insert/update data from one table into another (PostgreSQL)?

Time: 2013-03-20 07:23:49

Tags: performance postgresql upsert

There are two tables:

tmp_stat:
date, site_id, ip, block_id, count
Primary Key (date, site_id, ip, block_id)

main_stat:
date, site_id, ip, block_id, count
Primary Key (date, site_id, ip, block_id)

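I need to insert rows from tmp_stat into main_stat when no row with the same key (date, site_id, ip, block_id) exists there, and update the count when the row already exists. This should run as fast as possible.

tmp_stat contains about 500,000 rows; main_stat contains millions.

3 answers:

Answer 0 (score: 6)

Would the following work?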

WITH upd AS (
    UPDATE main_stat t
       SET counter = s.counter
      FROM tmp_stat s
     WHERE t.date = s.date
            AND t.site_id = s.site_id
            AND t.ip = s.ip
            AND t.block_id = s.block_id
 RETURNING s.date, s.site_id, s.ip, s.block_id, s.counter
)
INSERT INTO main_stat
     SELECT s.date, s.site_id, s.ip, s.block_id, s.counter
       FROM tmp_stat s 
       LEFT JOIN upd ON (upd.date = s.date and  upd.site_id = s.site_id and upd.ip = s.ip and upd.block_id = s.block_id)
      WHERE upd.date IS NULL
;

Update

It looks like this only works on 9.1 or later.
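The version limit comes from the data-modifying WITH clause (an UPDATE inside a CTE), which PostgreSQL added in 9.1. A quick way to check a server before relying on it might look like this; the output column name supports_writable_cte is just an illustrative label.

```sql
-- server_version_num encodes the version as an integer, e.g. 90100 for 9.1.0.
SELECT current_setting('server_version_num')::integer >= 90100
       AS supports_writable_cte;
```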

Following someone's suggestion to write the condition as a row comparison, WHERE (t.date, t.site_id, t.ip, t.block_id) = (s.date, s.site_id, s.ip, s.block_id), seems to give better results.

WITH upd AS (
    UPDATE main_stat t
       SET counter = s.counter
      FROM tmp_stat s
     WHERE ( t.date, t.site_id, t.ip, t.block_id ) = ( s.date, s.site_id, s.ip, s.block_id )
 RETURNING s.date, s.site_id, s.ip, s.block_id
)
INSERT INTO main_stat
     SELECT s.date, s.site_id, s.ip, s.block_id, s.counter
       FROM tmp_stat s 
       LEFT JOIN upd 
            ON ( upd.date = s.date 
                AND upd.site_id = s.site_id 
                AND upd.ip = s.ip 
                AND upd.block_id = s.block_id )
      WHERE upd.date IS NULL
;

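What happens here is that the UPDATE runs inside a CTE, and the CTE returns the identifying (key) columns of the rows it updated.

The INSERT then uses that updated-row information to filter tmp_stat, so that only the rows that were not updated get inserted.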
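To make the mechanics concrete, here is a minimal sketch on throwaway temporary tables. The column types and the sample values are assumptions (the question does not state them), and the column is named counter as in this answer rather than count as in the question.

```sql
-- Hypothetical schema: the types below are assumed, not taken from the question.
CREATE TEMP TABLE main_stat (
    date     date,
    site_id  integer,
    ip       inet,
    block_id integer,
    counter  integer,
    PRIMARY KEY (date, site_id, ip, block_id)
);
CREATE TEMP TABLE tmp_stat (LIKE main_stat INCLUDING ALL);

INSERT INTO main_stat VALUES ('2013-03-20', 1, '10.0.0.1', 7, 5);
INSERT INTO tmp_stat  VALUES
    ('2013-03-20', 1, '10.0.0.1', 7, 9),  -- key already present in main_stat
    ('2013-03-20', 1, '10.0.0.2', 7, 3);  -- key not yet in main_stat

WITH upd AS (
    UPDATE main_stat t
       SET counter = s.counter
      FROM tmp_stat s
     WHERE (t.date, t.site_id, t.ip, t.block_id) = (s.date, s.site_id, s.ip, s.block_id)
 RETURNING s.date, s.site_id, s.ip, s.block_id   -- keys of the rows that were updated
)
INSERT INTO main_stat
     SELECT s.date, s.site_id, s.ip, s.block_id, s.counter
       FROM tmp_stat s
       LEFT JOIN upd
            ON (upd.date, upd.site_id, upd.ip, upd.block_id) = (s.date, s.site_id, s.ip, s.block_id)
      WHERE upd.date IS NULL;                    -- keep only rows the UPDATE did not touch

SELECT * FROM main_stat ORDER BY ip;
```

After this runs, the 10.0.0.1 row holds counter 9 and the 10.0.0.2 row holds counter 3. Note that the existing counter is overwritten, not incremented; if the counts should accumulate, SET counter = t.counter + s.counter would be needed instead.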

Dimitri Fontaine covers some concurrency caveats in this blog entry.
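One common way to sidestep such races, at the cost of making writers wait, is to lock the target table for the duration of the merge. The following is a sketch of that general idea and not necessarily what the blog entry recommends; whether blocking other writers is acceptable is an assumption about the workload.

```sql
BEGIN;

-- SHARE ROW EXCLUSIVE conflicts with the ROW EXCLUSIVE lock taken by
-- INSERT/UPDATE/DELETE and with itself, so concurrent writers and other
-- merges wait until COMMIT. Plain SELECTs are not blocked.
LOCK TABLE main_stat IN SHARE ROW EXCLUSIVE MODE;

WITH upd AS (
    UPDATE main_stat t
       SET counter = s.counter
      FROM tmp_stat s
     WHERE (t.date, t.site_id, t.ip, t.block_id) = (s.date, s.site_id, s.ip, s.block_id)
 RETURNING s.date, s.site_id, s.ip, s.block_id
)
INSERT INTO main_stat
     SELECT s.date, s.site_id, s.ip, s.block_id, s.counter
       FROM tmp_stat s
       LEFT JOIN upd
            ON (upd.date, upd.site_id, upd.ip, upd.block_id) = (s.date, s.site_id, s.ip, s.block_id)
      WHERE upd.date IS NULL;

COMMIT;
```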

For more information on CTEs, see the PostgreSQL documentation.

Answer 1 (score: 1)

This is how I understand the question; I am building on gsimes's answer.

with agg_tmp_stat as (
    select date, site_id, ip, block_id, sum(counter)::integer counter
    from tmp_stat
    group by 1, 2, 3, 4
), upd as (
    update main_stat t
    set counter = t.counter + s.counter
    from agg_tmp_stat s
    where
        (t.date, t.site_id, t.ip, t.block_id)
        = (s.date, s.site_id, s.ip, s.block_id)
    returning s.date, s.site_id, s.ip, s.block_id
)
insert into main_stat
select s.date, s.site_id, s.ip, s.block_id, s.counter
from
    agg_tmp_stat s 
    left join
    upd on
        upd.date = s.date 
        and upd.site_id = s.site_id 
        and upd.ip = s.ip 
        and upd.block_id = s.block_id
where upd.date is null

Basically, this aggregates the temporary table and adds the resulting counter to the existing counter.
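On PostgreSQL 9.5 or later, which appeared after these answers were written, the same aggregate-then-add logic could be sketched with INSERT ... ON CONFLICT. This assumes the column is named counter as in the answers and that main_stat has the primary key described in the question.

```sql
-- Aggregate first: ON CONFLICT DO UPDATE refuses to touch the same target row
-- twice in one statement, so each key may appear only once in the input.
INSERT INTO main_stat AS t (date, site_id, ip, block_id, counter)
SELECT date, site_id, ip, block_id, sum(counter)::integer
  FROM tmp_stat
 GROUP BY date, site_id, ip, block_id
ON CONFLICT (date, site_id, ip, block_id)
DO UPDATE SET counter = t.counter + EXCLUDED.counter;  -- EXCLUDED is the row proposed for insertion
```

Unlike the CTE approach, this form handles a concurrent insert of the same key without raising a unique-violation error.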

Answer 2 (score: 1)

This looks like a simple EXISTS query. If the columns are indexed, it should be fast enough.

Example:

-- update count for existing rows first, so rows inserted below are not counted twice
UPDATE main_stat main 
SET count =  main.count + (SELECT tmp.count FROM tmp_stat tmp
                           WHERE tmp.date    = main.date 
                           AND   tmp.site_id = main.site_id 
                           AND   tmp.ip      = main.ip
                           AND   tmp.block_id = main.block_id
                           LIMIT 1)
WHERE EXISTS (SELECT 1 FROM tmp_stat tmp 
                           WHERE tmp.date    = main.date 
                           AND   tmp.site_id = main.site_id 
                           AND   tmp.ip      = main.ip
                           AND   tmp.block_id = main.block_id
             );
-- insert missing rows, including their count
INSERT INTO main_stat (date, site_id, ip, block_id, count)
SELECT tmp.date, tmp.site_id, tmp.ip, tmp.block_id, tmp.count FROM tmp_stat tmp
WHERE NOT EXISTS (SELECT 1 FROM main_stat main 
                           WHERE tmp.date    = main.date 
                           AND   tmp.site_id = main.site_id 
                           AND   tmp.ip      = main.ip
                           AND   tmp.block_id = main.block_id
                 );
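Whether the lookups actually use the primary-key index can be checked from the query plan. A small sketch follows, using the table and column names from the question; what the plan looks like will depend on the data and the server settings.

```sql
-- Refresh planner statistics after bulk-loading the staging table.
ANALYZE tmp_stat;

-- Show the plan for the "missing rows" lookup without executing it.
EXPLAIN
SELECT 1
  FROM tmp_stat tmp
 WHERE NOT EXISTS (SELECT 1 FROM main_stat main
                    WHERE tmp.date     = main.date
                      AND tmp.site_id  = main.site_id
                      AND tmp.ip       = main.ip
                      AND tmp.block_id = main.block_id);
```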