There are two tables:
tmp_stat:
date, site_id, ip, block_id, count
Primary Key (date, site_id, ip, block_id)
main_stat:
date, site_id, ip, block_id, count
Primary Key (date, site_id, ip, block_id)
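I need to insert rows from tmp_stat into main_stat when no row with the same key (date, site_id, etc.) exists yet, and update the count when the row already exists, as fast as possible.
tmp_stat contains about 500,000 rows; main_stat contains millions.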
Answer 0 (score: 6)
Does the following work?
WITH upd AS (
    -- overwrite the count of every main_stat row that has a match in tmp_stat
    UPDATE main_stat t
    SET count = s.count
    FROM tmp_stat s
    WHERE t.date = s.date
      AND t.site_id = s.site_id
      AND t.ip = s.ip
      AND t.block_id = s.block_id
    RETURNING s.date, s.site_id, s.ip, s.block_id, s.count
)
-- insert only the tmp_stat rows that the UPDATE did not touch
INSERT INTO main_stat (date, site_id, ip, block_id, count)
SELECT s.date, s.site_id, s.ip, s.block_id, s.count
FROM tmp_stat s
LEFT JOIN upd ON (upd.date = s.date AND upd.site_id = s.site_id AND upd.ip = s.ip AND upd.block_id = s.block_id)
WHERE upd.date IS NULL
;
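Update
It looks like this only works on PostgreSQL 9.1 or later.
Following someone's suggestion, comparing the key columns as a single row, WHERE (t.date, t.site_id, t.ip, t.block_id) = (s.date, s.site_id, s.ip, s.block_id), seems to perform better.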
WITH upd AS (
    -- row-wise comparison of the four key columns
    UPDATE main_stat t
    SET count = s.count
    FROM tmp_stat s
    WHERE ( t.date, t.site_id, t.ip, t.block_id ) = ( s.date, s.site_id, s.ip, s.block_id )
    RETURNING s.date, s.site_id, s.ip, s.block_id
)
INSERT INTO main_stat (date, site_id, ip, block_id, count)
SELECT s.date, s.site_id, s.ip, s.block_id, s.count
FROM tmp_stat s
LEFT JOIN upd
    ON ( upd.date = s.date
     AND upd.site_id = s.site_id
     AND upd.ip = s.ip
     AND upd.block_id = s.block_id )
WHERE upd.date IS NULL
;
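What happens here is that a CTE performs the UPDATE and returns the identifying columns of the rows it updated.
The INSERT then uses that information to filter tmp_stat, so that only rows that were not updated (the new records) are inserted.
Dimitri Fontaine covers some concurrency caveats in this blog entry.
For more about CTEs, see the PostgreSQL documentation.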
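If it is acceptable to block other writers while the merge runs, one way to sidestep the concurrency caveats is to take a table lock in the same transaction. This is only a sketch: the lock mode is my own choice rather than something taken from the blog entry, and the column names follow the schema in the question.

```sql
BEGIN;

-- SHARE ROW EXCLUSIVE conflicts with the ROW EXCLUSIVE lock taken by
-- INSERT/UPDATE/DELETE, so other writers wait until COMMIT.
-- Plain SELECTs on main_stat are still allowed.
LOCK TABLE main_stat IN SHARE ROW EXCLUSIVE MODE;

WITH upd AS (
    UPDATE main_stat t
    SET count = s.count
    FROM tmp_stat s
    WHERE (t.date, t.site_id, t.ip, t.block_id)
        = (s.date, s.site_id, s.ip, s.block_id)
    RETURNING s.date, s.site_id, s.ip, s.block_id
)
INSERT INTO main_stat (date, site_id, ip, block_id, count)
SELECT s.date, s.site_id, s.ip, s.block_id, s.count
FROM tmp_stat s
LEFT JOIN upd
    ON (upd.date, upd.site_id, upd.ip, upd.block_id)
     = (s.date, s.site_id, s.ip, s.block_id)
WHERE upd.date IS NULL;

COMMIT;
```

The trade-off is throughput: every other session that writes to main_stat is held up for as long as the merge takes, so this probably only suits a batch job that runs alone.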
Answer 1 (score: 1)
As I understand the question, the counts should be added together, so I am building on gsimes's answer.
with agg_tmp_stat as (
    -- collapse tmp_stat to one row per key, summing the counts
    select date, site_id, ip, block_id, sum(count)::integer as count
    from tmp_stat
    group by 1, 2, 3, 4
), upd as (
    -- add the aggregated count to the existing one
    update main_stat t
    set count = t.count + s.count
    from agg_tmp_stat s
    where
        (t.date, t.site_id, t.ip, t.block_id)
        = (s.date, s.site_id, s.ip, s.block_id)
    returning s.date, s.site_id, s.ip, s.block_id
)
insert into main_stat (date, site_id, ip, block_id, count)
select s.date, s.site_id, s.ip, s.block_id, s.count
from
    agg_tmp_stat s
    left join
    upd on
        upd.date = s.date
        and upd.site_id = s.site_id
        and upd.ip = s.ip
        and upd.block_id = s.block_id
where upd.date is null
;
Basically, it aggregates the temp table and adds the resulting counts to the existing ones.
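On PostgreSQL 9.5 or later, the same "insert, or add to the existing count" behaviour can probably be written more directly with INSERT ... ON CONFLICT. That is a different technique from the CTE used in this answer. The sketch below assumes the schema from the question, where both tables have a primary key on (date, site_id, ip, block_id) and the counter column is named count.

```sql
-- The alias m refers to the row already stored in main_stat;
-- EXCLUDED refers to the row that was proposed for insertion.
INSERT INTO main_stat AS m (date, site_id, ip, block_id, count)
SELECT date, site_id, ip, block_id, count
FROM tmp_stat
ON CONFLICT (date, site_id, ip, block_id)
DO UPDATE SET count = m.count + EXCLUDED.count;
```

No aggregation step is needed here because the primary key on tmp_stat already guarantees one row per key. If the source could contain duplicate keys, they would have to be summed first, since ON CONFLICT DO UPDATE refuses to modify the same target row twice in one statement.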
Answer 2 (score: 1)
This looks like a simple EXISTS query... If the columns are indexed, it should be fast enough.
Example:
-- insert missing rows with a count of 0; the UPDATE below then adds the real value
INSERT INTO main_stat (date, site_id, ip, block_id, count)
SELECT date, site_id, ip, block_id, 0 FROM tmp_stat tmp
WHERE NOT EXISTS (SELECT 1 FROM main_stat main
                  WHERE tmp.date = main.date
                    AND tmp.site_id = main.site_id
                    AND tmp.ip = main.ip
                    AND tmp.block_id = main.block_id
                 );
-- update count for existing rows (this now includes the rows inserted above)
UPDATE main_stat main
SET count = main.count + (SELECT tmp.count FROM tmp_stat tmp
                          WHERE tmp.date = main.date
                            AND tmp.site_id = main.site_id
                            AND tmp.ip = main.ip
                            AND tmp.block_id = main.block_id
                          LIMIT 1)
WHERE EXISTS (SELECT 1 FROM tmp_stat tmp
              WHERE tmp.date = main.date
                AND tmp.site_id = main.site_id
                AND tmp.ip = main.ip
                AND tmp.block_id = main.block_id
             );