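My environment is PostgreSQL 8.3.
I need to speed up this query, because both tables have millions of rows.
For each row in table Calls there are two rows in table Trunks. For each call_id, I want to copy trunks.trunk into calls.orig_trunk from the row with the lower trunk_id of the two, and copy trunks.trunk into calls.dest_trunk from the row with the higher trunk_id.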
Initial contents of table Calls:
Call_ID | dialed_number | orig_trunk | dest_trunk
--------|---------------|------------|-----------
1 | 5145551212 | null | null
2 | 8883331212 | null | null
3 | 4164541212 | null | null
Table Trunks:
Call_ID | trunk_id | trunk
--------|----------|-------
1 | 1 | 116
1 | 2 | 9
2 | 3 | 168
2 | 4 | 3
3 | 5 | 124
3 | 6 | 9
Desired final contents of table Calls:
Call_ID | dialed_number | orig_trunk | dest_trunk
--------|---------------|------------|-----------
1 | 5145551212 | 116 | 9
2 | 8883331212 | 168 | 3
3 | 4164541212 | 124 | 9
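To experiment with the answers below, the sample data above can be loaded with a script like this. The column types are assumptions, since the question does not show the table definitions; dialed_number is stored as text because 5145551212 is larger than PostgreSQL's integer type allows. Multi-row VALUES lists are available from 8.2, so this runs on 8.3.

```sql
-- Sample schema and data from the question (column types are assumed).
CREATE TABLE calls (
    call_id       integer PRIMARY KEY,
    dialed_number varchar(20),
    orig_trunk    integer,
    dest_trunk    integer
);

CREATE TABLE trunks (
    call_id  integer NOT NULL REFERENCES calls (call_id),
    trunk_id integer PRIMARY KEY,
    trunk    integer
);

-- orig_trunk and dest_trunk start out NULL, as in the initial table.
INSERT INTO calls (call_id, dialed_number) VALUES
    (1, '5145551212'),
    (2, '8883331212'),
    (3, '4164541212');

-- Two trunks rows per call.
INSERT INTO trunks (call_id, trunk_id, trunk) VALUES
    (1, 1, 116),
    (1, 2, 9),
    (2, 3, 168),
    (2, 4, 3),
    (3, 5, 124),
    (3, 6, 9);
```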
I have created an index on every column.
update calls set orig_trunk = t2.trunk
from ( select call_id,trunk_id,trunk from trunks
order by trunk_id ASC ) as t2
where (calls.call_id=t2.call_id );
update calls set dest_trunk = t2.trunk
from ( select call_id,trunk_id,trunk from trunks
order by trunk_id DESC ) as t2
where (calls.call_id=t2.call_id );
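A caveat about the two statements above: the ORDER BY inside the subquery does not control which trunks row each call is updated from. When UPDATE ... FROM joins a target row to more than one row, PostgreSQL uses only one of them, and which one is not predictable. A deterministic alternative that still runs on 8.3 is sketched below. It swaps in a different technique (correlated subqueries with ORDER BY ... LIMIT 1), assumes the calls and trunks tables shown above, and benefits from an index on trunks (call_id, trunk_id).

```sql
-- One pass over calls; each correlated subquery picks the trunk from the
-- row with the lowest (or highest) trunk_id for that call.
UPDATE calls
SET orig_trunk = (SELECT t.trunk
                  FROM trunks t
                  WHERE t.call_id = calls.call_id
                  ORDER BY t.trunk_id ASC
                  LIMIT 1),
    dest_trunk = (SELECT t.trunk
                  FROM trunks t
                  WHERE t.call_id = calls.call_id
                  ORDER BY t.trunk_id DESC
                  LIMIT 1);
```

Every row of calls is updated by this form; a call with no trunks rows would get NULL in both columns.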
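Any ideas?
Answer 0 (score: 0)
From the posted example, it looks like a lot of unnecessary updates are being performed. Here is an example of a query that produces the result you are looking for: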
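-- DISTINCT collapses the two joined rows per call into one; the frame clause
-- lets last_value see the whole partition, not just up to the current row.
select distinct c.call_id, c.dialed_number
,first_value(t.trunk) over w as orig_trunk
,last_value(t.trunk) over w as dest_trunk
from calls c
join trunks t on (t.call_id = c.call_id)
window w as (partition by c.call_id
order by t.trunk_id
range between unbounded preceding
and unbounded following
);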
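To write the window-function result back into calls instead of only selecting it, the same idea can feed a single UPDATE. This is a sketch that, like the query above, needs PostgreSQL 8.4 or later; it reads trunks alone, since calls is only needed as the update target.

```sql
-- Requires PostgreSQL 8.4+ (window functions).
-- The subquery yields exactly one row per call_id, so the UPDATE ... FROM
-- join is one-to-one and the result is deterministic.
UPDATE calls
SET orig_trunk = s.orig_trunk,
    dest_trunk = s.dest_trunk
FROM (SELECT DISTINCT call_id,
             first_value(trunk) OVER w AS orig_trunk,
             last_value(trunk)  OVER w AS dest_trunk
      FROM trunks
      WINDOW w AS (PARTITION BY call_id
                   ORDER BY trunk_id
                   RANGE BETWEEN UNBOUNDED PRECEDING
                             AND UNBOUNDED FOLLOWING)) AS s
WHERE calls.call_id = s.call_id;
```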
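If analytic (window) functions are not available, which is the case on 8.3 because they were introduced in PostgreSQL 8.4, there are other ways to do this, for example: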
-- Works without window functions; assumes trunk_id uniquely identifies a row in trunks.
select x.call_id
,x.dialed_number
,t1.trunk as orig_trunk
,t2.trunk as dest_trunk
from (select c.call_id, c.dialed_number
,min(t.trunk_id) as orig_trunk_id
,max(t.trunk_id) as dest_trunk_id
from calls c
join trunks t on (t.call_id = c.call_id)
group by c.call_id, c.dialed_number
) x
join trunks t1 on (t1.trunk_id = x.orig_trunk_id)
join trunks t2 on (t2.trunk_id = x.dest_trunk_id);
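Another option that works on 8.3 is PostgreSQL's DISTINCT ON, which keeps only the first row of each call_id group in the given ORDER BY. This sketch is a different technique from the answer's, assumes the question's tables, fills both columns in one UPDATE, and does not need to join back to trunks on trunk_id.

```sql
-- o: per call, the trunk of the lowest trunk_id.
-- d: per call, the trunk of the highest trunk_id.
UPDATE calls
SET orig_trunk = o.trunk,
    dest_trunk = d.trunk
FROM (SELECT DISTINCT ON (call_id) call_id, trunk
      FROM trunks
      ORDER BY call_id, trunk_id ASC) AS o,
     (SELECT DISTINCT ON (call_id) call_id, trunk
      FROM trunks
      ORDER BY call_id, trunk_id DESC) AS d
WHERE calls.call_id = o.call_id
  AND calls.call_id = d.call_id;
```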
Experiment to see which one works best in your situation. You may want indexes on the join columns.
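For these queries, a composite index on trunks (call_id, trunk_id) can be more useful than separate single-column indexes, because it can return a call's trunks already in trunk_id order. The index name below is hypothetical, and whether the planner actually uses it (rather than, say, a hash join over the whole table) depends on the data and on how many calls are processed at once.

```sql
-- Hypothetical index name; covers the join column plus the column
-- used for min/max and ORDER BY.
CREATE INDEX trunks_call_id_trunk_id_idx ON trunks (call_id, trunk_id);

-- Refresh planner statistics after creating indexes or loading data.
ANALYZE trunks;
ANALYZE calls;
```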
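What to do with the result set depends on the nature of the application. Is this a one-off job? Then why not create a new table from the result set: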
CREATE TABLE trunk_summary AS
SELECT ...
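Does the data change constantly? Is it accessed often? Would simply creating a view be enough? Or perhaps run an update based on the result set, maybe one range at a time. It really depends, but this should give you a start.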
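If a view is enough, one option is to define it over trunks alone so it always reflects the current trunk rows, reusing the min/max approach from the second query above. The view name call_trunks is hypothetical; reading it together with calls is then a plain join.

```sql
-- Hypothetical view name; it recomputes the trunks on every read,
-- so there is nothing to keep in sync when trunks changes.
CREATE VIEW call_trunks AS
SELECT x.call_id,
       t1.trunk AS orig_trunk,
       t2.trunk AS dest_trunk
FROM (SELECT call_id,
             min(trunk_id) AS orig_trunk_id,
             max(trunk_id) AS dest_trunk_id
      FROM trunks
      GROUP BY call_id) x
JOIN trunks t1 ON (t1.trunk_id = x.orig_trunk_id)
JOIN trunks t2 ON (t2.trunk_id = x.dest_trunk_id);

-- Example read: call details together with both trunks.
SELECT c.call_id, c.dialed_number, v.orig_trunk, v.dest_trunk
FROM calls c
JOIN call_trunks v ON (v.call_id = c.call_id)
ORDER BY c.call_id;
```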
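Answer 1 (score: 0)
Here is the final code, with the test conditions left in as comments. The subquery is very efficient and fast, but testing showed that partitioning the table affects execution time more than the subquery's efficiency does. On a table with 1 million rows the update took 80 seconds; on a table with 12 million rows it took 580 seconds.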
update calls1900 set orig_trunk = a.orig_trunk, dest_trunk = a.dest_trunk
from (select
x.call_id,
t1.trunk as orig_trunk, t2.trunk as dest_trunk
from (select calls1900.call_id
,min(t.trunk_id) as orig_trunk_id
,max(t.trunk_id) as dest_trunk_id
from calls1900
join trunks t on (t.call_id = calls1900.call_id)
-- where calls1900.call_id between 43798930 and 43798950
group by calls1900.call_id
) x
join trunks t1 on (t1.trunk_id = x.orig_trunk_id)
join trunks t2 on (t2.trunk_id = x.dest_trunk_id)
) a
where (calls1900.call_id = a.call_id); -- and (calls1900.call_id between 43798930 and 43798950)
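To time variants like these on a slice of the data without keeping the changes, one option is EXPLAIN ANALYZE inside a transaction that is rolled back, since EXPLAIN ANALYZE really executes the statement. The sketch runs the statement above against the question's calls table with a placeholder call_id range (1 to 3); for the partitioned table, substitute calls1900 and a range such as the commented-out 43798930 to 43798950. Reading trunks directly in the inner subquery, instead of joining calls there, is an assumption of this sketch.

```sql
-- EXPLAIN ANALYZE executes the UPDATE for real, so roll it back afterwards.
BEGIN;

EXPLAIN ANALYZE
UPDATE calls
SET orig_trunk = a.orig_trunk,
    dest_trunk = a.dest_trunk
FROM (SELECT x.call_id,
             t1.trunk AS orig_trunk,
             t2.trunk AS dest_trunk
      FROM (SELECT t.call_id,
                   min(t.trunk_id) AS orig_trunk_id,
                   max(t.trunk_id) AS dest_trunk_id
            FROM trunks t
            WHERE t.call_id BETWEEN 1 AND 3  -- placeholder test range
            GROUP BY t.call_id) x
      JOIN trunks t1 ON (t1.trunk_id = x.orig_trunk_id)
      JOIN trunks t2 ON (t2.trunk_id = x.dest_trunk_id)) a
WHERE calls.call_id = a.call_id;

-- Discard the changes; only the timing and plan output are kept.
ROLLBACK;
```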