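I'm working with a table (bigtable) of about 19 million rows and 60 columns. About 17 million of those 19 million records have x and y coordinates (roughly 1.8 million distinct x/y combinations). I need to add some extra geocoding information to the table from another file (census_geocode). I created a lookup table (distinct_xy) that holds every distinct x/y coordinate pair along with an ID. I have indexes on bigtable(x_coord, y_coord), census_geocode(x_coord, y_coord) and distinct_xy(x_coord, y_coord), plus primary keys on distinct_xy(xy_id) and census_geocode(xy_id). Here is the query: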
UPDATE bigtable
SET block_grp = cg.blkgrp,
    block = cg.block,
    tract = cg.tractce10
FROM census_geocode cg, distinct_xy xy
WHERE bigtable.x_coord = xy.x_coord
  AND bigtable.y_coord = xy.y_coord
  AND cg.xy_id = xy.xy_id;
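For reference, the lookup table and the keys/indexes described at the start might be set up roughly as in the PostgreSQL sketch below. It is a hypothetical reconstruction, not the actual DDL: only the columns named in the question are included, the column types are guesses, the rows are toy data, and generating xy_id with row_number() is an assumption made for illustration. The index name rae_xy_idx comes from the second query plan; census_geocode_pkey is PostgreSQL's default name for the inline primary key. TEMP tables are used so nothing real is touched; it is meant to run in a fresh session.

```sql
-- Hypothetical reconstruction: only the columns named in the question are
-- included, and all column types are guesses. TEMP tables shadow any real
-- tables of the same name for this session only.
CREATE TEMP TABLE bigtable (
    x_coord   numeric,
    y_coord   numeric,
    block_grp text,
    block     text,
    tract     text
);

CREATE TEMP TABLE census_geocode (
    xy_id     integer PRIMARY KEY,  -- default constraint name: census_geocode_pkey
    x_coord   numeric,
    y_coord   numeric,
    blkgrp    text,
    block     text,
    tractce10 text
);

-- Toy rows: two records share a coordinate pair, one has no coordinates.
INSERT INTO bigtable (x_coord, y_coord) VALUES
    (1.5, 2.5), (1.5, 2.5), (3.0, 4.0), (NULL, NULL);

-- Lookup table: one row per distinct (x_coord, y_coord) pair, numbered with
-- row_number() as an illustrative surrogate key.
CREATE TEMP TABLE distinct_xy AS
SELECT row_number() OVER (ORDER BY x_coord, y_coord)::integer AS xy_id,
       x_coord,
       y_coord
FROM (SELECT DISTINCT x_coord, y_coord
      FROM bigtable
      WHERE x_coord IS NOT NULL AND y_coord IS NOT NULL) AS pairs;

-- Keys and indexes listed in the question.
ALTER TABLE distinct_xy ADD PRIMARY KEY (xy_id);
CREATE INDEX ON distinct_xy (x_coord, y_coord);
CREATE INDEX ON census_geocode (x_coord, y_coord);
CREATE INDEX rae_xy_idx ON bigtable (x_coord, y_coord);  -- name taken from the second plan
```

Because the subquery filters out NULL coordinates, the roughly 2 million records without x/y get no entry in distinct_xy.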
The UPDATE above is extremely slow. Here is its query plan:
Update on bigtable (cost=17675751.51..17827040.74 rows=22 width=327)
  -> Nested Loop (cost=17675751.51..17827040.74 rows=22 width=327)
        -> Merge Join (cost=17675751.51..17826856.26 rows=22 width=312)
              Merge Cond: ((bigtable.x_coord = xy.x_coord) AND (bigtable.y_coord = xy.y_coord))
              -> Sort (cost=17318145.58..17366400.81 rows=19302092 width=302)
                    Sort Key: bigtable.x_coord, bigtable.y_coord
                    -> Seq Scan on bigtable (cost=0.00..1457709.92 rows=19302092 width=302)
              -> Materialize (cost=357588.42..366887.02 rows=1859720 width=26)
                    -> Sort (cost=357588.42..362237.72 rows=1859720 width=26)
                          Sort Key: xy.x_coord, xy.y_coord
                          -> Seq Scan on distinct_xy xy (cost=0.00..30443.20 rows=1859720 width=26)
        -> Index Scan using census_geocode_pkey on census_geocode cg (cost=0.00..8.37 rows=1 width=23)
              Index Cond: (xy_id = xy.xy_id)
I also tried splitting it up, first writing the lookup key back into bigtable to avoid the multi-table join:
UPDATE bigtable
SET xy_id = xy.xy_id
FROM distinct_xy xy
WHERE bigtable.x_coord = xy.x_coord
  AND bigtable.y_coord = xy.y_coord;
This also runs for hours without finishing. Its query plan:
Update on bigtable (cost=0.00..20577101.71 rows=22 width=404)
  -> Nested Loop (cost=0.00..20577101.71 rows=22 width=404)
        -> Seq Scan on distinct_xy xy (cost=0.00..30443.20 rows=1859720 width=26)
        -> Index Scan using rae_xy_idx on bigtable (cost=0.00..11.03 rows=1 width=394)
              Index Cond: ((x_coord = xy.x_coord) AND (y_coord = xy.y_coord))
Can anyone help me improve the performance of this query?
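The second statement of the split approach (copying the census fields over once bigtable.xy_id is filled in) is not shown in the question. Assuming xy_id has already been populated by the xy_id UPDATE, it would presumably be a single two-table join on xy_id, as in this PostgreSQL sketch. The tables are hypothetical TEMP toy tables with only the needed columns, and the column types and census code values are made up; it is meant to run in a fresh session.

```sql
-- Hypothetical toy tables; column types and values are made up.
-- TEMP tables shadow any real tables of the same name for this session only.
CREATE TEMP TABLE bigtable (
    x_coord   numeric,
    y_coord   numeric,
    xy_id     integer,   -- assumed already filled in by the first-step UPDATE
    block_grp text,
    block     text,
    tract     text
);

CREATE TEMP TABLE census_geocode (
    xy_id     integer PRIMARY KEY,
    blkgrp    text,
    block     text,
    tractce10 text
);

-- Toy rows: the last record has no coordinates, so it never received an xy_id.
INSERT INTO bigtable (x_coord, y_coord, xy_id) VALUES
    (1.5, 2.5, 1), (1.5, 2.5, 1), (3.0, 4.0, 2), (NULL, NULL, NULL);

INSERT INTO census_geocode (xy_id, blkgrp, block, tractce10) VALUES
    (1, '1', '1000', '000100'),
    (2, '2', '2001', '000200');

-- Second step: join bigtable directly to census_geocode on xy_id,
-- so distinct_xy is no longer part of this UPDATE.
UPDATE bigtable AS b
SET block_grp = cg.blkgrp,
    block     = cg.block,
    tract     = cg.tractce10
FROM census_geocode AS cg
WHERE b.xy_id = cg.xy_id;
```

Rows whose xy_id is NULL have no match in the join, so the UPDATE leaves their census columns untouched.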