Matching postal codes and city names is very slow in PostgreSQL

Time: 2018-08-15 05:33:48

Tags: postgresql sql-update sql-like

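I am trying to update the address fields in mytable with data from othertable. If I match on the postal code and search mytable's address for the city name from othertable, the query runs reasonably fast. But since I do not have a postal code in every case, I also want to match on the city name alone in a second query. That one takes hours (> 12h). Any ideas on how to speed up the query? Note that indexing did not help: the index scan in (2) was not any faster.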

Code for matching on postal code and city name (1)

update mytable t1 set
admin1 = t.admin1,
admin2 = t.admin2,
admin3 = t.admin3,
postal_code = t.postal_code,
lat = t.lat,
lng = t.lng from (
select * from othertable) t
where t.postal_code = t1.postal_code and     t1.country = t.country
and upper(t1.address) like '%' || t.admin1 || '%' --looks whether city name from othertable shows up in address in t1
and admin1 is null;

Code for matching on city name only (2)

update mytable t1 set
admin1 = t.admin1,
admin2 = t.admin2,
admin3 = t.admin3,
postal_code = t.postal_code,
lat = t.lat,
lng = t.lng from (
select * from othertable) t
where t1.country = t.country
and upper(t1.address) like '%' || t.admin1 || '%' --looks whether city name from othertable shows up in address in t1
and admin1 is null;

Query plan 1:

"Update on mytable t1           (cost=19084169.53..19205622.16 rows=13781     width=1918)"
"  ->  Merge Join  (cost=19084169.53..19205622.16 rows=13781 width=1918)"
"        Merge Cond: (((t1.postal_code)::text = (othertable.postal_code)::text) AND (t1.country = othertable.country))"
"        Join Filter: (upper((t1.address)::text) ~~ (('%'::text || othertable.admin1) || '%'::text))"
"        ->  Sort  (cost=18332017.34..18347693.77 rows=6270570 width=1661)"
"              Sort Key: t1.postal_code, t1.country"
"              ->  Seq Scan on mytable t1  (cost=0.00..4057214.31 rows=6270570 width=1661)"
"                    Filter: (admin1 IS NULL)"
"        ->  Materialize  (cost=752152.19..766803.71 rows=2930305 width=92)"
"              ->  Sort  (cost=752152.19..759477.95 rows=2930305 width=92)"
"                    Sort Key: othertable.postal_code, othertable.country"
"                    ->  Seq Scan on othertable  (cost=0.00..136924.05 rows=2930305 width=92)"

Query plan 2:

"Update on mytable t1     (cost=19084169.53..27246633167.33 rows=5464884210 width=1918)"
"  ->  Merge Join  (cost=19084169.53..27246633167.33 rows=5464884210 width=1918)"
"        Merge Cond: (t1.country = othertable.country)"
"        Join Filter: (upper((t1.address)::text) ~~ (('%'::text || othertable.admin1) || '%'::text))"
"        ->  Sort  (cost=18332017.34..18347693.77 rows=6270570 width=1661)"
"              Sort Key: t1.country"
"              ->  Seq Scan on mytable t1  (cost=0.00..4057214.31 rows=6270570 width=1661)"
"                    Filter: (admin1 IS NULL)"
"        ->  Materialize  (cost=752152.19..766803.71 rows=2930305 width=92)"
"              ->  Sort  (cost=752152.19..759477.95 rows=2930305 width=92)"
"                    Sort Key: othertable.country"
"                    ->  Seq Scan on othertable (cost=0.00..136924.05 rows=2930305 width=92)"

1 answer:

Answer 0 (score: 1)

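In the second query you are (more or less) joining on the city name, but othertable has several entries for each city name. Every mytable row therefore joins to several othertable rows, and the values it ends up with are unpredictable: which lat/lng or admin2/admin3 will be the one that sticks?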
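To see how many duplicates are involved, one option is to count the othertable rows per city name and country. This is a minimal sketch that uses only the column names from the question; the LIMIT of 20 is an arbitrary choice.

```sql
-- Cities that appear more than once per country in othertable.
-- Each of these can match the same mytable row several times in query (2).
SELECT admin1, country, count(*) AS n
FROM   othertable
GROUP  BY admin1, country
HAVING count(*) > 1
ORDER  BY n DESC
LIMIT  20;  -- arbitrary cut-off, just for inspection
```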

If othertable contains entries without a postal code, use only those by adding the extra condition AND othertable.postal_code is null
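If that case applies, the second query could be restricted roughly as follows. This sketch assumes that a missing postal code is stored as NULL rather than as an empty string, which the question does not confirm. The postal_code assignment is left out because it would only copy a NULL.

```sql
UPDATE mytable t1
SET admin1 = t.admin1,
    admin2 = t.admin2,
    admin3 = t.admin3,
    lat    = t.lat,
    lng    = t.lng
FROM (
    SELECT *
    FROM   othertable
    WHERE  postal_code IS NULL      -- assumption: "no postal code" means NULL
) t
WHERE t1.country = t.country
  AND upper(t1.address) LIKE '%' || t.admin1 || '%'
  AND t1.admin1 IS NULL;            -- only rows that are still unmatched
```

A city may still have more than one such row, so the duplicate problem described above is reduced, not necessarily removed.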

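Otherwise, you will want a subset of othertable that returns exactly one row per admin1 + country value. Replace select * from othertable with the following query. Of course, you may want to adjust it to pick something other than the first lat/lng/admin2/admin3.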

-- first() is not a built-in PostgreSQL aggregate; this assumes a user-defined one exists
SELECT admin1, country,
       first(postal_code) postal_code, first(lat) lat, first(lng) lng,
       first(admin2) admin2, first(admin3) admin3
FROM othertable
GROUP BY admin1, country
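Stock PostgreSQL does not ship a first() aggregate, so the subquery above only runs if one has been defined. A possible alternative, sketched here with PostgreSQL's DISTINCT ON, also returns one row per admin1 + country and takes all columns from the same source row. The tie-break on postal_code is an assumption made for illustration; any ordering that fits the data would do.

```sql
-- One row per (admin1, country); the trailing ORDER BY column decides which row wins.
SELECT DISTINCT ON (admin1, country)
       admin1, country, postal_code, lat, lng, admin2, admin3
FROM   othertable
ORDER  BY admin1, country, postal_code;  -- assumed tie-break: lowest postal code
```

Unlike independent per-column aggregates, this keeps lat, lng, admin2 and admin3 consistent with each other, because they come from a single othertable row.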

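Worst of all, the second query would overwrite what the first query updated, so you must skip those records by adding AND t1.postal_code is null (t1 being the alias of mytable).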

The whole query could be

UPDATE mytable t1
SET
    admin1 = t.admin1,
    admin2 = t.admin2,
    admin3 = t.admin3,
    postal_code = t.postal_code,
    lat = t.lat,
    lng = t.lng
FROM (
    -- one row per (admin1, country); first() must exist as a user-defined aggregate
    SELECT admin1, country, first(postal_code) postal_code, first(lat) lat, first(lng) lng, first(admin2) admin2, first(admin3) admin3
    FROM othertable
    GROUP BY admin1, country) t
WHERE t1.country = t.country
AND upper(t1.address) LIKE '%' || t.admin1 || '%' -- checks whether the city name from othertable shows up in the address in t1
AND t1.admin1 IS NULL       -- qualified so it cannot be confused with t.admin1
AND t1.postal_code IS NULL; -- skip rows already handled by query (1); the alias t1 must be used, not mytable