I am using GeoDjango with Postgres 10 and PostGIS. I have two models as follows:
class Postcode(models.Model):
    name = models.CharField(max_length=8, unique=True)
    location = models.PointField(geography=True)


class Transaction(models.Model):
    transaction_id = models.CharField(max_length=60)
    price = models.IntegerField()
    date_of_transfer = models.DateField()
    postcode = models.ForeignKey(Postcode, on_delete=models.CASCADE)
    property_type = models.CharField(max_length=1, blank=True)
    street = models.CharField(blank=True, max_length=200)

    class Meta:
        indexes = [
            models.Index(fields=['-date_of_transfer']),
            models.Index(fields=['price']),
        ]
Given a particular postcode, I want to find the nearest transactions within a specified distance. To do this, I use the following code:
transactions = Transaction.objects.filter(price__gte=min_price) \
    .filter(postcode__location__distance_lte=(pc.location, D(mi=distance))) \
    .annotate(distance=Distance('postcode__location', pc.location)) \
    .order_by('distance')[0:25]
The query runs slowly, taking roughly 20-60 seconds (depending on the filter conditions) on a Windows PC with an i5 2500K and 16 GB of RAM. If I order by date_of_transfer instead, it runs in under 1 second for larger distances (over 1 mile), but is still slow for small distances (e.g. 45 seconds for a distance of 0.1 miles).
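For reference, the date-ordered variant that runs fast for larger distances is just the same queryset with a different order_by (a sketch, using the same pc, min_price, and distance variables as above):

```python
from django.contrib.gis.db.models.functions import Distance
from django.contrib.gis.measure import D

# Same filters as the distance-ordered query, but sorted on the
# indexed -date_of_transfer column instead of the computed distance.
transactions = (
    Transaction.objects
    .filter(price__gte=min_price)
    .filter(postcode__location__distance_lte=(pc.location, D(mi=distance)))
    .annotate(distance=Distance('postcode__location', pc.location))
    .order_by('-date_of_transfer')[:25]
)
```

The only difference is the sort key, which presumably lets Postgres walk the date index instead of sorting every matching row by distance.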
So far I have tried:
* changing the location field from Geometry to Geography
* using dwithin instead of distance_lte
Neither of these made much difference to the speed of the query.
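The dwithin attempt mentioned above looked roughly like this (a sketch, again using the same pc, min_price, and distance variables):

```python
from django.contrib.gis.db.models.functions import Distance
from django.contrib.gis.measure import D

# dwithin translates to ST_DWithin in SQL, which in principle can use
# the GiST index on the geography column, unlike a bare ST_Distance
# comparison; in my case it did not noticeably change the timings.
transactions = (
    Transaction.objects
    .filter(price__gte=min_price)
    .filter(postcode__location__dwithin=(pc.location, D(mi=distance)))
    .annotate(distance=Distance('postcode__location', pc.location))
    .order_by('distance')[:25]
)
```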
The SQL that GeoDjango generates for the current version of the query is:
SELECT "postcodes_transaction"."id",
"postcodes_transaction"."transaction_id",
"postcodes_transaction"."price",
"postcodes_transaction"."date_of_transfer",
"postcodes_transaction"."postcode_id",
"postcodes_transaction"."street",
ST_Distance("postcodes_postcode"."location",
ST_GeogFromWKB('\x0101000020e6100000005471e316f3bfbf4ad05fe811c14940'::bytea)) AS "distance"
FROM "postcodes_transaction" INNER JOIN "postcodes_postcode"
ON ("postcodes_transaction"."postcode_id" = "postcodes_postcode"."id")
WHERE ("postcodes_transaction"."price" >= 50000
AND ST_Distance("postcodes_postcode"."location", ST_GeomFromEWKB('\x0101000020e6100000005471e316f3bfbf4ad05fe811c14940'::bytea)) <= 1609.344
AND "postcodes_transaction"."date_of_transfer" >= '2000-01-01'::date
AND "postcodes_transaction"."date_of_transfer" <= '2017-10-01'::date)
ORDER BY "distance" ASC LIMIT 25
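The 1609.344 literal in the WHERE clause above is just one mile expressed in metres, which is what D(mi=distance) resolves to before it reaches ST_Distance. A minimal stdlib-only illustration of that conversion (MILES_TO_M and miles_to_metres are my own names, not Django's):

```python
# One statute mile in metres; this is the factor behind the
# 1609.344 literal that D(mi=1) produces in the generated SQL.
MILES_TO_M = 1609.344

def miles_to_metres(miles: float) -> float:
    """Convert a distance in miles to metres."""
    return miles * MILES_TO_M

print(miles_to_metres(1))  # 1609.344
```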
On the postcode table, the location field has an index as follows:
CREATE INDEX postcodes_postcode_location_id
ON public.postcodes_postcode
USING gist
(location);
The transaction table has 22 million rows and the postcode table has 2.5 million rows. Any suggestions on what approaches I could take to improve the performance of this query?
Here is the query plan for reference:
"Limit (cost=2394838.01..2394840.93 rows=25 width=76) (actual time=19028.400..19028.409 rows=25 loops=1)"
" Output: postcodes_transaction.id, postcodes_transaction.transaction_id, postcodes_transaction.price, postcodes_transaction.date_of_transfer, postcodes_transaction.postcode_id, postcodes_transaction.street, (_st_distance(postcodes_postcode.location, '0101 (...)"
" -> Gather Merge (cost=2394838.01..2893397.65 rows=4273070 width=76) (actual time=19028.399..19028.407 rows=25 loops=1)"
" Output: postcodes_transaction.id, postcodes_transaction.transaction_id, postcodes_transaction.price, postcodes_transaction.date_of_transfer, postcodes_transaction.postcode_id, postcodes_transaction.street, (_st_distance(postcodes_postcode.location, (...)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Sort (cost=2393837.99..2399179.33 rows=2136535 width=76) (actual time=18849.396..18849.449 rows=387 loops=3)"
" Output: postcodes_transaction.id, postcodes_transaction.transaction_id, postcodes_transaction.price, postcodes_transaction.date_of_transfer, postcodes_transaction.postcode_id, postcodes_transaction.street, (_st_distance(postcodes_postcode.loc (...)"
" Sort Key: (_st_distance(postcodes_postcode.location, '0101000020e6100000005471e316f3bfbf4ad05fe811c14940'::geography, '0'::double precision, true))"
" Sort Method: quicksort Memory: 1013kB"
" Worker 0: actual time=18615.809..18615.948 rows=577 loops=1"
" Worker 1: actual time=18904.700..18904.721 rows=576 loops=1"
" -> Hash Join (cost=699247.34..2074281.07 rows=2136535 width=76) (actual time=10705.617..18841.448 rows=5573 loops=3)"
" Output: postcodes_transaction.id, postcodes_transaction.transaction_id, postcodes_transaction.price, postcodes_transaction.date_of_transfer, postcodes_transaction.postcode_id, postcodes_transaction.street, _st_distance(postcodes_postcod (...)"
" Inner Unique: true"
" Hash Cond: (postcodes_transaction.postcode_id = postcodes_postcode.id)"
" Worker 0: actual time=10742.668..18608.763 rows=5365 loops=1"
" Worker 1: actual time=10749.748..18897.838 rows=5522 loops=1"
" -> Parallel Seq Scan on public.postcodes_transaction (cost=0.00..603215.80 rows=6409601 width=68) (actual time=0.052..4214.812 rows=5491618 loops=3)"
" Output: postcodes_transaction.id, postcodes_transaction.transaction_id, postcodes_transaction.price, postcodes_transaction.date_of_transfer, postcodes_transaction.postcode_id, postcodes_transaction.street"
" Filter: ((postcodes_transaction.price >= 50000) AND (postcodes_transaction.date_of_transfer >= '2000-01-01'::date) AND (postcodes_transaction.date_of_transfer <= '2017-10-01'::date))"
" Rows Removed by Filter: 2025049"
" Worker 0: actual time=0.016..4226.643 rows=5375779 loops=1"
" Worker 1: actual time=0.016..4188.138 rows=5439515 loops=1"
" -> Hash (cost=682252.00..682252.00 rows=836667 width=36) (actual time=10654.921..10654.921 rows=1856 loops=3)"
" Output: postcodes_postcode.location, postcodes_postcode.id"
" Buckets: 131072 Batches: 16 Memory Usage: 1032kB"
" Worker 0: actual time=10692.068..10692.068 rows=1856 loops=1"
" Worker 1: actual time=10674.101..10674.101 rows=1856 loops=1"
" -> Seq Scan on public.postcodes_postcode (cost=0.00..682252.00 rows=836667 width=36) (actual time=5058.685..10651.176 rows=1856 loops=3)"
" Output: postcodes_postcode.location, postcodes_postcode.id"
" Filter: (_st_distance(postcodes_postcode.location, '0101000020e6100000005471e316f3bfbf4ad05fe811c14940'::geography, '0'::double precision, true) <= '1609.344'::double precision)"
" Rows Removed by Filter: 2508144"
" Worker 0: actual time=5041.442..10688.265 rows=1856 loops=1"
" Worker 1: actual time=5072.242..10670.215 rows=1856 loops=1"
"Planning time: 0.538 ms"
"Execution time: 19065.962 ms"