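I have a database on a DigitalOcean server that seems a bit slow to me (queries sometimes take more than a second). It runs PostgreSQL with PostGIS.
Here are some statistics about the houses database, which actually only stores apartments:
Houses: 190000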
SELECT count(*) from houses;
Houses seen online within the last 24 hours: 58000
SELECT count(*) FROM houses
JOIN (select max(last_seen) as last_ts from houses) as dt
ON last_seen >= dt.last_ts - interval '24 hour';
Houses located in a specific area and active: 3086
SELECT count(*) FROM houses
WHERE ST_DWithin(geom, ST_MakePoint(52.5277411, 13.4)::geography, 30000)
AND (active IS NULL OR active = TRUE);
This is the actual SQL query, which is somewhat slow. Slow means that a single query sometimes takes more than one second:
SELECT
    *,
    ST_DistanceSphere(geom, ST_MakePoint(52.5277411, 13.4)) AS distance
FROM houses
JOIN (SELECT max(last_seen) AS last_ts FROM houses) AS dt
    ON last_seen >= dt.last_ts - interval '24 hour'
WHERE
    ST_DWithin(geom, ST_MakePoint(52.5277411, 13.4)::geography, 30000)
    AND (active IS NULL OR active = TRUE);
What I have tried so far: removing the join, since it is somewhat redundant, and introducing indexes.
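One way the join could be dropped is sketched here. This is an illustration, not necessarily the exact statement that was tried. It computes the 24-hour cutoff with a scalar subquery and assumes a plain b-tree index on last_seen, which lets PostgreSQL read max(last_seen) from the index instead of scanning the table. The index name is made up.

```sql
-- Hypothetical index name; a b-tree on last_seen lets max(last_seen)
-- be answered from the index rather than a full table scan.
CREATE INDEX IF NOT EXISTS houses_last_seen_idx ON houses (last_seen);

-- Same filters as the original query, with the cutoff computed by a
-- scalar subquery instead of a join against a derived table.
SELECT
    *,
    ST_DistanceSphere(geom, ST_MakePoint(52.5277411, 13.4)) AS distance
FROM houses
WHERE last_seen >= (SELECT max(last_seen) FROM houses) - interval '24 hour'
  AND ST_DWithin(geom, ST_MakePoint(52.5277411, 13.4)::geography, 30000)
  AND (active IS NULL OR active = TRUE);
```

Unlike the join version, this result does not carry the extra last_ts column; the set of matching rows should be the same.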
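Here is the EXPLAIN output for the query:
Any ideas on how to improve this? Thanks a lot!
PS: If any data is missing, let me know and I will provide it.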
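Answer 0 (score: 1)
Since many people tried to help and gave very good advice, I want to post my final solution. As mentioned in the comments, you should always measure, optimize, repeat. Table size and indexes are the key points.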
Since I am no expert on this topic, the visualization at http://tatiyants.com helped a lot:
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)
SELECT
    *,
    ST_DistanceSphere(geom, ST_MakePoint(52.5277411, 13.4)) AS distance
FROM houses
JOIN (SELECT max(last_seen) AS last_ts FROM houses) AS dt
    ON last_seen >= dt.last_ts - interval '24 hour'
WHERE
    ST_DWithin(geom, ST_MakePoint(52.5277411, 13.4)::geography, 30000)
    AND (active IS NULL OR active = TRUE);
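That helped with the basic understanding. Since I was already using indexes, there were not many optimizations left. In my case, slightly delayed results are acceptable, so I introduced a materialized view that stores part of the query: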
CREATE MATERIALIZED VIEW mathouses AS
SELECT *
FROM houses
JOIN (SELECT max(last_seen) AS last_ts FROM houses) AS dt
    ON last_seen >= dt.last_ts - interval '24 hour'
WHERE (active IS NULL OR active = TRUE);
Then I added indexes on that view, along with a simple shell script that cron calls every hour:
#!/bin/sh
sudo -u <myuser> -Hi -- psql -d <db> -c 'refresh materialized view mathouses;'
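The indexes on the view are not shown here, so the following is only a minimal sketch of what they might look like. It assumes geom is a geometry column, so the geography form of ST_DWithin would need an index on the cast expression. It also assumes the view carries a unique id column from houses. Both index names and the id column are assumptions.

```sql
-- Spatial index on the same expression the query filters on
-- (geom cast to geography), so ST_DWithin should be able to use it.
CREATE INDEX mathouses_geog_idx
    ON mathouses USING GIST ((geom::geography));

-- A unique index is required before the view can be refreshed
-- with CONCURRENTLY; "id" is an assumed unique column.
CREATE UNIQUE INDEX mathouses_id_idx ON mathouses (id);

-- Refresh without blocking queries that read the view.
REFRESH MATERIALIZED VIEW CONCURRENTLY mathouses;
```

If geom is already a geography column, a plain GIST index on geom would be enough. With the unique index in place, the hourly cron job could use the CONCURRENTLY form so that queries are not blocked while the view is rebuilt.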
My final result:
Explain (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)
SELECT
    *,
    ST_DistanceSphere(geom, ST_MakePoint(52.5277411, 13.4)) AS distance
FROM mathouses
WHERE ST_DWithin(geom, ST_MakePoint(52.5277411, 13.4)::geography, 30000);
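I am very happy with the solution. It is now about 3 times faster, sometimes more. To go further, the next logical step would be to look at the hardware or tune the PostgreSQL settings.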
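As a starting point for looking at the settings, a read-only query like the one below lists a few memory and planner parameters that are commonly reviewed first. Which of them matter for this workload is an assumption, and the query changes nothing.

```sql
-- Inspect current values only; no setting is modified here.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem',
               'effective_cache_size', 'random_page_cost');
```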