I'm trying to optimize a query against the following table:
CREATE TABLE t (
uuid4 UUID PRIMARY KEY
, arr TEXT[]
, geom GEOMETRY
, ts TIMESTAMP WITHOUT TIME ZONE
);
CREATE INDEX ON t USING GIST (geom);
The query looks like this:
explain analyze
SELECT kmeans
, count(*)::int
, ST_X(ST_Centroid(ST_Collect(geom))) AS lon
, ST_Y(ST_Centroid(ST_Collect(geom))) AS lat
, STRING_TO_ARRAY(STRING_AGG(ARRAY_TO_STRING(arr, ','), ','), ',') AS arr
FROM (
SELECT kmeans(ARRAY[ST_X(geom), ST_Y(geom)], 25) OVER (), geom, arr
FROM t
WHERE ts > NOW() - '12 hours'::interval
AND geom IS NOT NULL
AND uuid4 != '9ab0f8cd-9707-41da-8e30-6d29a0f22242'::uuid
AND arr @> (SELECT arr FROM t WHERE uuid4 = '9ab0f8cd-9707-41da-8e30-6d29a0f22242'::uuid LIMIT 1)
AND ST_Distance_Sphere(ST_MakePoint(-77, 38), geom) < 10000
) AS ksub
GROUP BY kmeans
ORDER BY kmeans;
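Basically, it finds all rows that lie within a certain distance, fall within a time range, have geom populated, and whose arr contains every item of a specified arr. It then clusters the rows it finds with the kmeans-postgresql window function. This is what I'm currently seeing: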
GroupAggregate (cost=347.69..349.59 rows=38 width=98) (actual time=50.034..50.384 rows=25 loops=1)
-> Sort (cost=347.69..347.78 rows=38 width=98) (actual time=49.994..49.999 rows=99 loops=1)
Sort Key: (kmeans(ARRAY[st_x(t.geom), st_y(t.geom)], 25) OVER (?))
Sort Method: quicksort Memory: 42kB
-> WindowAgg (cost=25.18..346.31 rows=38 width=94) (actual time=49.955..49.968 rows=99 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.29..8.30 rows=1 width=62) (actual time=0.018..0.018 rows=1 loops=1)
-> Index Scan using t_uuid4_ts_idx on t t_1 (cost=0.29..8.30 rows=1 width=62) (actual time=0.017..0.017 rows=1 loops=1)
Index Cond: (uuid4 = '9ab0f8cd-9707-41da-8e30-6d29a0f22242'::uuid)
-> Bitmap Heap Scan on t (cost=16.88..337.34 rows=38 width=94) (actual time=13.363..49.747 rows=99 loops=1)
Recheck Cond: (arr @> $0)
Filter: ((geom IS NOT NULL) AND (uuid4 <> '9ab0f8cd-9707-41da-8e30-6d29a0f22242'::uuid) AND (ts > (now() - '12:00:00'::interval)) AND (_st_distance('0101000020E610000000000000004053C00000000000004340'::geography, geography(geom), 0::double precision, false) < 10000::double precision))
Rows Removed by Filter: 22989
-> Bitmap Index Scan on t_arr_idx (cost=0.00..16.87 rows=115 width=0) (actual time=13.072..13.072 rows=23089 loops=1)
Index Cond: (arr @> $0)
Total runtime: 50.464 ms
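A Bitmap Heap Scan on top of a Bitmap Index Scan seems to be the best index solution, but I keep wondering whether there is a way to avoid the extra filtering and rechecking. Any ideas on alternative indexes I could build to improve performance? I have already tried: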
Indexes:
"t_pkey" PRIMARY KEY, btree (uuid4)
"t_geom_idx" gist (geom)
"t_geom_ts_idx" gist (geom, ts)
"t_geom_ts_uuid4_idx" gist (geom, ts, (uuid4::text))
"t_iam_idx" gin (arr)
"t_ts_geom_idx" gist (ts, geom)
"t_ts_geom_uuid4_idx" gist (ts, geom, (uuid4::text))
"t_ts_uuid4_geom_idx" gist (ts, (uuid4::text), geom)
"t_uuid4_ts_idx" btree (uuid4, ts)
Note that kmeans comes from the extension at https://github.com/umitanuki/kmeans-postgresql.
Answer 0 (score: 1)
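Following John Barça's suggestion, I used ST_DWithin together with a GiST index on my geometry and timestamp, which cut the runtime of the query posted above to under 10 ms. The only tricky part was realizing that I needed degrees rather than meters for the geometry calculation (geography can use meters). This question pointed me to a solution that is accurate enough: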
AND ST_DWithin(ST_MakePoint(-77.0710820577842, 37.9940763922052), geom, 10000 / (111.31 * 1000 * COS(ST_Y(ST_MakePoint(-77.0710820577842, 37.9940763922052)) * Pi() / 180)))
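For context, here is a rough sketch of how that condition might slot into the inner query from the question. It is not necessarily the exact statement that was run. It reuses the simpler point (-77, 38) from the question and assumes that geom and the ST_MakePoint result share the same SRID, and that one of the GiST indexes on geom listed in the question is still in place. The degree radius is an approximation based on the length of one degree of longitude at the query latitude.

```sql
-- Sketch only: the inner query from the question, with the
-- ST_Distance_Sphere test swapped for ST_DWithin so that the
-- planner can use a GiST index on geom for the distance filter.
-- Assumption: geom and the ST_MakePoint result share the same SRID.
SELECT kmeans(ARRAY[ST_X(geom), ST_Y(geom)], 25) OVER (), geom, arr
FROM t
WHERE ts > NOW() - '12 hours'::interval
  AND geom IS NOT NULL
  AND uuid4 != '9ab0f8cd-9707-41da-8e30-6d29a0f22242'::uuid
  AND arr @> (SELECT arr FROM t WHERE uuid4 = '9ab0f8cd-9707-41da-8e30-6d29a0f22242'::uuid LIMIT 1)
  AND ST_DWithin(
        ST_MakePoint(-77, 38),
        geom,
        -- 10,000 m converted to degrees of longitude at latitude 38
        10000 / (111.31 * 1000 * COS(38 * PI() / 180))
      );
```

At latitude 38 the radius works out to roughly 0.114 degrees. Because ST_DWithin on geometry treats that value as a plain planar distance in degrees, the search area is only approximately a 10 km circle, which is the "accurate enough" trade-off mentioned above.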