我有一个查询,该查询LEFT JOIN
将两个表与timestamptz
列并按
(date_trunc(
'DAY',
"table_one"."ttz" AT TIME ZONE
'America/Los_Angeles'
)
-
date_trunc(
'DAY',
"table_two"."ttz" AT TIME ZONE
'America/Los_Angeles')) as period
通过这种摸索,查询性能从1s(按其他列分组时)下降到40-60s。这是一个已知问题,是否有任何解决方法? 此行为不依赖于硬件配置(在具有优化的Postgres配置的服务器计算机上测试)。我还使用了Citus扩展名,一个表按日期范围进行了分区,但这并不相关(已测试)。
表DLL
CREATE TABLE table_one
(
user_id VARCHAR,
ttz timestamptz
);
查询
SELECT date_trunc(
'DAY',
table_one."ttz" AT TIME ZONE
'America/Los_Angeles'
) AT TIME ZONE 'America/Los_Angeles' table_one_day,
(date_trunc(
'DAY',
"table_one"."ttz" AT TIME ZONE
'America/Los_Angeles'
)
-
date_trunc(
'DAY',
"table_two"."ttz" AT TIME ZONE
'America/Los_Angeles')) period,
count(DISTINCT table_two.user_id)
FROM table_one
LEFT JOIN table_two ON table_one.user_id = table_two.user_id
GROUP BY table_one_day, period;
仅按table_one_day
分组时进行计划
GroupAggregate (cost=0.00..0.00 rows=0 width=0) (actual time=760.606..760.606 rows=1 loops=1)
Output: remote_scan.first_ev_day_trunc, count(DISTINCT remote_scan.count)
Group Key: remote_scan.first_ev_day_trunc
-> Sort (cost=0.00..0.00 rows=0 width=0) (actual time=760.585..760.585 rows=6 loops=1)
Output: remote_scan.first_ev_day_trunc, remote_scan.count
Sort Key: remote_scan.first_ev_day_trunc
Sort Method: quicksort Memory: 25kB
-> Custom Scan (Citus Real-Time) (cost=0.00..0.00 rows=0 width=0) (actual time=760.577..760.578 rows=6 loops=1)
Output: remote_scan.first_ev_day_trunc, remote_scan.count
Task Count: 32
Tasks Shown: One of 32
-> Task
Node: host=94.130.157.249 port=5432 dbname=klonemobile
-> Group (cost=89.13..89.25 rows=8 width=40) (actual time=0.339..0.343 rows=1 loops=1)
Output: (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time")))), table_two.user_id
Group Key: (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time")))), table_two.user_id
Buffers: shared hit=9
-> Sort (cost=89.13..89.15 rows=8 width=40) (actual time=0.337..0.338 rows=24 loops=1)
Output: (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time")))), table_two.user_id
Sort Key: (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time")))), table_two.user_id
Sort Method: quicksort Memory: 26kB
Buffers: shared hit=9
-> Hash Left Join (cost=44.44..89.01 rows=8 width=40) (actual time=0.281..0.307 rows=24 loops=1)
Output: timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time"))), table_two.user_id
Hash Cond: ((table_one.user_id)::text = (table_two.user_id)::text)
Join Filter: ((table_one."time" < table_two."time") AND ((table_one."time" + '2 days'::interval day to second) >= table_two."time"))
Rows Removed by Join Filter: 1
Buffers: shared hit=3
-> Append (cost=0.00..44.34 rows=8 width=40) (actual time=0.024..0.027 rows=1 loops=1)
Buffers: shared hit=1
-> Seq Scan on table_one_17955_2004312" table_one (cost=0.00..22.15 rows=4 width=40) (actual time=0.024..0.024 rows=1 loops=1)
Output: table_one."time", table_one.user_id
Filter: ((table_one."time" >= '2019-02-28 11:00:00+03'::timestamp with time zone) AND (table_one."time" < '2019-03-01 11:00:00+03'::timestamp with time zone))
Buffers: shared hit=1
-> Seq Scan on table_one_17956_2005560" table_one_1 (cost=0.00..22.15 rows=4 width=40) (actual time=0.002..0.002 rows=0 loops=1)
Output: table_one_1."time", table_one_1.user_id
Filter: ((table_one_1."time" >= '2019-02-28 11:00:00+03'::timestamp with time zone) AND (table_one_1."time" < '2019-03-01 11:00:00+03'::timestamp with time zone))
-> Hash (cost=44.34..44.34 rows=8 width=40) (actual time=0.044..0.044 rows=25 loops=1)
Output: table_two.user_id, table_two."time"
Buckets: 1024 Batches: 1 Memory Usage: 10kB
Buffers: shared hit=2
-> Append (cost=0.00..44.34 rows=8 width=40) (actual time=0.018..0.030 rows=25 loops=1)
Buffers: shared hit=2
-> Seq Scan on table_two_17955_2003480" table_two (cost=0.00..22.15 rows=4 width=40) (actual time=0.018..0.023 rows=24 loops=1)
Output: table_two.user_id, table_two."time"
Filter: ((table_two."time" >= '2019-02-28 11:00:00+03'::timestamp with time zone) AND (table_two."time" < '2019-03-02 11:00:00+03'::timestamp with time zone))
Buffers: shared hit=1
-> Seq Scan on table_two_17956_2005304" table_two_1 (cost=0.00..22.15 rows=4 width=40) (actual time=0.004..0.004 rows=1 loops=1)
Output: table_two_1.user_id, table_two_1."time"
Filter: ((table_two_1."time" >= '2019-02-28 11:00:00+03'::timestamp with time zone) AND (table_two_1."time" < '2019-03-02 11:00:00+03'::timestamp with time zone))
Buffers: shared hit=1
Planning Time: 41.035 ms
Execution Time: 0.448 ms
Planning Time: 1.846 ms
Execution Time: 760.663 ms
按table_one_day
和period
分组时的计划
GroupAggregate (cost=0.00..0.00 rows=0 width=0) (actual time=46028.822..46028.825 rows=3 loops=1)
Output: remote_scan.first_ev_day_trunc, remote_scan.period, count(DISTINCT remote_scan.count)
Group Key: remote_scan.first_ev_day_trunc, remote_scan.period
Buffers: shared hit=3
-> Sort (cost=0.00..0.00 rows=0 width=0) (actual time=46028.804..46028.804 rows=7 loops=1)
Output: remote_scan.first_ev_day_trunc, remote_scan.period, remote_scan.count
Sort Key: remote_scan.first_ev_day_trunc, remote_scan.period
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=3
-> Custom Scan (Citus Real-Time) (cost=0.00..0.00 rows=0 width=0) (actual time=46028.786..46028.788 rows=7 loops=1)
Output: remote_scan.first_ev_day_trunc, remote_scan.period, remote_scan.count
Task Count: 32
Tasks Shown: One of 32
-> Task
Node: host=94.130.157.249 port=5432 dbname=klonemobile
-> Group (cost=89.29..89.59 rows=8 width=48) (actual time=0.379..0.384 rows=2 loops=1)
Output: (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time")))), (date_part('day'::text, (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_two."time"))) - timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time")))))), table_two.user_id
Group Key: (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time")))), (date_part('day'::text, (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_two."time"))) - timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time")))))), table_two.user_id
Buffers: shared hit=12
-> Sort (cost=89.29..89.31 rows=8 width=48) (actual time=0.378..0.379 rows=24 loops=1)
Output: (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time")))), (date_part('day'::text, (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_two."time"))) - timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time")))))), table_two.user_id
Sort Key: (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time")))), (date_part('day'::text, (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_two."time"))) - timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time")))))), table_two.user_id
Sort Method: quicksort Memory: 26kB
Buffers: shared hit=12
-> Hash Left Join (cost=44.44..89.17 rows=8 width=48) (actual time=0.284..0.337 rows=24 loops=1)
Output: timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time"))), date_part('day'::text, (timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_two."time"))) - timezone('America/Los_Angeles'::text, date_trunc('DAY'::text, timezone('America/Los_Angeles'::text, table_one."time"))))), table_two.user_id
Hash Cond: ((table_one.user_id)::text = (table_two.user_id)::text)
Join Filter: ((table_one."time" < table_two."time") AND ((table_one."time" + '2 days'::interval day to second) >= table_two."time"))
Rows Removed by Join Filter: 1
Buffers: shared hit=3
-> Append (cost=0.00..44.34 rows=8 width=40) (actual time=0.026..0.029 rows=1 loops=1)
Buffers: shared hit=1
-> Seq Scan on table_one_17955_2004312 table_one (cost=0.00..22.15 rows=4 width=40) (actual time=0.025..0.026 rows=1 loops=1)
Output: table_one."time", table_one.user_id
Filter: ((table_one."time" >= '2019-02-28 11:00:00+03'::timestamp with time zone) AND (table_one."time" < '2019-03-01 11:00:00+03'::timestamp with time zone))
Buffers: shared hit=1
-> Seq Scan on table_one_17956_2005560 table_one_1 (cost=0.00..22.15 rows=4 width=40) (actual time=0.002..0.002 rows=0 loops=1)
Output: table_one_1."time", table_one_1.user_id
Filter: ((table_one_1."time" >= '2019-02-28 11:00:00+03'::timestamp with time zone) AND (table_one_1."time" < '2019-03-01 11:00:00+03'::timestamp with time zone))
-> Hash (cost=44.34..44.34 rows=8 width=40) (actual time=0.026..0.026 rows=25 loops=1)
Output: table_two."time", table_two.user_id
Buckets: 1024 Batches: 1 Memory Usage: 10kB
Buffers: shared hit=2
-> Append (cost=0.00..44.34 rows=8 width=40) (actual time=0.011..0.019 rows=25 loops=1)
Buffers: shared hit=2
-> Seq Scan on "table_two_17955_2003480" table_two (cost=0.00..22.15 rows=4 width=40) (actual time=0.011..0.014 rows=24 loops=1)
Output: table_two."time", table_two.user_id
Filter: ((table_two."time" >= '2019-02-28 11:00:00+03'::timestamp with time zone) AND (table_two."time" < '2019-03-02 11:00:00+03'::timestamp with time zone))
Buffers: shared hit=1
-> Seq Scan on table_two_17956_2005304 table_two_1 (cost=0.00..22.15 rows=4 width=40) (actual time=0.003..0.003 rows=1 loops=1)
Output: table_two_1."time", table_two_1.user_id
Filter: ((table_two_1."time" >= '2019-02-28 11:00:00+03'::timestamp with time zone) AND (table_two_1."time" < '2019-03-02 11:00:00+03'::timestamp with time zone))
Buffers: shared hit=1
Planning Time: 5899.378 ms
Execution Time: 0.531 ms
Planning Time: 2.757 ms
Execution Time: 46028.896 ms
答案 0 :(得分:-1)
实际上table_one中有几列?即真的只有两列吗?如果它是一个宽表,则可以在user_id,ttz上的该表上创建索引。这样就可以使数据库能够扫描较小的数据结构,即索引与较大的结构,即表。
如果仍然很慢,那么某些数据库(例如oracle)在创建索引时允许使用表达式。 Mysql允许使用与未存储的虚拟列类似的功能,即http://mysqlserverteam.com/generated-columns-in-mysql-5-7-5/并参见https://mysqlserverteam.com/virtual-columns-and-effective-functional-indexes-in-innodb/