I have a Flask API using SQLAlchemy that connects to a PostgreSQL database. One of the requests queries a table roughly like this:
SELECT * FROM mytable WHERE updated_at >= %(updates_from)s AND updated_at <= %(updates_to)s
Pseudocode for the query to illustrate the problem:
import datetime
from sqlalchemy import create_engine, sql
db_engine = create_engine("postgresql://user:password@localhost:5432/mydb")
db_connection = db_engine.connect()
d_updates_from = datetime.datetime(2018, 1, 1, 0, 0)
d_updates_to = datetime.datetime(2019, 6, 20, 17, 47)
sql_command = "SELECT * FROM mytable WHERE updated_at >= :updates_from AND updated_at <= :updates_to"
cur = db_connection.execute(
    sql.text(sql_command),
    updates_from=d_updates_from,
    updates_to=d_updates_to
)
In the database, the column that updates_from and updates_to are compared against (updated_at) is of type TIMESTAMP WITH TIME ZONE.
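For reference, a minimal sketch of how the column type can be confirmed from information_schema, reusing db_connection from the pseudocode above and the mytable/updated_at names from the query:
from sqlalchemy import sql

# Check the declared type of the column that the query filters on.
col_type = db_connection.execute(
    sql.text(
        "SELECT data_type FROM information_schema.columns "
        "WHERE table_name = :tbl AND column_name = :col"
    ),
    tbl="mytable",
    col="updated_at"
).scalar()
print(col_type)  # 'timestamp with time zone' in my case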
Running the query directly from psql, I get a response in under a second. Queried through SQLAlchemy, however, it takes roughly 20 s.
Converting the Python datetime objects to ISO 8601 strings brings the SQLAlchemy query time back down to under a second. This is done by calling .isoformat() on them before passing them to db_connection.execute(). Pseudocode to illustrate the workaround:
cur = db_connection.execute(
    sql.text(sql_command),
    updates_from=d_updates_from.isoformat(),
    updates_to=d_updates_to.isoformat()
)
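For completeness, another way this could presumably be written is to declare the bind parameters with an explicit timezone-aware type via SQLAlchemy's bindparam and TIMESTAMP(timezone=True). This is only a sketch of the idea, not something I have verified to change the behaviour:
from sqlalchemy import bindparam, TIMESTAMP, sql

# Sketch: attach explicit timestamptz types to the bind parameters
# instead of converting the datetimes to strings.
typed_command = sql.text(sql_command).bindparams(
    bindparam("updates_from", type_=TIMESTAMP(timezone=True)),
    bindparam("updates_to", type_=TIMESTAMP(timezone=True))
)
cur = db_connection.execute(
    typed_command,
    updates_from=d_updates_from,
    updates_to=d_updates_to
)
Passing timezone-aware datetime objects (with tzinfo set) instead of naive ones might be yet another variant, but I have not measured either of these.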
A couple of questions:
Why is the query nearly 100x slower when execute() is given datetime objects instead of ISO 8601 strings?
I am using
Edit: As suggested in the comments, I have tried to provide a minimal working example of the problem. While there is still a speed difference, it is only a few-fold. Also, since the set of columns is completely different, the minimal working example may well be showing a different problem altogether.
Bash script that creates a testdb with two tables and a view:
#!/bin/bash
sudo -u postgres psql -c "DROP DATABASE testdb;"
sudo -u postgres psql -c "CREATE DATABASE testdb;"
geometry="0106000020BE0B000001000000010300000001000000040000001806811505E71B412085EBE1B57358419A1C5A64A9E51B418A6CE7EBC4735841D64F8D17D2E71B41656666AEE67358411806811505E71B412085EBE1B5735841"
num=1000
sudo -u postgres psql -d testdb <<EOF
CREATE EXTENSION postgis;
CREATE TABLE foreign1(id SERIAL PRIMARY KEY, geom GEOMETRY(MultiPolygon,3006));
INSERT INTO foreign1 (geom) SELECT '${geometry}' FROM GENERATE_SERIES(1, ${num}) s(i);
CREATE TABLE data(id SERIAL PRIMARY KEY, f_id INTEGER REFERENCES foreign1(id), updated_at TIMESTAMP WITH TIME ZONE);
INSERT INTO data (f_id, updated_at) SELECT i, DATE '2018-01-01' + i FROM GENERATE_SERIES(1, ${num}) s(i);
CREATE VIEW v1 AS SELECT d.f_id, d.updated_at, f.geom FROM data d LEFT JOIN foreign1 f ON d.f_id = f.id;
GRANT SELECT ON v1 TO public;
EOF
Python script that runs the query and measures the time:
import time
import datetime
from sqlalchemy import create_engine, sql
def query(db_connection, sql_command, d_updates_from, d_updates_to):
    t_start = time.time()
    # Query with substituted timestamps.
    cur = db_connection.execute(
        sql.text(sql_command),
        updates_from=d_updates_from,
        updates_to=d_updates_to
    )
    t_end = time.time()
    # EXPLAIN the query.
    cur = db_connection.execute(
        sql.text("EXPLAIN ANALYZE " + sql_command),
        updates_from=d_updates_from,
        updates_to=d_updates_to
    )
    explain_str = ""
    for row in cur.fetchall():
        explain_str += row[0] + "\n"
    # Return query execution time and query plan.
    return (t_end - t_start), explain_str
# Connect to the database
db_engine = create_engine("postgresql://localhost:5432/testdb")
db_connection = db_engine.connect()
# Timestamp range to query.
d_updates_from = datetime.datetime(2018, 1, 1, 0, 0)
d_updates_to = datetime.datetime(2020, 1, 1, 0, 0)
# Template of the query.
sql_command = "SELECT * FROM v1 WHERE updated_at >= :updates_from AND updated_at <= :updates_to;"
# Query with datetime objects.
t_variant1, explain_str1 = query(db_connection, sql_command, d_updates_from, d_updates_to)
# Query with ISO 8601 strings instead of datetime objects.
t_variant2, explain_str2 = query(db_connection, sql_command, d_updates_from.isoformat(), d_updates_to.isoformat())
print("Took {:.6f} s with datetime, {:.6f} s with ISO 8601 string. ({:.2f}x difference)".format(t_variant1, t_variant2, t_variant1 / t_variant2))
print("\nExplain with datetime:\n{}\n\nExplain with ISO 8601 string:\n{}".format(explain_str1, explain_str2))
Example output:
Took 0.012409 s with datetime, 0.002292 s with ISO 8601 string. (5.41x difference)
Explain with datetime:
Hash Left Join (cost=41.50..64.43 rows=730 width=124) (actual time=0.287..0.682 rows=730 loops=1)
  Hash Cond: (d.f_id = f.id)
  ->  Seq Scan on data d (cost=0.00..21.00 rows=730 width=12) (actual time=0.010..0.272 rows=730 loops=1)
        Filter: ((updated_at >= '2018-01-01 00:00:00'::timestamp without time zone) AND (updated_at <= '2020-01-01 00:00:00'::timestamp without time zone))
        Rows Removed by Filter: 270
  ->  Hash (cost=29.00..29.00 rows=1000 width=116) (actual time=0.265..0.265 rows=1000 loops=1)
        Buckets: 1024 Batches: 1 Memory Usage: 153kB
        ->  Seq Scan on foreign1 f (cost=0.00..29.00 rows=1000 width=116) (actual time=0.004..0.131 rows=1000 loops=1)
Planning Time: 0.285 ms
Execution Time: 0.729 ms
Explain with ISO 8601 string:
Hash Left Join (cost=41.50..64.43 rows=730 width=124) (actual time=0.264..0.512 rows=730 loops=1)
  Hash Cond: (d.f_id = f.id)
  ->  Seq Scan on data d (cost=0.00..21.00 rows=730 width=12) (actual time=0.007..0.123 rows=730 loops=1)
        Filter: ((updated_at >= '2018-01-01 00:00:00+02'::timestamp with time zone) AND (updated_at <= '2020-01-01 00:00:00+02'::timestamp with time zone))
        Rows Removed by Filter: 270
  ->  Hash (cost=29.00..29.00 rows=1000 width=116) (actual time=0.253..0.253 rows=1000 loops=1)
        Buckets: 1024 Batches: 1 Memory Usage: 153kB
        ->  Seq Scan on foreign1 f (cost=0.00..29.00 rows=1000 width=116) (actual time=0.003..0.123 rows=1000 loops=1)
Planning Time: 0.115 ms
Execution Time: 0.548 ms
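To see what psycopg2 actually sends for the two parameter styles, here is a small sketch using cursor.mogrify() on a raw DBAPI connection taken from the engine (purely illustrative; the %(name)s placeholders are psycopg2's native style, matching the query at the top of the post):
# Sketch: inspect how psycopg2 renders the two parameter variants.
raw_conn = db_engine.raw_connection()
raw_cur = raw_conn.cursor()
raw_sql = "SELECT * FROM v1 WHERE updated_at >= %(updates_from)s AND updated_at <= %(updates_to)s"
print(raw_cur.mogrify(raw_sql, {"updates_from": d_updates_from,
                                "updates_to": d_updates_to}))
print(raw_cur.mogrify(raw_sql, {"updates_from": d_updates_from.isoformat(),
                                "updates_to": d_updates_to.isoformat()}))
If the datetime variant comes out with an explicit cast while the string variant is sent as a plain literal, that would line up with the different casts visible in the two EXPLAIN outputs above.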