SQLAlchemy query using datetime objects is slow

Time: 2019-06-20 18:58:46

Tags: python postgresql sqlalchemy

I have a Flask API that uses SQLAlchemy to talk to a PostgreSQL database. One of the requests queries a table in a way similar to this:

SELECT * FROM mytable WHERE updated_at >= %(updates_from)s AND updated_at <= %(updates_to)s

Pseudocode of the query, to illustrate the problem:

import datetime
from sqlalchemy import create_engine, sql

db_engine = create_engine("postgresql://user:password@localhost:5432/mydb")
db_connection = db_engine.connect()

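# Naive datetime objects (no tzinfo) used as the query bounds.
d_updates_from = datetime.datetime(2018, 1, 1, 0, 0)
d_updates_to = datetime.datetime(2019, 6, 20, 17, 47)

# sql.text() uses :name placeholders for bind parameters.
sql_command = "SELECT * FROM mytable WHERE updated_at >= :updates_from AND updated_at <= :updates_to"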

cur = db_connection.execute(
    sql.text(sql_command),
    updates_from=d_updates_from,
    updates_to=d_updates_to
)

In the database, the updated_at column (which is compared against updates_from and updates_to) is of type TIMESTAMP WITH TIME ZONE.

When I run the query directly in psql, I get a response in under a second. When the same query goes through SQLAlchemy, however, it takes about 20 s.

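Converting the Python datetime objects to ISO 8601 strings brings the SQLAlchemy query time down to under a second. I do this by calling .isoformat() on them before passing them to db_connection.execute(). Pseudocode illustrating the workaround: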

cur = db_connection.execute(
    sql.text(sql_command),
    updates_from=d_updates_from.isoformat(),
    updates_to=d_updates_to.isoformat()
)
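The workaround above can be generalized into a small helper that converts every datetime value in a parameter dict to an ISO 8601 string before it is handed to execute(). This is only a minimal sketch; the name iso_params is hypothetical and is not part of SQLAlchemy or psycopg2.

```python
import datetime


def iso_params(params):
    """Return a copy of params with date/datetime values replaced by ISO 8601 strings.

    Hypothetical helper: all other values are passed through unchanged.
    """
    return {
        key: value.isoformat() if isinstance(value, (datetime.datetime, datetime.date)) else value
        for key, value in params.items()
    }


params = iso_params({
    "updates_from": datetime.datetime(2018, 1, 1, 0, 0),
    "updates_to": datetime.datetime(2019, 6, 20, 17, 47),
})
print(params)  # {'updates_from': '2018-01-01T00:00:00', 'updates_to': '2019-06-20T17:47:00'}
```

With SQLAlchemy 1.3 the result could be unpacked into the same call as before, e.g. db_connection.execute(sql.text(sql_command), **params).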

A few questions:

  1. Why does the query run nearly 100x slower when SQLAlchemy execute() is given datetime objects instead of ISO 8601 strings?
  2. Is this behavior documented anywhere?

I am using:

  • sqlalchemy = 1.3.3 = py36h7b6447c_0
  • psycopg2 = 2.7.6.1 = py36h1ba5d50_0
  • PostgreSQL 11.1

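Edit: As suggested in the comments, I tried to put together a minimal working example of this problem. There is a speed difference, but only by a factor of a few. Also, because the set of columns is completely different, the minimal working example may well be demonstrating a different problem altogether.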

Bash script that creates testdb with two tables and a view:

#!/bin/bash

sudo -u postgres psql -c "DROP DATABASE IF EXISTS testdb;"
sudo -u postgres psql -c "CREATE DATABASE testdb;"

geometry="0106000020BE0B000001000000010300000001000000040000001806811505E71B412085EBE1B57358419A1C5A64A9E51B418A6CE7EBC4735841D64F8D17D2E71B41656666AEE67358411806811505E71B412085EBE1B5735841"

num=1000

sudo -u postgres psql -d testdb <<EOF
CREATE EXTENSION postgis; 

CREATE TABLE foreign1(id SERIAL PRIMARY KEY, geom GEOMETRY(MultiPolygon,3006));
INSERT INTO foreign1 (geom) SELECT '${geometry}' FROM GENERATE_SERIES(1, ${num}) s(i);

CREATE TABLE data(id SERIAL PRIMARY KEY, f_id INTEGER REFERENCES foreign1(id), updated_at TIMESTAMP WITH TIME ZONE);
INSERT INTO data (f_id, updated_at) SELECT i, DATE '2018-01-01' + i FROM GENERATE_SERIES(1, ${num}) s(i);

CREATE VIEW v1 AS SELECT d.f_id, d.updated_at, f.geom FROM data d LEFT JOIN foreign1 f ON d.f_id = f.id;
GRANT SELECT ON v1 TO public;
EOF

Python script that runs the query and measures the time:

import time
import datetime
from sqlalchemy import create_engine, sql


def query(db_connection, sql_command, d_updates_from, d_updates_to):
    t_start = time.time()
    # Query with substituted timestamps.
    cur = db_connection.execute(
        sql.text(sql_command),
        updates_from=d_updates_from,
        updates_to=d_updates_to
    )
    t_end = time.time()

    # EXPLAIN the query
    cur = db_connection.execute(
        sql.text("EXPLAIN ANALYZE " + sql_command),
        updates_from=d_updates_from,
        updates_to=d_updates_to
    )
    explain_str = ""
    for row in cur.fetchall():
        explain_str += row[0] + "\n"

    # Return the query execution time and the query plan.
    return (t_end - t_start), explain_str

# Connect to the database
db_engine = create_engine("postgresql://localhost:5432/testdb")
db_connection = db_engine.connect()
# Timestamp range to query.
d_updates_from = datetime.datetime(2018, 1, 1, 0, 0)
d_updates_to = datetime.datetime(2020, 1, 1, 0, 0)
# Template of the query.
sql_command = "SELECT * FROM v1 WHERE updated_at >= :updates_from AND updated_at <= :updates_to;"

# Query with datetime objects.
t_variant1, explain_str1 = query(db_connection, sql_command, d_updates_from, d_updates_to)

# Query with ISO 8601 strings instead of datetime objects.
t_variant2, explain_str2 = query(db_connection, sql_command, d_updates_from.isoformat(), d_updates_to.isoformat())

print("Took {:.6f} s with datetime, {:.6f} s with ISO 8601 string. ({:.2f}x difference)".format(t_variant1, t_variant2, t_variant1 / t_variant2))
print("\nExplain with datetime:\n{}\n\nExplain with ISO 8601 string:\n{}".format(explain_str1, explain_str2))
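In this script each variant is timed with a single pair of time.time() calls, and the datetime variant always runs first, so any one-off warm-up cost on the connection (if there is one) is charged to it. A sketch of a somewhat more robust measurement, assuming it is acceptable to run each query several times: repeat the call and keep the shortest duration measured with time.perf_counter(). The helper best_of is hypothetical, and a dummy workload stands in for the real db_connection.execute() call.

```python
import time


def best_of(func, repeat=5):
    """Call func() `repeat` times and return the shortest wall-clock duration in seconds."""
    durations = []
    for _ in range(repeat):
        t_start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - t_start)
    return min(durations)


# Dummy workload standing in for the real query, for example:
# lambda: db_connection.execute(sql.text(sql_command),
#                               updates_from=d_updates_from,
#                               updates_to=d_updates_to).fetchall()
t = best_of(lambda: sum(range(100000)), repeat=3)
print("{:.6f} s".format(t))
```

Running the ISO 8601 variant first, or alternating the two variants, would also show whether execution order contributes to the measured ratio.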

Sample output:

Took 0.012409 s with datetime, 0.002292 s with ISO 8601 string. (5.41x difference)

Explain with datetime:
Hash Left Join  (cost=41.50..64.43 rows=730 width=124) (actual time=0.287..0.682 rows=730 loops=1)
  Hash Cond: (d.f_id = f.id)
  ->  Seq Scan on data d  (cost=0.00..21.00 rows=730 width=12) (actual time=0.010..0.272 rows=730 loops=1)
        Filter: ((updated_at >= '2018-01-01 00:00:00'::timestamp without time zone) AND (updated_at <= '2020-01-01 00:00:00'::timestamp without time zone))
        Rows Removed by Filter: 270
  ->  Hash  (cost=29.00..29.00 rows=1000 width=116) (actual time=0.265..0.265 rows=1000 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 153kB
        ->  Seq Scan on foreign1 f  (cost=0.00..29.00 rows=1000 width=116) (actual time=0.004..0.131 rows=1000 loops=1)
Planning Time: 0.285 ms
Execution Time: 0.729 ms


Explain with ISO 8601 string:
Hash Left Join  (cost=41.50..64.43 rows=730 width=124) (actual time=0.264..0.512 rows=730 loops=1)
  Hash Cond: (d.f_id = f.id)
  ->  Seq Scan on data d  (cost=0.00..21.00 rows=730 width=12) (actual time=0.007..0.123 rows=730 loops=1)
        Filter: ((updated_at >= '2018-01-01 00:00:00+02'::timestamp with time zone) AND (updated_at <= '2020-01-01 00:00:00+02'::timestamp with time zone))
        Rows Removed by Filter: 270
  ->  Hash  (cost=29.00..29.00 rows=1000 width=116) (actual time=0.253..0.253 rows=1000 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 153kB
        ->  Seq Scan on foreign1 f  (cost=0.00..29.00 rows=1000 width=116) (actual time=0.003..0.123 rows=1000 loops=1)
Planning Time: 0.115 ms
Execution Time: 0.548 ms
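The two plans differ in how the bound values are typed: with datetime objects the filter compares updated_at against timestamp without time zone, while with ISO 8601 strings the values end up as timestamp with time zone, matching the column type. psycopg2 adapts a naive datetime.datetime as a ::timestamp literal and a timezone-aware one as ::timestamptz, so one variant that may be worth testing (an untested assumption, not verified against the original 20 s query) is to attach a tzinfo before binding. The sketch uses UTC purely as an example; the ISO strings above were interpreted in the session's TimeZone (shown as +02 in the plan), so pick whichever zone the naive values are actually meant to represent. The helper name as_aware is hypothetical.

```python
import datetime


def as_aware(dt, tz=datetime.timezone.utc):
    """Attach tz to a naive datetime; return already-aware datetimes unchanged.

    Hypothetical helper. replace() only labels the value with tz;
    it does not shift the wall-clock time.
    """
    if dt.tzinfo is None:
        return dt.replace(tzinfo=tz)
    return dt


d_updates_from = as_aware(datetime.datetime(2018, 1, 1, 0, 0))
d_updates_to = as_aware(datetime.datetime(2020, 1, 1, 0, 0))
print(d_updates_from.isoformat())  # 2018-01-01T00:00:00+00:00
```

These aware values could then be passed to db_connection.execute() in the same way as the naive datetime objects in the script above.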

0 answers