I have a table like this:
                        Table "public.transactions"
       Column        |           Type           | Nullable | Default | Storage
---------------------+--------------------------+----------+---------+----------
 id                  | integer                  | not null | nextval | plain
 ticket              | integer                  |          |         | plain
 pay_station         | character varying(50)    |          |         | extended
 stall               | character varying(50)    |          |         | extended
 license_plate       | character varying(8)     |          |         | extended
 purchased_date      | timestamp with time zone | not null |         | plain
 expiry_date         | timestamp with time zone |          |         | plain
 payment_type        | character varying(50)    |          |         | extended
 total_collections   | numeric(10,2)            |          |         | main
 revenue             | numeric(10,2)            |          |         | main
 rate_name           | character varying(50)    |          |         | extended
 hours_paid          | numeric(4,2)             |          |         | main
 validation_revenue  | numeric(10,2)            |          |         | main
 transaction_fee     | numeric(10,2)            |          |         | main
 method              | character varying(50)    |          |         | extended
Indexes:
    "transactions_pkey" PRIMARY KEY, btree (id)
    "transactions_expiry_date_idx" btree (expiry_date)
    "transactions_purchased_date_idx" btree (purchased_date)
    "transactions_stall_idx" btree (stall)
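For brevity, I have omitted another 20 or so columns.
The table has about 2.5 million rows.
My Python code, which serves an API through Flask, looks like this for a sample query: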
filters = [
    datetime_range['start'] < Transactions.expiry_date,
    datetime_range['end'] > Transactions.purchased_date
]
if 'parking_spaces' in params:
    spaces = params['parking_spaces']  # array
    filters.append(Transactions.stall.in_(spaces))
results = Transactions.query.with_entities(
    Transactions.stall, Transactions.purchased_date, Transactions.expiry_date
).filter(*filters).order_by(Transactions.purchased_date).all()
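The dates are datetime objects. Now, if I supply no input in the POST body, I default to the minimum/maximum times with no WHERE ... IN filter on parking spaces, and the query looks like this: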
datetime_range:
{'start': datetime.datetime(2016, 1, 1, 0, 0, tzinfo=datetime.timezone.utc), 'end': datetime.datetime(2019, 1, 1, 0, 0, tzinfo=datetime.timezone.utc)}
Query:
SELECT transactions.stall AS transactions_stall, transactions.purchased_date AS transactions_purchased_date, transactions.expiry_date AS transactions_expiry_date
FROM transactions
WHERE transactions.expiry_date > %(expiry_date_1)s AND transactions.purchased_date < %(purchased_date_1)s ORDER BY transactions.purchased_date
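For reference, the two date predicates in the generated WHERE clause amount to an interval-overlap test: a row is kept when its paid period intersects the requested range. Below is a minimal pure-Python sketch of that logic. The function name `overlaps` and the sample timestamps are hypothetical, chosen only for illustration and not taken from the application code.

```python
from datetime import datetime, timezone


def overlaps(range_start, range_end, purchased, expiry):
    """Mirror the SQL predicates: expiry_date > start AND purchased_date < end."""
    if expiry is None:
        # expiry_date is nullable; in SQL, a comparison with NULL is not true,
        # so such rows never pass the filter.
        return False
    return range_start < expiry and range_end > purchased


# Hypothetical sample values, not rows from the real table.
utc = timezone.utc
start = datetime(2017, 1, 1, 14, 30, tzinfo=utc)
end = datetime(2017, 1, 4, 18, 0, tzinfo=utc)

# Paid period begins before the range and ends inside it: kept.
inside = overlaps(start, end,
                  datetime(2017, 1, 1, 12, 0, tzinfo=utc),
                  datetime(2017, 1, 1, 16, 0, tzinfo=utc))

# Paid period expired before the range opened: filtered out.
before = overlaps(start, end,
                  datetime(2016, 12, 31, 9, 0, tzinfo=utc),
                  datetime(2016, 12, 31, 11, 0, tzinfo=utc))

# Row without an expiry date: filtered out, as in SQL.
no_expiry = overlaps(start, end,
                     datetime(2017, 1, 2, 9, 0, tzinfo=utc),
                     None)
```

With the default 2016 to 2019 range, nearly every row with a non-null expiry_date is likely to satisfy both predicates, so the query may return most of the table.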
Now, if I run the query directly in psql:
SELECT transactions.stall, transactions.purchased_date, transactions.expiry_date FROM transactions WHERE transactions.expiry_date > '2016-01-01 00:00:00.000Z' AND transactions.purchased_date < '2019-01-01 00:00:00.000Z';
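Time: 2724.326 ms (00:02.724)
However, running the same query through SQLAlchemy in Flask and testing with Postman, I get a response time of 866946 ms (14:26.946), returning 13.4 KB of data.
That is obviously a huge difference. As I adjust the range, the response time grows sharply with it. Some samples: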
{
"datetime_range": {
"start": "2017-01-01T14:30:00.000Z",
"end": "2017-01-04T18:00:00.000Z"
}
}
Response time: 13387 ms (00:13.39)
The same query in psql:
SELECT transactions.stall, transactions.purchased_date, transactions.expiry_date FROM transactions WHERE transactions.expiry_date > '2017-01-01T14:30:00.000Z' AND transactions.purchased_date < '2017-01-04T18:00:00.000Z';
Time: 580.603 ms
{
"datetime_range": {
"start": "2017-01-01T14:30:00.000Z",
"end": "2017-03-01T18:00:00.000Z"
}
}
SQLAlchemy response time: 41878 ms
SELECT transactions.stall, transactions.purchased_date, transactions.expiry_date FROM transactions WHERE transactions.expiry_date > '2017-01-01T14:30:00.000Z' AND transactions.purchased_date < '2017-03-01T18:00:00.000Z';
PostgreSQL response time: 1170.169 ms (00:01.170)
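Why is there such a huge difference here, and how can I make SQLAlchemy run faster so that my Flask response times stay on the order of seconds rather than minutes?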
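One way to narrow this down is to time each phase of the request separately. Two details in the numbers above suggest the comparison may not be like for like: the statements run in psql have no ORDER BY, while the SQLAlchemy query does, and the response is only 13.4 KB, so the fetched rows are presumably reduced in Python before being returned. The sketch below is a minimal illustration under those assumptions. `timed`, `fetch_rows`, and `build_response` are hypothetical names, and the two functions are stand-ins for the real `.all()` call and for whatever code builds the response body.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Record the wall-clock seconds spent inside the block under `label`."""
    started = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - started


def fetch_rows():
    # Hypothetical stand-in for the SQLAlchemy call, e.g. query.all().
    return [(str(i % 50), i, i + 1) for i in range(10_000)]


def build_response(rows):
    # Hypothetical stand-in for the code that turns rows into the JSON body.
    counts = {}
    for stall, _purchased, _expiry in rows:
        counts[stall] = counts.get(stall, 0) + 1
    return counts


timings = {}
with timed("fetch", timings):
    rows = fetch_rows()
with timed("build_response", timings):
    body = build_response(rows)

for label, seconds in timings.items():
    print(f"{label}: {seconds * 1000:.1f} ms")
```

If the fetch phase is close to the psql timing and the rest of the time falls in the second phase, the slowdown would be in the Python-side processing rather than in PostgreSQL or SQLAlchemy's query execution.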