我使用PostgreSQL。我做一个表聚合计算。 我使用核心四块。我的数据格式是这样的。
----------------------------------------------
line_no |holiday_flag| passenger_up|plan_date
----------------------------------------------
1 0 22 2019-02-02
1 0 15 2019-02-02
1 0 14 2019-02-02
1 1 0 2019-02-03
1 1 11 2019-02-03
1 1 3 2019-02-03
1 1 15 2019-02-03
1 3 4 2019-02-17
1 3 1 2019-02-17
1 3 3 2019-02-17
1 3 4 2019-02-17
1 3 2 2019-02-17
1 3 0 2019-02-17
1 3 1 2019-02-17
1 2 0 2019-02-02
1 2 0 2019-02-02
1 2 0 2019-02-02
1 2 3 2019-02-02
我想用holiday_flag来对passenger_up求和。holiday_flag的类型为0、1、2、3。holiday_flag(0,2)聚集在一组中。最终的holiday_flag类型为0、1、3。 我的最终结果是这样的。
-----------------------------------
line_no |holiday_flag| passenger_up
-----------------------------------
1 0 4858
1 1 2176
1 3 4638
-----------------------------------
我的sql现在是这样的。
SELECT
line_no,
holiday_flag,
round(passenger_up / day_count) as passenger_up
FROM
(
SELECT
line_no,
holiday_flag,
SUM (passenger_up) AS passenger_up,
SUM (day_count) AS day_count
FROM
(
SELECT
line_no,
CASE
WHEN holiday_flag = 0
OR holiday_flag = 2 THEN
0
ELSE
holiday_flag
END AS holiday_flag,
SUM (passenger_up) AS passenger_up,
COUNT (DISTINCT plan_date) AS day_count
FROM
ext_passenger_flow_tag
WHERE
line_no = 1
AND plan_date >= '2019-01-31'
AND plan_date <= '2019-02-20'
GROUP BY
line_no,
holiday_flag
) AS B
GROUP BY
B.line_no,
B.holiday_flag
) AS C
我的EXPLAIN输出是这样的。
Subquery Scan on c (cost=4456.55..4456.72 rows=4 width=40) (actual time=19.316..19.322 rows=3 loops=1)
Output: c.line_no, c.holiday_flag, round((c.passenger_up / c.day_count), 0)
Buffers: shared hit=372
-> GroupAggregate (cost=4456.55..4456.66 rows=4 width=72) (actual time=19.312..19.316 rows=3 loops=1)
Output: b.line_no, b.holiday_flag, sum(b.passenger_up), sum(b.day_count)
Group Key: b.line_no, b.holiday_flag
Buffers: shared hit=372
-> Sort (cost=4456.55..4456.56 rows=4 width=24) (actual time=19.305..19.305 rows=4 loops=1)
Output: b.line_no, b.holiday_flag, b.passenger_up, b.day_count
Sort Key: b.line_no, b.holiday_flag
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=372
-> Subquery Scan on b (cost=4359.02..4456.51 rows=4 width=24) (actual time=17.501..19.279 rows=4 loops=1)
Output: b.line_no, b.holiday_flag, b.passenger_up, b.day_count
Buffers: shared hit=372
-> GroupAggregate (cost=4359.02..4456.47 rows=4 width=26) (actual time=17.500..19.276 rows=4 loops=1)
Output: ext_passenger_flow_tag.line_no, CASE WHEN ((ext_passenger_flow_tag.holiday_flag = 0) OR (ext_passenger_flow_tag.holiday_flag = 2)) THEN 0 ELSE (ext_passenger_flow_tag.holiday_flag)::integer END, sum(ext_passenger_flow_tag.passenger_up), count(DISTINCT ext_passenger_flow_tag.plan_date), ext_passenger_flow_tag.holiday_flag
Group Key: ext_passenger_flow_tag.line_no, ext_passenger_flow_tag.holiday_flag
Buffers: shared hit=372
-> Sort (cost=4359.02..4378.50 rows=7790 width=14) (actual time=15.231..16.075 rows=7899 loops=1)
Output: ext_passenger_flow_tag.line_no, ext_passenger_flow_tag.holiday_flag, ext_passenger_flow_tag.passenger_up, ext_passenger_flow_tag.plan_date
Sort Key: ext_passenger_flow_tag.holiday_flag
Sort Method: quicksort Memory: 563kB
Buffers: shared hit=372
-> Bitmap Heap Scan on public.ext_passenger_flow_tag (cost=2174.75..3855.50 rows=7790 width=14) (actual time=9.908..12.697 rows=7899 loops=1)
Output: ext_passenger_flow_tag.line_no, ext_passenger_flow_tag.holiday_flag, ext_passenger_flow_tag.passenger_up, ext_passenger_flow_tag.plan_date
Recheck Cond: ((ext_passenger_flow_tag.line_no = 1) AND (ext_passenger_flow_tag.line_subno = 1) AND (ext_passenger_flow_tag.updown = 1) AND (ext_passenger_flow_tag.plan_date >= '2019-01-31'::date) AND (ext_passenger_flow_tag.plan_date <= '2019-02-20'::date) AND (ext_passenger_flow_tag.interval_flag = 2))
Heap Blocks: exact=86
Buffers: shared hit=372
-> Bitmap Index Scan on ext_pf_tag_index1 (cost=0.00..2172.80 rows=7790 width=0) (actual time=9.876..9.876 rows=7899 loops=1)
Index Cond: ((ext_passenger_flow_tag.line_no = 1) AND (ext_passenger_flow_tag.line_subno = 1) AND (ext_passenger_flow_tag.updown = 1) AND (ext_passenger_flow_tag.plan_date >= '2019-01-31'::date) AND (ext_passenger_flow_tag.plan_date <= '2019-02-20'::date) AND (ext_passenger_flow_tag.interval_flag = 2))
Buffers: shared hit=286
Planning time: 0.533 ms
Execution time: 19.450 ms
我不知道如何优化sql。 我的目标是查询sql运行不到100毫秒。
答案 0 :(得分:0)
查询所用时间不到100毫秒。
首先,简化您的SQL语句:
SELECT line_no,
CASE WHEN holiday_flag = 2
THEN 0
ELSE holiday_flag
END AS holiday_flag,
round(CAST(sum(passenger_up)
AS double precision) /
CAST(count(DISTINCT plan_date)
AS double precision)) AS passenger_up
FROM ext_passenger_flow_tag
WHERE line_no = 1
AND plan_date BETWEEN DATE '2019-01-31' AND DATE '2019-02-20'
GROUP BY line_no, holiday_flag;
我注意到执行计划是针对另一个查询的,您能解决这个问题吗?
在不知道实际执行计划的情况下很难提供进一步的建议。
我还想知道索引的定义,看看它是否是最佳的。