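Can I optimize this query, or should I change the table structure to reduce the execution time? I really don't understand the EXPLAIN output. Am I missing some indexes?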
EXPLAIN SELECT COUNT(*) AS count,
q.query_str
FROM click_fact cf,
query q,
date_dim dd,
queries_p_day_mv qpd
WHERE dd.date_dim_id = qpd.date_dim_id
AND qpd.query_id = q.query_id
AND type = 'S'
AND cf.query_id = q.query_id
AND dd.pg_date BETWEEN '2010-12-29' AND '2011-01-28'
AND qpd.interface_id IN (SELECT DISTINCT interface_id from interface WHERE lang = 'sv')
GROUP BY q.query_str
ORDER BY count DESC;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=19170.15..19188.80 rows=7460 width=12)
Sort Key: (count(*))
-> HashAggregate (cost=18597.03..18690.28 rows=7460 width=12)
-> Nested Loop (cost=10.20..18559.73 rows=7460 width=12)
-> Nested Loop (cost=10.20..14975.36 rows=2452 width=20)
Join Filter: (qpd.interface_id = interface.interface_id)
-> Unique (cost=1.03..1.04 rows=1 width=4)
-> Sort (cost=1.03..1.04 rows=1 width=4)
Sort Key: interface.interface_id
-> Seq Scan on interface (cost=0.00..1.02 rows=1 width=4)
Filter: (lang = 'sv'::text)
-> Nested Loop (cost=9.16..14943.65 rows=2452 width=24)
-> Hash Join (cost=9.16..14133.58 rows=2452 width=8)
Hash Cond: (qpd.date_dim_id = dd.date_dim_id)
-> Seq Scan on queries_p_day_mv qpd (cost=0.00..11471.93 rows=700793 width=12)
-> Hash (cost=8.81..8.81 rows=28 width=4)
-> Index Scan using date_dim_pg_date_index on date_dim dd (cost=0.00..8.81 rows=28 width=4)
Index Cond: ((pg_date >= '2010-12-29'::date) AND (pg_date <= '2011-01-28'::date))
-> Index Scan using query_pkey on query q (cost=0.00..0.32 rows=1 width=16)
Index Cond: (q.query_id = qpd.query_id)
-> Index Scan using click_fact_query_id_index on click_fact cf (cost=0.00..1.01 rows=36 width=4)
Index Cond: (cf.query_id = qpd.query_id)
Filter: (cf.type = 'S'::bpchar)
Updated with EXPLAIN ANALYZE:
EXPLAIN ANALYZE SELECT COUNT(*) AS count,
q.query_str
FROM click_fact cf,
query q,
date_dim dd,
queries_p_day_mv qpd
WHERE dd.date_dim_id = qpd.date_dim_id
AND qpd.query_id = q.query_id
AND type = 'S'
AND cf.query_id = q.query_id
AND dd.pg_date BETWEEN '2010-12-29' AND '2011-01-28'
AND qpd.interface_id IN (SELECT DISTINCT interface_id from interface WHERE lang = 'sv')
GROUP BY q.query_str
ORDER BY count DESC;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=19201.06..19220.52 rows=7784 width=12) (actual time=51017.162..51046.102 rows=17586 loops=1)
Sort Key: (count(*))
Sort Method: external merge Disk: 632kB
-> HashAggregate (cost=18600.67..18697.97 rows=7784 width=12) (actual time=50935.411..50968.678 rows=17586 loops=1)
-> Nested Loop (cost=10.20..18561.75 rows=7784 width=12) (actual time=42.079..43666.404 rows=3868592 loops=1)
-> Nested Loop (cost=10.20..14975.91 rows=2453 width=20) (actual time=23.678..14609.282 rows=700803 loops=1)
Join Filter: (qpd.interface_id = interface.interface_id)
-> Unique (cost=1.03..1.04 rows=1 width=4) (actual time=0.104..0.110 rows=1 loops=1)
-> Sort (cost=1.03..1.04 rows=1 width=4) (actual time=0.100..0.102 rows=1 loops=1)
Sort Key: interface.interface_id
Sort Method: quicksort Memory: 25kB
-> Seq Scan on interface (cost=0.00..1.02 rows=1 width=4) (actual time=0.038..0.041 rows=1 loops=1)
Filter: (lang = 'sv'::text)
-> Nested Loop (cost=9.16..14944.20 rows=2453 width=24) (actual time=23.550..12553.786 rows=700808 loops=1)
-> Hash Join (cost=9.16..14133.80 rows=2453 width=8) (actual time=18.283..3885.700 rows=700808 loops=1)
Hash Cond: (qpd.date_dim_id = dd.date_dim_id)
-> Seq Scan on queries_p_day_mv qpd (cost=0.00..11472.08 rows=700808 width=12) (actual time=0.014..1587.106 rows=700808 loops=1)
-> Hash (cost=8.81..8.81 rows=28 width=4) (actual time=18.221..18.221 rows=31 loops=1)
-> Index Scan using date_dim_pg_date_index on date_dim dd (cost=0.00..8.81 rows=28 width=4) (actual time=14.388..18.152 rows=31 loops=1)
Index Cond: ((pg_date >= '2010-12-29'::date) AND (pg_date <= '2011-01-28'::date))
-> Index Scan using query_pkey on query q (cost=0.00..0.32 rows=1 width=16) (actual time=0.005..0.006 rows=1 loops=700808)
Index Cond: (q.query_id = qpd.query_id)
-> Index Scan using click_fact_query_id_index on click_fact cf (cost=0.00..1.01 rows=36 width=4) (actual time=0.005..0.022 rows=6 loops=700803)
Index Cond: (cf.query_id = qpd.query_id)
Filter: (cf.type = 'S'::bpchar)
Answer 0 (score: 1)
You could try eliminating the subquery:
SELECT COUNT(*) AS count,
q.query_str
FROM click_fact cf,
query q,
date_dim dd,
queries_p_day_mv qpd,
interface
WHERE dd.date_dim_id = qpd.date_dim_id
AND qpd.query_id = q.query_id
AND type = 'S'
AND cf.query_id = q.query_id
AND dd.pg_date BETWEEN '2010-12-29' AND '2011-01-28'
AND qpd.interface_id = interface.interface_id
AND interface.lang = 'sv'
GROUP BY q.query_str
ORDER BY count DESC;
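Also, if the interface table is large, creating an index on lang may help. An index on date_dim_id in queries_p_day_mv may help as well.
In general, the first thing to try is to look for Seq Scans in the plan and see whether creating an index turns them into index scans.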
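As a rough sketch of the two indexes suggested above (the index names are made up for illustration, and whether the planner actually uses them depends on table sizes and statistics):

```sql
-- Hypothetical index names; table and column names are taken from the query in the question.
CREATE INDEX interface_lang_idx ON interface (lang);
CREATE INDEX queries_p_day_mv_date_dim_id_idx ON queries_p_day_mv (date_dim_id);

-- Refresh planner statistics so the next plan is costed with current row counts.
ANALYZE interface;
ANALYZE queries_p_day_mv;
```

After creating them, re-run EXPLAIN ANALYZE and check whether the Seq Scan on queries_p_day_mv is still there. For a very small table such as interface, the planner may reasonably keep the sequential scan.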
HTH
Answer 1 (score: 1)
SELECT COUNT(*) AS count,
q.query_str
FROM date_dim dd
JOIN queries_p_day_mv qpd
ON qpd.date_dim_id = dd.date_dim_id
AND qpd.interface_id IN
(
SELECT interface_id
FROM interface
WHERE lang = 'sv'
)
JOIN query q
ON q.query_id = qpd.query_id
JOIN click_fact cf
ON cf.query_id = q.query_id
AND cf.type = 'S'
WHERE dd.pg_date BETWEEN '2010-12-29' AND '2011-01-28'
GROUP BY
q.query_str
ORDER BY
count DESC
Create the following indexes (in addition to the existing ones):
queries_p_day_mv (interface_id, date_dim_id)
interface (lang)
click_fact (query_id, type)
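Spelled out as statements, the list above might look like the following. The index names are placeholders, and the materialized view is assumed to be queries_p_day_mv, as in the question:

```sql
-- Placeholder index names; adjust to your own naming convention.
CREATE INDEX idx_qpd_interface_date ON queries_p_day_mv (interface_id, date_dim_id);
CREATE INDEX idx_interface_lang ON interface (lang);
CREATE INDEX idx_click_fact_query_type ON click_fact (query_id, type);
```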
Could you post the table definitions?