I have a query:
EXPLAIN ANALYZE
SELECT CAST(DATE(associationtime) AS text) AS date,
       CAST(SUM(extract(epoch FROM disassociationtime) - extract(epoch FROM associationtime)) AS bigint) AS sessionduration,
       CAST(SUM(tx) AS bigint) AS tx,
       CAST(SUM(rx) AS bigint) AS rx,
       CAST(SUM(dataRetries) AS bigint) AS DATA,
       CAST(SUM(rtsRetries) AS bigint) AS rts,
       count(*)
FROM SESSION
WHERE ssid_id = 42
  AND ap_id = 1731
  AND DATE(associationtime) >= DATE('Tue Nov 04 00:00:00 MSK 2014')
  AND DATE(associationtime) <= DATE('Thu Nov 20 00:00:00 MSK 2014')
GROUP BY (DATE(associationtime))
ORDER BY DATE(associationtime);
The output is:
GroupAggregate (cost=0.44..17710.66 rows=1 width=32) (actual time=4.501..78.880 rows=17 loops=1)
-> Index Scan using session_lim_values_idx on session (cost=0.44..17538.94 rows=6868 width=32) (actual time=0.074..73.266 rows=7869 loops=1)
Index Cond: ((date(associationtime) >= '2014-11-04'::date) AND (date(associationtime) <= '2014-11-20'::date))
Filter: ((ssid_id = 42) AND (ap_id = 1731))
Rows Removed by Filter: 297425
Total runtime: 78.932 ms
Look at this line:
Index Scan using session_lim_values_idx
As you can see, the query filters on three fields: ssid_id, ap_id and associationtime. I have an index for this:
ssid_pkey | btree | {id}
ap_pkey | btree | {id}
testingshit_pkey | btree | {one,two,three}
session_date_ssid_idx | btree | {ssid_id,date(associationtime),"date_trunc('hour'::text, associationtime)"}
session_pkey | btree | {associationtime,disassociationtime,sessionduration,clientip,clientmac,devicename,tx,rx,protocol,snr,rssi,dataretries,rtsretries }
session_main_idx | btree | {ssid_id,ap_id,associationtime,disassociationtime,sessionduration,clientip,clientmac,devicename,tx,rx,protocol,snr,rssi,dataretres,rtsretries}
session_date_idx | btree | {date(associationtime),"date_trunc('hour'::text, associationtime)"}
session_date_apid_idx | btree | {ap_id,date(associationtime),"date_trunc('hour'::text, associationtime)"}
session_date_ssid_apid_idx | btree | {ssid_id,ap_id,date(associationtime),"date_trunc('hour'::text, associationtime)"}
ap_apname_idx | btree | {apname}
users_pkey | btree | {username}
user_roles_pkey | btree | {user_role_id}
session_lim_values_idx | btree | {date(associationtime)}
It is called session_date_ssid_apid_idx. But why does the query use the wrong index?
session_date_ssid_apid_idx:
------------+-----------------------------+-------------------------------------------
ssid_id | integer | ssid_id
ap_id | integer | ap_id
date | date | date(associationtime)
date_trunc | timestamp without time zone | date_trunc('hour'::text, associationtime)
session_lim_values_idx:
date | date | date(associationtime)
Which index would you create?
UPD: \d session
--------------------+-----------------------------+------------------------------------------------------
id | integer | NOT NULL DEFAULT nextval('session_id_seq'::regclass)
ssid_id | integer | NOT NULL
ap_id | integer | NOT NULL
associationtime | timestamp without time zone | NOT NULL
disassociationtime | timestamp without time zone | NOT NULL
sessionduration | character varying(100) | NOT NULL
clientip | character varying(100) | NOT NULL
clientmac | character varying(100) | NOT NULL
devicename | character varying(100) | NOT NULL
tx | integer | NOT NULL
rx | integer | NOT NULL
protocol | character varying(100) | NOT NULL
snr | integer | NOT NULL
rssi | integer | NOT NULL
dataretries | integer | NOT NULL
rtsretries | integer | NOT NULL
Indexes:
"session_pkey" PRIMARY KEY, btree (associationtime, disassociationtime, sessionduration, clientip, clientmac, devicename, tx, rx, protocol, snr, rssi, dataretries, rtsretries)
"session_date_ap_ssid_idx" btree (ssid_id, ap_id, associationtime)
"session_date_apid_idx" btree (ap_id, date(associationtime), date_trunc('hour'::text, associationtime))
"session_date_idx" btree (date(associationtime), date_trunc('hour'::text, associationtime))
"session_date_ssid_apid_idx" btree (ssid_id, ap_id, associationtime)
"session_date_ssid_idx" btree (ssid_id, date(associationtime), date_trunc('hour'::text, associationtime))
"session_lim_values_idx" btree (date(associationtime))
"session_main_idx" btree (ssid_id, ap_id, associationtime, disassociationtime, sessionduration, clientip, clientmac, devicename, tx, rx, protocol, snr, rssi, dataretries, rtsretries)
Answer 0 (score: 6)
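Very common values in the predicates on ssid_id and ap_id can make it cheaper for Postgres to pick the smaller index session_lim_values_idx (only one date column) over the seemingly better fitting, but bigger index session_date_ssid_apid_idx (4 columns), and filter the rest.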
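To see how common the two values really are, one can compare the planner's statistics with the actual share of matching rows. This is a minimal sketch, assuming the table is named session as in the question and has been analyzed at least once; the column alias pct_matching is made up for illustration.

```sql
-- What the planner believes about the two filter columns.
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM   pg_stats
WHERE  tablename = 'session'
AND    attname IN ('ssid_id', 'ap_id');

-- Actual percentage of rows matching both predicates
-- (full table scan, so this can be slow on a big table).
SELECT round(100.0 * avg((ssid_id = 42 AND ap_id = 1731)::int), 2) AS pct_matching
FROM   session;
```

If the frequencies in pg_stats are far from the measured percentage, the row estimates for the filter are likely to be off as well.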
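In your case only a few percent of the rows have ssid_id=42 AND ap_id=1731 (in the plan above, 7869 rows pass the filter and 297425 are removed, roughly 2.6% of the scanned date range). That should not normally warrant the switch to the smaller index. But other factors are in play that may tilt the scales, mainly cost settings and statistics. Details: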
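Tune your cost settings as advised in the answer linked above, if you have not done so yet.
Increase the statistics target for the involved columns ssid_id and ap_id, and run ANALYZE.
There is a special factor at play here: Postgres gathers separate statistics for expressions in an index. Check with:
SELECT * FROM pg_statistic
WHERE starelid = 'session_date_ssid_apid_idx'::regclass;
You will find a dedicated row for the expression date(associationtime).
Make the index session_date_ssid_apid_idx more attractive (smaller) by removing the 4th column "date_trunc('hour'::text, associationtime)". Looking at the table definition you added later, you have already done that.
I would rather use the standard syntax for the cast, cast(associationtime AS date), instead of the function syntax date(associationtime). Not that it matters much, it is just the standard form that I know to work properly. You can use the shorthand associationtime::date in your query, which is compatible with the expression index, but use the verbose form in the index definition.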
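The steps above can be sketched as follows. This is a rough illustration rather than a tuned configuration: the cost value and the statistics target are assumed placeholder numbers, and the index name session_ssid_ap_day_idx is hypothetical.

```sql
-- Cost settings: session-level experiment only; 2.0 is an assumed value,
-- not a recommendation for any particular hardware.
SET random_page_cost = 2.0;

-- Raise the per-column statistics target (1000 is an assumed value),
-- then refresh the planner statistics.
ALTER TABLE session ALTER COLUMN ssid_id SET STATISTICS 1000;
ALTER TABLE session ALTER COLUMN ap_id   SET STATISTICS 1000;
ANALYZE session;

-- A three-column expression index using the standard cast syntax.
-- The index name is hypothetical.
CREATE INDEX session_ssid_ap_day_idx
    ON session (ssid_id, ap_id, (cast(associationtime AS date)));
```

A query would then filter on the same expression, for instance associationtime::date >= '2014-11-04', so that the planner can match it against the index.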
Also, test which query plan is actually faster with EXPLAIN ANALYZE by dropping and recreating the index you want to test. Then you will see whether Postgres picked the best plan.
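One way to run such a test without permanently losing the index is to drop it inside a transaction and roll back afterwards. This is a sketch of that variant, not necessarily the procedure meant above; the query is a shortened form of the one in the question, and the column alias day is made up.

```sql
BEGIN;

-- Hide the one-column index from the planner for this transaction only.
-- DROP INDEX locks the table against other access until ROLLBACK,
-- so this is better tried on a test system.
DROP INDEX session_lim_values_idx;

EXPLAIN ANALYZE
SELECT date(associationtime) AS day, count(*)
FROM   session
WHERE  ssid_id = 42
AND    ap_id = 1731
AND    date(associationtime) >= DATE '2014-11-04'
AND    date(associationtime) <= DATE '2014-11-20'
GROUP  BY date(associationtime)
ORDER  BY date(associationtime);

ROLLBACK;  -- the index is back, nothing was changed permanently
```

Comparing the runtime reported here with the original plan shows whether the planner's choice was actually the faster one.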
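You have a lot of indexes. I would check whether all of them are actually used and get rid of the rest. Indexes carry a maintenance cost, and it is typically beneficial to concentrate on fewer indexes if possible (fewer indexes fit in cache more easily and are more likely to be cached when needed). Weigh costs against benefits.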
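A quick way to check index usage is the statistics view pg_stat_user_indexes. The sketch below assumes the statistics collector is enabled and that the counters have been accumulating over a representative period; the column alias index_size is made up.

```sql
-- Indexes on the session table, least used first.
-- idx_scan counts index scans since the statistics were last reset.
SELECT indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM   pg_stat_user_indexes
WHERE  relname = 'session'
ORDER  BY idx_scan, indexrelname;
```

Indexes that stay at idx_scan = 0 are candidates for removal, except those that back a primary key or unique constraint.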
I would use: