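When running the same SELECT query multiple times (sequentially), about 10% of the executions take several minutes (between 3 and 20 minutes).
All the others complete in about 100 ms.
While a long-running query executes, the PostgreSQL process uses 100% CPU.
We are running PostgreSQL v9.4 on Debian GNU/Linux 8.7 (jessie).
VACUUM ANALYZE runs on the database every day.
With logging enabled on the server side, we see the following entries:
Successful case: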
2018-01-02 21:41:56 CET [15948]: [13776-1] STATEMENT: BEGIN
2018-01-02 21:41:56 CET [15948]: [13777-1] LOG: rewritten parse tree:
2018-01-02 21:41:56 CET [15948]: [13778-1] DETAIL: (
... )
2018-01-02 21:41:56 CET [15948]: [13779-1] STATEMENT: BEGIN
2018-01-02 21:41:56 CET [15948]: [13780-1] LOG: duration: 0.114 ms parse <unnamed>: BEGIN
2018-01-02 21:41:56 CET [15948]: [13781-1] LOG: duration: 0.010 ms bind <unnamed>: BEGIN
2018-01-02 21:41:56 CET [15948]: [13782-1] LOG: execute <unnamed>: BEGIN
2018-01-02 21:41:56 CET [15948]: [13783-1] LOG: duration: 0.015 ms
2018-01-02 21:41:56 CET [15948]: [13797-1] LOG: plan:
2018-01-02 21:41:56 CET [15948]: [13798-1] DETAIL: {PLANNEDSTMT
...
2018-01-02 21:41:56 CET [15948]: [13799-1] STATEMENT: select
... (the SELECT query)
2018-01-02 21:41:56 CET [15948]: [13800-1] LOG: duration: 10.303 ms bind S_21/C_24: select
... (the SELECT query again)
2018-01-02 21:41:56 CET [15948]: [13801-1] DETAIL: parameters: $1 = '149', $2 = '6'...(the query parameters)
2018-01-02 21:41:56 CET [15948]: [13802-1] LOG: execute S_21/C_24: select
... (the SELECT query again)
2018-01-02 21:41:56 CET [15948]: [13803-1] DETAIL: parameters: $1 = '149', $2 = '6'...(the query parameters again)
2018-01-02 21:41:56 CET [15948]: [13804-1] LOG: duration: 15.662 ms
Failing case (long-running query):
2018-01-02 21:36:55 CET [15741]: [13060-1] STATEMENT: BEGIN
2018-01-02 21:36:55 CET [15741]: [13061-1] LOG: rewritten parse tree:
2018-01-02 21:36:55 CET [15741]: [13062-1] DETAIL: (
... )
2018-01-02 21:36:55 CET [15741]: [13063-1] STATEMENT: BEGIN
2018-01-02 21:36:55 CET [15741]: [13064-1] LOG: duration: 0.107 ms parse <unnamed>: BEGIN
2018-01-02 21:36:55 CET [15741]: [13065-1] LOG: duration: 0.009 ms bind <unnamed>: BEGIN
2018-01-02 21:36:55 CET [15741]: [13066-1] LOG: execute <unnamed>: BEGIN
2018-01-02 21:36:55 CET [15741]: [13067-1] LOG: duration: 0.016 ms
2018-01-02 21:36:55 CET [15741]: [13081-1] LOG: plan:
2018-01-02 21:36:55 CET [15741]: [13082-1] DETAIL: {PLANNEDSTMT
...
2018-01-02 21:36:55 CET [15741]: [13083-1] STATEMENT: select
... (the SELECT query)
2018-01-02 21:36:55 CET [15741]: [13084-1] LOG: duration: 9.886 ms bind S_20/C_27: select
... (the SELECT query again)
2018-01-02 21:36:55 CET [15741]: [13085-1] DETAIL: parameters: $1 = '149', $2 = '6'...(the query parameters)
2018-01-02 21:36:55 CET [15741]: [13086-1] LOG: execute S_20/C_27: select
... (the SELECT query again)
2018-01-02 21:36:55 CET [15741]: [13087-1] DETAIL: parameters: $1 = '149', $2 = '6'...(the query parameters again)
**/!\--- LOOPING ABOUT 150 times ---/!\**
2018-01-02 21:42:23 CET [15741]: [13088-1] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp15741.XX", size 9498636 (this value changes at every loop)
2018-01-02 21:42:23 CET [15741]: [13089-1] STATEMENT: select
... (the SELECT query again)
**/!\--- END LOOP ---/!\**
2018-01-02 21:46:19 CET [15741]: [13340-1] LOG: duration: 563668.493 ms
2018-01-02 21:46:19 CET [15741]: [13341-1] LOG: duration: 0.026 ms bind S_2: COMMIT
2018-01-02 21:46:19 CET [15741]: [13342-1] LOG: execute S_2: COMMIT
2018-01-02 21:46:19 CET [15741]: [13343-1] LOG: duration: 0.155 ms
2018-01-02 21:46:19 CET [15741]: [13344-1] LOG: disconnection: session time: 0:10:23.538 user=xxx database=yyy host=127.0.0.1 port=60880
Looking at the execution plans, we see:
- All OK queries have the same plan
- All NOK (long-running) queries have the same plan
- The OK and NOK plans differ slightly:
'total_cost' values (near but different values)
...
{TARGETENTRY
:expr
{VAR
:varno 65000 ---------> differs
:varattno 3 ---------> differs
:vartype 20
:vartypmod -1
:varcollid 0
:varlevelsup 0
:varnoold 10 ---------> differs
:varoattno 1
:location 2459 ---------> differs
}
...
:args (...
{PARAM | RELABELTYPE | CONST ---------> differs, always CONST for OK queries, always PARAM | RELABELTYPE for NOK queries
...
}
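Comparing two plan dumps by eye is error-prone, so a line-by-line diff can help confirm which fields really differ. Below is a minimal Python sketch using the standard `difflib` module, assuming each plan dump has been copied out of the log into a text string. The function name `diff_plans` is hypothetical and not part of PostgreSQL or Alfresco.

```python
import difflib


def diff_plans(ok_plan, nok_plan):
    """Return only the lines that differ between two plan dumps.

    Lines prefixed with "- " come from the OK plan,
    lines prefixed with "+ " come from the NOK plan.
    """
    delta = difflib.ndiff(ok_plan.splitlines(), nok_plan.splitlines())
    # ndiff also emits unchanged lines ("  ") and hint lines ("? "); drop those.
    return [line for line in delta if line.startswith(("- ", "+ "))]
```

Calling `diff_plans(ok_text, nok_text)` on the two saved dumps should list entries such as the `CONST` versus `PARAM` nodes noted above, without the identical parts of the tree.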
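Additional information:
- Autovacuum is turned off (VACUUM is scheduled nightly)
- pg_stat_reset() + ANALYZE have been run (the problem persists)
- The statistics target on the main tables has been increased (500), but the problem persists.
This problem occurs in an Alfresco (5.1g) environment (using the Alfresco search API), but we cannot reproduce it with a direct SQL query.
Does it make sense for a SELECT query like this to use so many temporary files?
Query: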
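To put a number on the temporary-file activity, the `temporary file:` entries in the server log can be totalled per backend. Below is a minimal Python sketch, assuming the log line format shown in the failing case above. The function name `summarize_temp_files` is hypothetical and not part of PostgreSQL or Alfresco.

```python
import re

# Matches entries such as:
#   ... [15741]: [13088-1] LOG: temporary file: path "base/pgsql_tmp/...", size 9498636
TEMP_FILE_RE = re.compile(
    r'\[(?P<pid>\d+)\]: \[\d+-\d+\] LOG:\s+temporary file: '
    r'path "(?P<path>[^"]+)", size (?P<size>\d+)'
)


def summarize_temp_files(log_lines):
    """Return {pid: (file_count, total_bytes)} for temporary-file log entries."""
    totals = {}
    for line in log_lines:
        match = TEMP_FILE_RE.search(line)
        if match is None:
            continue  # not a temporary-file entry
        pid = int(match.group("pid"))
        count, size = totals.get(pid, (0, 0))
        totals[pid] = (count + 1, size + int(match.group("size")))
    return totals
```

Passing an open log file to `summarize_temp_files` gives one total per backend PID. If about 150 files of roughly 9.5 MB each are written during one execution, as the log above suggests, that is on the order of 1.4 GB spilled to disk for a single query.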
select
    node.id as id
from alf_node node
where node.type_qname_id <> 149
  AND node.store_id = 6
  AND (
    node.id IN
    (
        select aspect.node_id
        from alf_node_aspects aspect
        where aspect.qname_id IN ( 260 )
    )
    AND node.id IN
    (
        select PROP.node_id
        from alf_node_properties PROP
        where (249 = PROP.qname_id)
          AND PROP.string_value = 'Mandats'
    )
    AND node.id IN
    (
        select PROP.node_id
        from alf_node_properties PROP
        where (245 = PROP.qname_id)
          AND PROP.string_value = '1'
    )
    AND node.id IN
    (
        select PROP.node_id
        from alf_node_properties PROP
        where (247 = PROP.qname_id)
          AND PROP.string_value = '869637'
    )
    AND node.id IN
    (
        select PROP.node_id
        from alf_node_properties PROP
        where (248 = PROP.qname_id)
          AND PROP.string_value = 'AGF00619'
    )
  )
order by node.audit_modified DESC;
Any suggestions? Thanks, Vincent
Answer 0 (score: 0)
Have you tried this?: select pg_stat_reset();