PostgreSQL查询随机需要几分钟才能执行

时间:2018-01-03 12:10:52

标签: postgresql query-optimization

多次(顺序)运行相同的SELECT查询时,大约10%的查询需要几分钟才能执行(3到20分钟之间)。
所有其他人在大约100毫秒内执行 当运行长持续时间查询时,postgreSQL进程需要100%cpu。

我们在Debian GNU / Linux 8.7(jessie)上运行postgreSQL v9.4。
每天在DB上运行VACUUM ANALYZE。

在服务器端启用日志记录后,我们会看到以下条目:

成功案例:

2018-01-02 21:41:56 CET [15948]: [13776-1] STATEMENT:  BEGIN
2018-01-02 21:41:56 CET [15948]: [13777-1] LOG:  rewritten parse tree:
2018-01-02 21:41:56 CET [15948]: [13778-1] DETAIL:  (
        ... )
2018-01-02 21:41:56 CET [15948]: [13779-1] STATEMENT:  BEGIN
2018-01-02 21:41:56 CET [15948]: [13780-1] LOG:  duration: 0.114 ms  parse <unnamed>: BEGIN
2018-01-02 21:41:56 CET [15948]: [13781-1] LOG:  duration: 0.010 ms  bind <unnamed>: BEGIN
2018-01-02 21:41:56 CET [15948]: [13782-1] LOG:  execute <unnamed>: BEGIN
2018-01-02 21:41:56 CET [15948]: [13783-1] LOG:  duration: 0.015 ms
2018-01-02 21:41:56 CET [15948]: [13797-1] LOG:  plan:
2018-01-02 21:41:56 CET [15948]: [13798-1] DETAIL:     {PLANNEDSTMT 
        ...
2018-01-02 21:41:56 CET [15948]: [13799-1] STATEMENT:  select 
        ... (the SELECT query)
2018-01-02 21:41:56 CET [15948]: [13800-1] LOG:  duration: 10.303 ms  bind S_21/C_24: select 
        ... (the SELECT query again)
2018-01-02 21:41:56 CET [15948]: [13801-1] DETAIL:  parameters: $1 = '149', $2 = '6'...(the query parameters)
2018-01-02 21:41:56 CET [15948]: [13802-1] LOG:  execute S_21/C_24: select 
        ... (the SELECT query again)
2018-01-02 21:41:56 CET [15948  ]: [13803-1] DETAIL:  parameters: $1 = '149', $2 = '6'...(the query parameters again)
2018-01-02 21:41:56 CET [15948]: [13804-1] LOG:  duration: 15.662 ms

失败的情况(长时间查询):

2018-01-02 21:36:55 CET [15741]: [13060-1] STATEMENT:  BEGIN
2018-01-02 21:36:55 CET [15741]: [13061-1] LOG:  rewritten parse tree:
2018-01-02 21:36:55 CET [15741]: [13062-1] DETAIL:  (
        ... )
2018-01-02 21:36:55 CET [15741]: [13063-1] STATEMENT:  BEGIN
2018-01-02 21:36:55 CET [15741]: [13064-1] LOG:  duration: 0.107 ms  parse <unnamed>: BEGIN
2018-01-02 21:36:55 CET [15741]: [13065-1] LOG:  duration: 0.009 ms  bind <unnamed>: BEGIN
2018-01-02 21:36:55 CET [15741]: [13066-1] LOG:  execute <unnamed>: BEGIN
2018-01-02 21:36:55 CET [15741]: [13067-1] LOG:  duration: 0.016 ms
2018-01-02 21:36:55 CET [15741]: [13081-1] LOG:  plan:
2018-01-02 21:36:55 CET [15741]: [13082-1] DETAIL:     {PLANNEDSTMT 
        ...
2018-01-02 21:36:55 CET [15741]: [13083-1] STATEMENT:  select 
        ... (the SELECT query)
2018-01-02 21:36:55 CET [15741]: [13084-1] LOG:  duration: 9.886 ms  bind S_20/C_27: select 
        ... (the SELECT query again)
2018-01-02 21:36:55 CET [15741]: [13085-1] DETAIL:  parameters: $1 = '149', $2 = '6'...(the query parameters)
2018-01-02 21:36:55 CET [15741]: [13086-1] LOG:  execute S_20/C_27: select 
    ... (the SELECT query again)
2018-01-02 21:36:55 CET [15741]: [13087-1] DETAIL:  parameters: $1 = '149', $2 = '6'...(the query parameters again)

**/!\--- LOOPING ABOUT 150 times ---/!\**
        2018-01-02 21:42:23 CET [15741]: [13088-1] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15741.XX", size 9498636 (this value changes at every loop)
        2018-01-02 21:42:23 CET [15741]: [13089-1] STATEMENT:  select 
            ... (the SELECT query again)
**/!\--- END LOOP ---/!\**

2018-01-02 21:46:19 CET [15741]: [13340-1] LOG:  duration: 563668.493 ms
2018-01-02 21:46:19 CET [15741]: [13341-1] LOG:  duration: 0.026 ms  bind S_2: COMMIT
2018-01-02 21:46:19 CET [15741]: [13342-1] LOG:  execute S_2: COMMIT
2018-01-02 21:46:19 CET [15741]: [13343-1] LOG:  duration: 0.155 ms
2018-01-02 21:46:19 CET [15741]: [13344-1] LOG:  disconnection: session time: 0:10:23.538 user=xxx database=yyy host=127.0.0.1 port=60880                    

在查看执行计划时,我们看到:
     - 所有OK查询都有相同的计划
     - 所有NOK(持续时间长)查询具有相同的计划
     - OK和NOK计划略有不同:

        'total_cost' values (near but different values)
        ...
        {TARGETENTRY 
            :expr 
                {VAR 
                :varno 65000 ---------> differs
                :varattno 3  ---------> differs
                :vartype 20 
                :vartypmod -1 
                :varcollid 0 
                :varlevelsup 0 
                :varnoold 10  ---------> differs
                :varoattno 1 
                :location 2459 ---------> differs
                }
        ... 
        :args (...
        {PARAM | RELABELTYPE | CONST ---------> differs, always CONST for OK queries, always PARAM | RELABELTYPE for NOK queries
        ...
        }

添加了附加信息:
- Autovacuum关闭(每晚安排)
- 已运行pg_stat_reset()+ ANALYZE(pbm保持)
- 主表统计数据已增加(500),但pbm仍然存在。

此pbm发生在Alfresco(5.1g)环境中(使用alfresco seach API),但我们无法使用直接sql查询重现此pbm。

为这样的SELECT查询使用如此多的临时文件是否有意义?

查询:

select
node.id as id 
from alf_node node 
where node.type_qname_id <> 149 
AND node.store_id = 6 
AND ( 
node.id IN 
    (
    select aspect.node_id 
    from alf_node_aspects aspect 
    where aspect.qname_id IN ( 260 )
    ) 
AND node.id IN 
    (
    select PROP.node_id 
    from alf_node_properties PROP 
    where (249 = PROP.qname_id) 
        AND PROP.string_value = 'Mandats' 
    ) 
AND node.id IN 
    (
    select PROP.node_id 
    from alf_node_properties PROP 
    where (245 = PROP.qname_id) 
        AND PROP.string_value = '1' 
    ) 
AND node.id IN 
    (
    select PROP.node_id 
    from alf_node_properties PROP 
    where (247 = PROP.qname_id) 
        AND PROP.string_value = '869637' 
    ) 
AND node.id IN 
    (
    select PROP.node_id 
    from alf_node_properties PROP 
    where (248 = PROP.qname_id) 
        AND PROP.string_value = 'AGF00619' 
    )
) 
order by node.audit_modified DESC;

有什么建议吗? 谢谢, 文森特

1 个答案:

答案 0 :(得分:0)

你试过吗?:选择pg_stat_reset();