Improving the performance of a WITH query in DB2

Date: 2018-07-11 18:47:50

Tags: sql performance db2

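The query given in the example runs very slowly. I have close to 4 million records in the my_task table.

Is there any kind of performance improvement we can make to it?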

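Take the following table as an example.

Here I have used numbers for start_dt and end_dt instead of the timestamp format.

An additional note: an empty end_dt means the record is active and is being worked on by an employee.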

T_ID |start_dt |end_dt |code       |p_id
-----|---------|-------|-----------|---
1    |8        |4      |INPROGRESS |110
1    |4        |       |ASSIGNED   |110
4    |10       |4      |INPROGRESS |110
4    |4        |       |ASSIGNED   |110
5    |4        |4      |INPROGRESS |110
6    |12       |12     |INPROGRESS |110
6    |8        |8      |ASSIGNED   |110
6    |8        |       |DONE       |110
2    |12       |12     |INPROGRESS |210
2    |8        |8      |ASSIGNED   |210
2    |8        |       |DONE       |210
3    |12       |12     |INPROGRESS |111
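To try out rewrites without the real tables, the sample rows above can be loaded into an inline common table expression. This is a sketch: the name sample_task is hypothetical, and the dates stay plain integers as in the table, so it exercises the grouping logic rather than the timestamp arithmetic.

```sql
-- Hypothetical inline copy of the sample rows; NULL end_dt marks an active record.
WITH sample_task (t_id, start_dt, end_dt, code, p_id) AS (
    VALUES (1,  8,  4,                     'INPROGRESS', 110),
           (1,  4,  CAST(NULL AS INTEGER), 'ASSIGNED',   110),
           (4, 10,  4,                     'INPROGRESS', 110),
           (4,  4,  CAST(NULL AS INTEGER), 'ASSIGNED',   110),
           (5,  4,  4,                     'INPROGRESS', 110),
           (6, 12, 12,                     'INPROGRESS', 110),
           (6,  8,  8,                     'ASSIGNED',   110),
           (6,  8,  CAST(NULL AS INTEGER), 'DONE',       110),
           (2, 12, 12,                     'INPROGRESS', 210),
           (2,  8,  8,                     'ASSIGNED',   210),
           (2,  8,  CAST(NULL AS INTEGER), 'DONE',       210),
           (3, 12, 12,                     'INPROGRESS', 111)
)
-- Count active records per person and status
SELECT p_id, code, COUNT(*) AS active_rows
FROM sample_task
WHERE end_dt IS NULL
GROUP BY p_id, code
ORDER BY p_id, code
```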

The output looks like this:

P_ID |avg_bgn_diff |assigned |in_progress |completed | comp_diff
-----|-------------|---------|------------|----------|----------
110  | 4           |   2     |    1       |     1    |      10
210  | null        |   0     |    0       |     1    |      8
111  | null        |   0     |    1       |     0    |      null

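Explanation of the output (I have replaced the original table names with made-up ones, so apologies for any confusion):

  • The MY_TASK table has a unique T_ID
  • The MY_PEOPLE table is the employee table
  • The MY_TASK_REF table contains details about who has which task
  • A task has a status: every status-change action stores its result as a record in the task table, with statuses such as ASSIGNED, INPROGRESS and DONE
  • A record with no END_DT is an active record
  • For avg_bgn_diff we only want the average elapsed time of all active "ASSIGNED" tasks (those whose END_DT is null)
  • The fields assigned | in_progress | completed represent how many active tasks each employee has in each category
  • comp_diff is each employee's average completion time. An employee starts working on a task when its record enters INPROGRESS, and we average over the tasks that reached DONE today, using the start date of the INPROGRESS record and the start date of the DONE record.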
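The per-employee counts and avg_bgn_diff described above can be computed from the active rows in one grouped pass with conditional aggregation, instead of the separate d and e subqueries in the query below. A minimal sketch, assuming a is the CTE from that query and start_dt is a real TIMESTAMP column; TIMESTAMPDIFF(4, ...) returns minutes, as in the original. One deliberate difference: here only active ASSIGNED rows feed avg_bgn_diff, matching the requirement as stated, whereas the original d subquery joins b to x on p_id alone, which appears to average over every active row of the person and also multiplies that person's rows (n active rows times k ASSIGNED rows).

```sql
-- Sketch: assumes CTE a (t_id, start_dt, end_dt, code, p_id) from the question.
SELECT
    p_id,
    -- CASE without ELSE gives NULL for non-ASSIGNED rows and AVG skips NULLs,
    -- so a person with no active ASSIGNED task gets NULL here.
    AVG(CASE WHEN code = 'ASSIGNED'
             THEN TIMESTAMPDIFF(4, CHAR(CURRENT TIMESTAMP - start_dt))
        END) AS avg_bgn_diff,
    SUM(CASE WHEN code = 'ASSIGNED'   THEN 1 ELSE 0 END) AS assigned,
    SUM(CASE WHEN code = 'INPROGRESS' THEN 1 ELSE 0 END) AS in_progress,
    SUM(CASE WHEN code = 'DONE' AND DATE(start_dt) = CURRENT DATE
             THEN 1 ELSE 0 END) AS completed
FROM a
WHERE end_dt IS NULL            -- active records only
GROUP BY p_id
```

The result could then be LEFT OUTER JOINed to the comp_diff subquery (g) in the same way the original joins d and e to g.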

I have the following query:

WITH a AS (
    SELECT
        t1.t_id AS t_id,
        t1.start_dt AS start_dt,
        t1.end_dt AS end_dt,
        t1.code AS code,
        t2.p_id AS p_id
    FROM
        my_task t2
        INNER JOIN my_task_ref t1 ON t1.t_id = t2.t_id
        INNER JOIN my_people p1 ON t2.p_id = p1.p_id
    WHERE
        -- ignore DONE tasks
        t1.t_id NOT IN (
            SELECT t.t_id
            FROM my_task t
            WHERE t.code = 'DONE' AND trunc(t.execution_dt) < trunc(current_timestamp)
        )
        and p1.department_id = '1234' 
    ORDER BY p_id DESC
) SELECT
    d.p_id,
    d.avg_bgn_diff
    ,e.assigned
    ,e.in_progress
    ,e.completed
    ,g.comp_diff
  FROM
    -- find average time for persons for diff ASSIGNMENT
    (
        SELECT c.p_id,AVG(c.bgn_diff) AS avg_bgn_diff
        FROM(
                SELECT b.p_id,timestampdiff(4,current_timestamp - a.start_dt) AS bgn_diff
                FROM ( SELECT p_id,t_id,start_dt FROM a WHERE end_dt IS NULL ) b
                LEFT OUTER JOIN  ( SELECT p_id, t_id,start_dt FROM a WHERE 
                     code = 'ASSIGNED' AND   end_dt IS NULL ) x ON x.p_id = b.p_id
            ) c  GROUP BY C.p_id
    ) d
    -- find count of each codes person has
    INNER JOIN (
        SELECT 
            p_id,
            SUM( CASE WHEN code = 'ASSIGNED' THEN 1 ELSE 0 END ) AS assigned,
            SUM( CASE WHEN code = 'INPROGRESS' THEN 1 ELSE 0 END ) AS in_progress,
            SUM( CASE WHEN code = 'DONE' AND trunc(start_dt) = trunc(current_timestamp)
                    THEN 1 ELSE 0 END ) AS completed
        FROM
            a where end_dt IS NULL
        GROUP BY p_id
    ) e on D.p_id=E.p_id 
    -- find total avg diff of entire task took to compelete.
    LEFT OUTER JOIN (
        SELECT F.p_id,AVG(f.bgn_diff) AS comp_diff
        FROM
            (
                SELECT a.p_id, timestampdiff(4,b.start_dt - a.start_dt) AS bgn_diff
                FROM (
                        SELECT p_id, t_id, start_dt FROM a WHERE code = 'INPROGRESS'
                    ) a
                    INNER JOIN (
                        SELECT p_id, t_id, start_dt FROM a
                        WHERE code = 'DONE' AND   trunc(start_dt) = trunc(current_timestamp)
                    ) b ON a.t_id = b.t_id
            ) f GROUP BY F.p_id
    ) g ON D.p_id=G.p_id
WITH UR;

Can we write this differently in a way that improves performance?

Note: indexes exist on all the necessary columns.

Thanks.
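An index is usually of little help when its column is wrapped in a function, as in trunc(t.execution_dt) or trunc(start_dt), because the optimizer then generally cannot use an ordinary index on that column for a range scan. A sketch of the same date tests written as plain range comparisons, assuming execution_dt and start_dt are TIMESTAMP columns (table and column names are the ones from the question):

```sql
-- Same meaning as trunc(t.execution_dt) < trunc(current_timestamp):
-- anything before midnight at the start of today.
SELECT t.t_id
FROM my_task t
WHERE t.code = 'DONE'
  AND t.execution_dt < TIMESTAMP(CURRENT DATE, '00:00:00');

-- Same meaning as trunc(start_dt) = trunc(current_timestamp):
-- from midnight today (inclusive) up to midnight tomorrow (exclusive).
SELECT r.t_id
FROM my_task_ref r
WHERE r.code = 'DONE'
  AND r.start_dt >= TIMESTAMP(CURRENT DATE, '00:00:00')
  AND r.start_dt <  TIMESTAMP(CURRENT DATE + 1 DAY, '00:00:00');
```

Whether the optimizer actually switches to an index access for these predicates is best confirmed with an EXPLAIN of the rewritten query.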

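2 Answers:

Answer 0 (score: 0)

It would help if you provided an EXPLAIN plan for the query, a list of indexes, and perhaps a better description of what you are trying to do (and fixed the syntax error in the table references), but this version of the query may speed things up.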

Note the comments throughout!


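Answer 1 (score: -1)

Try removing ORDER BY p_id DESC from the first query; ORDER BY is usually very expensive. Also, in the first query the NOT IN subquery appears to read the same base table, my_task, so I suggest putting the filter directly in the WHERE clause.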

WITH a AS (
SELECT
    t1.t_id AS t_id,
    t1.start_dt AS start_dt,
    t1.end_dt AS end_dt,
    t1.code AS code,
    t2.p_id AS p_id
FROM
    my_task t2
    INNER JOIN my_task_ref t1 ON t1.t_id = t2.t_id
    INNER JOIN my_people p1 ON t2.p_id = p1.p_id
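WHERE
    -- ignore tasks that were already DONE before today
    -- (my_task has one row per t_id, so the NOT IN becomes a plain predicate)
    NOT (t2.code = 'DONE' AND trunc(t2.execution_dt) < trunc(current_timestamp))
    and p1.department_id = '1234' )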
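A related rewrite, not part of this answer, is to keep the exclusion as a correlated NOT EXISTS. Unlike NOT IN, it does not silently return no rows if the subquery ever produces a NULL t_id. A sketch against the question's tables, with the "before today" test written as a range comparison on execution_dt (assumed to be a TIMESTAMP):

```sql
-- Body of CTE a using NOT EXISTS instead of NOT IN (sketch).
SELECT t1.t_id, t1.start_dt, t1.end_dt, t1.code, t2.p_id
FROM my_task t2
    INNER JOIN my_task_ref t1 ON t1.t_id = t2.t_id
    INNER JOIN my_people p1 ON t2.p_id = p1.p_id
WHERE p1.department_id = '1234'
  -- exclude tasks that already have a DONE row dated before today
  AND NOT EXISTS (
        SELECT 1
        FROM my_task t
        WHERE t.t_id = t1.t_id
          AND t.code = 'DONE'
          AND t.execution_dt < TIMESTAMP(CURRENT DATE, '00:00:00')
      )
```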

It would also be good to reduce the depth/number of subqueries. For example,

 SELECT c.p_id,AVG(c.bgn_diff) AS avg_bgn_diff
    FROM(
            SELECT b.p_id,timestampdiff(4,current_timestamp - a.start_dt) AS bgn_diff
            FROM ( SELECT p_id,t_id,start_dt FROM a WHERE end_dt IS NULL ) b
            LEFT OUTER JOIN  ( SELECT p_id, t_id,start_dt FROM a WHERE 
                 code = 'ASSIGNED' AND   end_dt IS NULL ) x ON x.p_id = b.p_id
        ) c  GROUP BY C.p_id

could become...

SELECT a.p_id,AVG(timestampdiff(4,current_timestamp - a.start_dt)) AS avg_bgn_diff
FROM a
WHERE end_dt IS NULL OR (code = 'ASSIGNED' AND end_dt IS NULL )
GROUP BY a.p_id
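The same idea could be applied to the g subquery (average time from INPROGRESS to DONE for tasks finished today): rather than joining a to itself, each task's two start dates can be pivoted into one row with conditional aggregation and then averaged per person. A sketch, assuming each task's rows share one p_id, start_dt is a TIMESTAMP, and MIN is an acceptable choice if a task has more than one INPROGRESS or DONE row (the original self-join would count every pairing instead):

```sql
-- Sketch: assumes CTE a (t_id, start_dt, end_dt, code, p_id) from the question.
SELECT per_task.p_id,
       AVG(TIMESTAMPDIFF(4, CHAR(per_task.done_start - per_task.work_start))) AS comp_diff
FROM (
        -- one row per task: when work started and when it was marked DONE
        SELECT p_id,
               t_id,
               MIN(CASE WHEN code = 'INPROGRESS' THEN start_dt END) AS work_start,
               MIN(CASE WHEN code = 'DONE'       THEN start_dt END) AS done_start
        FROM a
        GROUP BY p_id, t_id
     ) per_task
WHERE per_task.work_start IS NOT NULL
  AND DATE(per_task.done_start) = CURRENT DATE   -- completed today
GROUP BY per_task.p_id
```

This reads a once instead of twice; whether it is faster than the join in practice depends on the access plan DB2 chooses.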