The query in the example below runs very slowly. I have close to 4 million records in the my_task table.
Is there any way to improve its performance?
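Take the following table as an example.
Here I have used plain numbers for start_dt and end_dt instead of timestamp values.
Additional note: an empty end_dt means the record is the active one and is currently being worked on by a person.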
T_ID |start_dt |end_dt |code |p_id
-----|---------|-------|-----------|---
1 |8 |4 |INPROGRESS |110
1 |4 | |ASSIGNED |110
4 |10 |4 |INPROGRESS |110
4 |4 | |ASSIGNED |110
5 |4 |4 |INPROGRESS |110
6 |12 |12 |INPROGRESS |110
6 |8 |8 |ASSIGNED |110
6 |8 | |DONE |110
2 |12 |12 |INPROGRESS |210
2 |8 |8 |ASSIGNED |210
2 |8 | |DONE |210
3 |12 |12 |INPROGRESS |111
The output looks like this:
P_ID |avg_bgn_diff |assigned |in_progress |completed | comp_diff
-----|-------------|---------|------------|----------|----------
110 | 4 | 2 | 1 | 1 | 10
210 | null | 0 | 0 | 1 | 8
111 | null | 0 | 1 | 0 | null
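Explanation of the output (I have masked the original query with made-up table names; apologies for that):
avg_bgn_diff: the average time elapsed since start_dt, taken over records whose end_dt is empty.
assigned | in_progress | completed: how many active tasks each person has in each category.
comp_diff: the time taken to complete a task. A person starts working when the record moves to INPROGRESS, so for tasks that reached DONE today we take the start date of the INPROGRESS record and the start date of the DONE record, and average the difference.
I have the following query: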
WITH a AS (
SELECT
t1.t_id AS t_id,
t1.start_dt AS start_dt,
t1.end_dt AS end_dt,
t1.code AS code,
t2.p_id AS p_id
FROM
my_task t2
INNER JOIN my_task_ref t1 ON t1.t_id = t2.t_id
INNER JOIN my_people p1 ON t2.p_id = p1.p_id
WHERE
-- ignore DONE tasks
t1.t_id NOT IN (
SELECT t.t_id
FROM my_task t
WHERE t.code = 'DONE' AND trunc(t.execution_dt) < trunc(current_timestamp)
)
and p1.department_id = '1234'
ORDER BY p_id DESC
) SELECT
d.p_id,
d.avg_bgn_diff
,e.assigned
,e.in_progress
,e.completed
,g.comp_diff
FROM
-- find average time for persons for diff ASSIGNMENT
(
SELECT c.p_id,AVG(c.bgn_diff) AS avg_bgn_diff
FROM(
SELECT b.p_id,timestampdiff(4,current_timestamp - a.start_dt) AS bgn_diff
FROM ( SELECT p_id,t_id,start_dt FROM a WHERE end_dt IS NULL ) b
LEFT OUTER JOIN ( SELECT p_id, t_id,start_dt FROM a WHERE
code = 'ASSIGNED' AND end_dt IS NULL ) x ON x.p_id = b.p_id
) c GROUP BY C.p_id
) d
-- find count of each codes person has
INNER JOIN (
SELECT
p_id,
SUM( CASE WHEN code = 'ASSIGNED' THEN 1 ELSE 0 END ) AS assigned,
SUM( CASE WHEN code = 'INPROGRESS' THEN 1 ELSE 0 END ) AS in_progress,
SUM( CASE WHEN code = 'DONE' AND trunc(start_dt) = trunc(current_timestamp)
THEN 1 ELSE 0 END ) AS completed
FROM
a where end_dt IS NULL
GROUP BY p_id
) e on D.p_id=E.p_id
-- find the average time each task took to complete
LEFT OUTER JOIN (
SELECT F.p_id,AVG(f.bgn_diff) AS comp_diff
FROM
(
SELECT a.p_id, timestampdiff(4,b.start_dt - a.start_dt) AS bgn_diff
FROM (
SELECT p_id, t_id, start_dt FROM a WHERE code = 'INPROGRESS'
) a
INNER JOIN (
SELECT p_id, t_id, start_dt FROM a
WHERE code = 'DONE' AND trunc(start_dt) = trunc(current_timestamp)
) b ON a.t_id = b.t_id
) f GROUP BY F.p_id
) g ON D.p_id=G.p_id
WITH UR;
Can this be written in a different way that improves performance?
Note: indexes exist on all the necessary columns.
Thanks.
Answer 0: (score: 0)
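It would help if you provided an EXPLAIN plan for the query, a list of the indexes, and perhaps a better description of what you are trying to do (and corrected the syntax error in the table reference, { {1}}), but this version of the query might speed things up.
Note the comments throughout!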
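On the request for a plan: the WITH UR clause and timestampdiff(4, ...) suggest the database is DB2, but that is an assumption. If so, a plan can be captured with EXPLAIN PLAN FOR, provided the explain tables already exist in the current schema. The sketch below explains only the NOT IN subquery from the question; the same prefix could be put in front of the full statement.

```sql
-- Sketch only: assumes DB2 and that the explain tables have been created.
-- Table and column names are the ones used in the question.
EXPLAIN PLAN FOR
SELECT t.t_id
FROM my_task t
WHERE t.code = 'DONE'
  AND trunc(t.execution_dt) < trunc(current_timestamp);
```

The captured plan can then be formatted with a tool such as db2exfmt and posted alongside the index list.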
Answer 1: (score: -1)
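Try removing ORDER BY p_id DESC from the first query (the CTE); ORDER BY is usually expensive, and the outer query does not rely on that ordering. Also in the first query, the NOT IN subquery appears to read the same base table, my_task, so I suggest moving that filter into the WHERE clause.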
WITH a AS (
SELECT
t1.t_id AS t_id,
t1.start_dt AS start_dt,
t1.end_dt AS end_dt,
t1.code AS code,
t2.p_id AS p_id
FROM
my_task t2
INNER JOIN my_task_ref t1 ON t1.t_id = t2.t_id
INNER JOIN my_people p1 ON t2.p_id = p1.p_id
WHERE
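-- ignore tasks that were DONE before today; this row-level form matches the
-- NOT IN subquery only if my_task holds one row per t_id
NOT (t2.code = 'DONE' AND trunc(t2.execution_dt) < trunc(current_timestamp))
and p1.department_id = '1234' )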
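A row-level filter on t2 is only equivalent to the original NOT IN when my_task holds a single row per t_id. If that is not guaranteed, a correlated NOT EXISTS keeps the per-task semantics while still dropping the ORDER BY. The following is a minimal sketch, assuming DB2, a non-nullable t_id (NOT IN and NOT EXISTS differ when the subquery can return NULL), and a date or timestamp execution_dt. Comparing the bare column against trunc(current_timestamp) selects the same rows as truncating both sides, and leaves the column free for an index on execution_dt, if one exists.

```sql
-- Sketch only: table and column names come from the question.
WITH a AS (
    SELECT
        t1.t_id,
        t1.start_dt,
        t1.end_dt,
        t1.code,
        t2.p_id
    FROM my_task t2
    INNER JOIN my_task_ref t1 ON t1.t_id = t2.t_id
    INNER JOIN my_people p1 ON t2.p_id = p1.p_id
    WHERE p1.department_id = '1234'
      -- skip every task that has a DONE row dated before today
      AND NOT EXISTS (
          SELECT 1
          FROM my_task t
          WHERE t.t_id = t1.t_id
            AND t.code = 'DONE'
            AND t.execution_dt < trunc(current_timestamp)
      )
)
-- placeholder outer query so the statement is complete
SELECT p_id, t_id, code, start_dt, end_dt
FROM a
WITH UR;
```

Whether this is actually faster depends on the optimizer and the available indexes, so it is worth comparing the plans of both forms.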
Reducing the depth and number of subqueries would also help. For example,
SELECT c.p_id,AVG(c.bgn_diff) AS avg_bgn_diff
FROM(
SELECT b.p_id,timestampdiff(4,current_timestamp - a.start_dt) AS bgn_diff
FROM ( SELECT p_id,t_id,start_dt FROM a WHERE end_dt IS NULL ) b
LEFT OUTER JOIN ( SELECT p_id, t_id,start_dt FROM a WHERE
code = 'ASSIGNED' AND end_dt IS NULL ) x ON x.p_id = b.p_id
) c GROUP BY C.p_id
could become:
SELECT a.p_id, AVG(timestampdiff(4,current_timestamp - a.start_dt)) AS avg_bgn_diff
FROM a
-- every ASSIGNED row with an empty end_dt already satisfies this predicate
WHERE end_dt IS NULL
GROUP BY a.p_id
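Taking the same idea one step further, the average and the three counts can be computed in a single pass over a, because both subqueries read the rows with an empty end_dt and group by p_id. The sketch below assumes DB2, a TIMESTAMP start_dt, and that avg_bgn_diff is meant to cover only ASSIGNED rows. That last point is an inference from the sample output, where p_id 210 has no ASSIGNED row and a null average; it is not something the question states. The CHAR(...) wrapper follows the documented form of DB2's TIMESTAMPDIFF.

```sql
-- Sketch only: table and column names come from the question.
WITH a AS (
    SELECT t1.t_id, t1.start_dt, t1.end_dt, t1.code, t2.p_id
    FROM my_task t2
    INNER JOIN my_task_ref t1 ON t1.t_id = t2.t_id
    INNER JOIN my_people p1 ON t2.p_id = p1.p_id
    WHERE p1.department_id = '1234'
      AND t1.t_id NOT IN (
          SELECT t.t_id
          FROM my_task t
          WHERE t.code = 'DONE'
            AND trunc(t.execution_dt) < trunc(current_timestamp)
      )
)
SELECT
    p_id,
    -- AVG skips NULLs, so only ASSIGNED rows contribute;
    -- a person with no ASSIGNED row gets NULL
    AVG(CASE WHEN code = 'ASSIGNED'
             THEN timestampdiff(4, CHAR(current_timestamp - start_dt))
        END) AS avg_bgn_diff,
    SUM(CASE WHEN code = 'ASSIGNED' THEN 1 ELSE 0 END) AS assigned,
    SUM(CASE WHEN code = 'INPROGRESS' THEN 1 ELSE 0 END) AS in_progress,
    SUM(CASE WHEN code = 'DONE' AND trunc(start_dt) = trunc(current_timestamp)
             THEN 1 ELSE 0 END) AS completed
FROM a
WHERE end_dt IS NULL
GROUP BY p_id
WITH UR;
```

The comp_diff column would still need its own join between the INPROGRESS and DONE rows of each task, so that part of the original query is left out here.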