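I find the following query very slow.
Basically, I am trying to get the latest status, along with some other information, for each workflow_id by joining these three tables.
The query logic is as follows:
This is my current code: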
SELECT workflow_id,
       collabration_key,
       TAB3.START_TIME AS WORKFLOWDATE,
       batch_step_execution.STEP_NAME AS CURRENT_STEP_NAME,
       batch_step_execution.EXIT_CODE AS CURRENTSTEP,
       batch_step_execution.start_time AS STEPTIME,
       TAB3.EXIT_CODE AS JOB_STATUS
FROM batch_step_execution
INNER JOIN (
    SELECT *
    FROM rpx_id_mapping
    INNER JOIN (
        SELECT batch_job_execution.job_execution_id,
               batch_job_execution.job_instance_id,
               batch_job_execution.START_TIME,
               batch_job_execution.EXIT_CODE
        FROM batch_job_execution
        WHERE batch_job_execution.job_execution_id IN (
            SELECT MAX(job_execution_id)
            FROM batch_job_execution
            WHERE job_instance_id IN (
                SELECT job_id
                FROM rpx_id_mapping
            )
            GROUP BY job_instance_id
        )
    ) TAB2
    ON rpx_id_mapping.job_id = TAB2.job_instance_id
) TAB3
ON batch_step_execution.job_execution_id = TAB3.job_execution_id
WHERE batch_step_execution.step_execution_id = (
    SELECT MAX(step_execution_id)
    FROM batch_step_execution
    WHERE batch_step_execution.job_execution_id = TAB3.job_execution_id
);
This is the table structure.
Is there a better way to achieve the same result?
Answer 0 (score: 2)
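1. Group the batch_step_execution table by job execution ID, find the latest step with MAX(step_execution_id), then retrieve all columns using that step execution ID.
2. Group the batch_job_execution table by job_instance_id and retrieve the latest execution.
3. Inner join 1 and 2, matching the latest execution to the job execution of the latest step.
4. Join the ID mapping table.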
I tried to follow your logic, and I think using an analytic query to pick the maximum row gives the same result:
SELECT *
FROM (
    SELECT workflow_id,
           collabration_key,
           bje.START_TIME AS WORKFLOWDATE,      -- job-level start time, as in the original query
           bse.STEP_NAME  AS CURRENT_STEP_NAME,
           bse.EXIT_CODE  AS CURRENTSTEP,
           bse.start_time AS STEPTIME,
           bje.EXIT_CODE  AS JOB_STATUS,        -- job-level exit code, as in the original query
           ROW_NUMBER() OVER ( PARTITION BY rim.job_execution_id,
                                            rim.job_id
                               ORDER BY bse.step_execution_id DESC,
                                        bse.job_execution_id DESC ) AS rn
    FROM batch_step_execution bse
    INNER JOIN rpx_id_mapping rim
        ON ( bse.job_execution_id = rim.job_execution_id )
    INNER JOIN batch_job_execution bje
        ON ( rim.job_id = bje.job_instance_id )
)
WHERE rn = 1;
If not, then hopefully it gives you an idea of how to simplify things.
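
To check the "rank the rows, keep rn = 1" idea on toy data, here is a minimal sketch that runs a variant of the query against an in-memory SQLite database from Python. Everything in it is an assumption made for illustration: the three tables are cut down to the columns the query touches, rpx_id_mapping is assumed to link to jobs only through job_id, the sample rows are invented, and the names `build_demo_db` and `latest_status` are hypothetical. The variant joins steps to executions on job_execution_id and ranks by latest execution first, then latest step.

```python
import sqlite3

# Hypothetical, cut-down schema: only the columns the query touches.
SCHEMA = """
CREATE TABLE rpx_id_mapping (
    job_id INTEGER, workflow_id TEXT, collabration_key TEXT);
CREATE TABLE batch_job_execution (
    job_execution_id INTEGER PRIMARY KEY, job_instance_id INTEGER,
    start_time TEXT, exit_code TEXT);
CREATE TABLE batch_step_execution (
    step_execution_id INTEGER PRIMARY KEY, job_execution_id INTEGER,
    step_name TEXT, start_time TEXT, exit_code TEXT);
"""

# Rank every (execution, step) pair per mapping row: newest execution first,
# then newest step within it, and keep only the top-ranked row.
LATEST_STATUS_SQL = """
SELECT workflow_id, collabration_key, workflowdate,
       current_step_name, currentstep, steptime, job_status
FROM (
    SELECT rim.workflow_id,
           rim.collabration_key,
           bje.start_time AS workflowdate,
           bse.step_name  AS current_step_name,
           bse.exit_code  AS currentstep,
           bse.start_time AS steptime,
           bje.exit_code  AS job_status,
           ROW_NUMBER() OVER (
               PARTITION BY rim.workflow_id, rim.job_id
               ORDER BY bje.job_execution_id DESC,
                        bse.step_execution_id DESC
           ) AS rn
    FROM rpx_id_mapping rim
    INNER JOIN batch_job_execution bje
        ON bje.job_instance_id = rim.job_id
    INNER JOIN batch_step_execution bse
        ON bse.job_execution_id = bje.job_execution_id
) ranked
WHERE rn = 1
ORDER BY workflow_id
"""


def build_demo_db():
    """Create an in-memory database with a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO rpx_id_mapping VALUES (?, ?, ?)",
        [(1, "wf-a", "key-a"), (2, "wf-b", "key-b")],
    )
    conn.executemany(
        "INSERT INTO batch_job_execution VALUES (?, ?, ?, ?)",
        [(10, 1, "2020-01-01 09:00", "FAILED"),
         (11, 1, "2020-01-02 09:00", "COMPLETED"),
         (12, 2, "2020-01-03 09:00", "FAILED")],
    )
    conn.executemany(
        "INSERT INTO batch_step_execution VALUES (?, ?, ?, ?, ?)",
        [(100, 10, "load", "2020-01-01 09:01", "FAILED"),
         (101, 11, "load", "2020-01-02 09:01", "COMPLETED"),
         (102, 11, "publish", "2020-01-02 09:05", "COMPLETED"),
         (103, 12, "load", "2020-01-03 09:01", "FAILED")],
    )
    return conn


def latest_status(conn):
    """Return one row per mapping entry: latest execution and its latest step."""
    return conn.execute(LATEST_STATUS_SQL).fetchall()


if __name__ == "__main__":
    for row in latest_status(build_demo_db()):
        print(row)
```

One behavioural difference to keep in mind: if the latest execution of a job has no step rows yet, this variant falls back to the latest step of an earlier execution, whereas the nested-subquery version drops that workflow. It also assumes (workflow_id, job_id) identifies a single mapping row. Whether it is actually faster on the real tables depends on the indexes and the execution plan, so compare the plans before switching.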