Consider a job run history table with the following schema:
job_runs
(
run_id integer not null, -- identifier of the run
job_id integer not null, -- identifier of the job
run_number integer not null, -- job run number, run numbers increment for each job
status text not null, -- status of the run (running, completed, killed, ...)
primary key (run_id)
-- ...
)
For each job (jobs are distinguished by job_id), I need to get the last 10 runs with status != 'running'. To do that, I wrote the following query:
SELECT
*
FROM
job_runs AS JR1
WHERE
JR1.run_number IN
(
SELECT
JR2.run_number
FROM
job_runs AS JR2
WHERE
JR2.job_id = JR1.job_id
AND
JR2.status != 'running'
ORDER BY
JR2.run_number
DESC
LIMIT
10
)
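It does what I need, but even though there is a multi-field index on the job_id and run_number fields of the job_runs table, the query is slow, because it scans the job_runs table and runs the subquery for every row. The index helps each execution of the subquery run quickly, but the fact that the outer query scans the entire table kills performance. So how can I tune the performance of this query?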
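Some thoughts:
The number of jobs (distinct job_ids) is small. If SQLite had a FOR loop, it would be easy to iterate over all distinct job_ids, run the subquery with the job ID passed in instead of JR1.job_id, and then UNION all the results.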
Important:
Please do not suggest running the loop in my application's source code. I need a pure SQL solution.
Answer 0 (score: 1)
You can further improve the performance of the subquery by creating a covering index for it:
CREATE INDEX xxx ON job_runs(job_id, run_number, status);
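One way to check that the index really covers the subquery is to ask SQLite for its query plan. The sketch below is only a test harness around the SQL (it uses Python's standard sqlite3 module and an in-memory database, both of which are my assumptions and not part of the question); the subquery is run for a single fixed job ID in place of the correlated JR1.job_id.

```python
import sqlite3

# In-memory database with the schema from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE job_runs (
        run_id     integer not null,
        job_id     integer not null,
        run_number integer not null,
        status     text    not null,
        primary key (run_id)
    )
    """
)
conn.execute("CREATE INDEX xxx ON job_runs(job_id, run_number, status)")

# Ask SQLite how it would execute the per-job subquery for one job ID.
plan_rows = conn.execute(
    """
    EXPLAIN QUERY PLAN
    SELECT run_number
    FROM job_runs
    WHERE job_id = ?
      AND status != 'running'
    ORDER BY run_number DESC
    LIMIT 10
    """,
    (1,),
).fetchall()

# The last column of each plan row is the human-readable description.
plan_text = "\n".join(row[-1] for row in plan_rows)
print(plan_text)
```

The exact wording of the plan differs between SQLite versions, but it should mention a covering index, meaning that the subquery is answered from the index alone without touching the table rows.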
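But the biggest performance problem is that the subquery is executed for every row, although you only need to run it once for each unique job ID.
So first, get just the unique job IDs: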
SELECT DISTINCT job_id
FROM job_runs
Then, for each ID, determine the tenth-largest run number:
SELECT job_id,
(SELECT run_number
FROM job_runs
WHERE job_id = job_ids.job_id
AND status != 'running'
ORDER BY run_number DESC
LIMIT 1 OFFSET 9
) AS first_run_number
FROM (SELECT DISTINCT job_id
FROM job_runs) AS job_ids
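To make the effect of LIMIT 1 OFFSET 9 concrete, here is a small sketch that runs this query on made-up data. The sample rows, the Python sqlite3 harness, and the trailing ORDER BY job_id (added only to make the output order predictable) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE job_runs (
        run_id     integer not null,
        job_id     integer not null,
        run_number integer not null,
        status     text    not null,
        primary key (run_id)
    )
    """
)

# Made-up sample data: job 1 has 12 finished runs plus one still running,
# job 2 has only 3 finished runs.
sample = [(1, n, "completed") for n in range(1, 13)]
sample.append((1, 13, "running"))
sample += [(2, n, "completed") for n in range(1, 4)]
conn.executemany(
    "INSERT INTO job_runs (run_id, job_id, run_number, status) VALUES (?, ?, ?, ?)",
    [(i, job, num, status) for i, (job, num, status) in enumerate(sample, start=1)],
)

# The query from the answer; ORDER BY job_id is added for a stable output order.
thresholds = dict(
    conn.execute(
        """
        SELECT job_id,
               (SELECT run_number
                FROM job_runs
                WHERE job_id = job_ids.job_id
                  AND status != 'running'
                ORDER BY run_number DESC
                LIMIT 1 OFFSET 9
               ) AS first_run_number
        FROM (SELECT DISTINCT job_id
              FROM job_runs) AS job_ids
        ORDER BY job_id
        """
    ).fetchall()
)
print(thresholds)
```

For job 1 the finished runs in descending order are 12, 11, ..., 1, so skipping nine of them lands on run number 3. Job 2 has fewer than ten finished runs, so its subquery yields no row and the value comes back as NULL (None in Python).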
But if a job has fewer than 10 runs, the subquery returns NULL, so let's replace that with a small number so that the comparison below (run_number >= first_run_number) works:
SELECT job_id,
IFNULL((SELECT run_number
FROM job_runs
WHERE job_id = job_ids.job_id
AND status != 'running'
ORDER BY run_number DESC
LIMIT 1 OFFSET 9
), -1) AS first_run_number
FROM (SELECT DISTINCT job_id
FROM job_runs) AS job_ids
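The reason the NULL has to be replaced is that any comparison with NULL evaluates to NULL, which a join condition treats as not true. A minimal sketch of that behaviour (the literal 5 is an arbitrary stand-in for a run number, and the Python sqlite3 harness is an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Compare a run number against a NULL threshold, and against the same
# threshold after IFNULL has replaced it with -1.
null_cmp, fixed_cmp = conn.execute(
    "SELECT 5 >= NULL, 5 >= IFNULL(NULL, -1)"
).fetchone()

print(null_cmp)   # NULL comparison result (None in Python)
print(fixed_cmp)  # comparison against the -1 fallback
```

Note that -1 only works as the fallback if run numbers are never negative, which the schema comment ("run numbers increment for each job") suggests but does not guarantee.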
So now we have the first interesting run for each job. Finally, join these values back to the original table:
SELECT job_runs.*
FROM job_runs
JOIN (SELECT job_id,
IFNULL((SELECT run_number
FROM job_runs
WHERE job_id = job_ids.job_id
AND status != 'running'
ORDER BY run_number DESC
LIMIT 1 OFFSET 9
), -1) AS first_run_number
FROM (SELECT DISTINCT job_id
FROM job_runs) AS job_ids
) AS firsts
ON job_runs.job_id = firsts.job_id
AND job_runs.run_number >= firsts.first_run_number;
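It may be worth comparing this query against the original correlated one on sample data. The sketch below does that with made-up rows in an in-memory database (the data and the Python sqlite3 harness are assumptions). It also tries a variant with an extra WHERE job_runs.status != 'running' condition, which is my addition and not part of the query above: as written, the join does not filter the outer table by status, so a run that is still in progress and has a run number above the threshold would be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE job_runs (
        run_id     integer not null,
        job_id     integer not null,
        run_number integer not null,
        status     text    not null,
        primary key (run_id)
    )
    """
)

# Made-up sample data: job 1 has 12 finished runs plus one still running,
# job 2 has only 3 finished runs.
sample = [(1, n, "completed") for n in range(1, 13)]
sample.append((1, 13, "running"))
sample += [(2, n, "completed") for n in range(1, 4)]
conn.executemany(
    "INSERT INTO job_runs (run_id, job_id, run_number, status) VALUES (?, ?, ?, ?)",
    [(i, job, num, status) for i, (job, num, status) in enumerate(sample, start=1)],
)

# The correlated query from the question (only two columns selected).
ORIGINAL = """
SELECT JR1.job_id, JR1.run_number
FROM job_runs AS JR1
WHERE JR1.run_number IN (
    SELECT JR2.run_number
    FROM job_runs AS JR2
    WHERE JR2.job_id = JR1.job_id
      AND JR2.status != 'running'
    ORDER BY JR2.run_number DESC
    LIMIT 10
)
"""

# The join-based query from the answer (only two columns selected).
JOINED = """
SELECT job_runs.job_id, job_runs.run_number
FROM job_runs
JOIN (SELECT job_id,
             IFNULL((SELECT run_number
                     FROM job_runs
                     WHERE job_id = job_ids.job_id
                       AND status != 'running'
                     ORDER BY run_number DESC
                     LIMIT 1 OFFSET 9
                    ), -1) AS first_run_number
      FROM (SELECT DISTINCT job_id
            FROM job_runs) AS job_ids
     ) AS firsts
  ON job_runs.job_id = firsts.job_id
 AND job_runs.run_number >= firsts.first_run_number
"""

# Assumed extra condition (not in the answer): drop runs still in progress.
JOINED_FILTERED = JOINED + " WHERE job_runs.status != 'running'"

original = set(conn.execute(ORIGINAL))
joined = set(conn.execute(JOINED))
joined_filtered = set(conn.execute(JOINED_FILTERED))

print(sorted(joined - original))    # rows only the unfiltered join returns
print(joined_filtered == original)  # whether the filtered join matches
```

On this sample data the unfiltered join returns one row the original query does not, the running run 13 of job 1, while the filtered variant returns the same rows as the original. If running rows can never have a run number above the threshold in your data, the extra condition is unnecessary.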