An efficient way to get the last 10 rows of each group in SQLite

Date: 2015-11-02 17:58:17

Tags: performance sqlite select

Consider a job run history table with the following schema:

CREATE TABLE job_runs
(
    run_id integer not null, -- identifier of the run
    job_id integer not null, -- identifier of the job
    run_number integer not null, -- job run number, run numbers increment for each job
    status text not null, -- status of the run (running, completed, killed, ...)
    primary key (run_id)
    -- ...
);

For each job (jobs are distinguished by job_id), I need to get the last 10 runs whose status != 'running'. To do this, I wrote the following query:

SELECT
    *
FROM
    job_runs AS JR1
WHERE
    JR1.run_number IN
    (
        SELECT
            JR2.run_number
        FROM
            job_runs AS JR2
        WHERE
            JR2.job_id = JR1.job_id
            AND
            JR2.status != 'running'
        ORDER BY
            JR2.run_number DESC
        LIMIT
            10
    )
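To check what the query returns, here is a minimal sketch that runs it through Python's built-in sqlite3 module on an in-memory database. The sample data (job 1 with 15 runs, job 2 with 4 runs, the newest run of each still 'running') is made up purely for illustration.

```python
import sqlite3

# Made-up sample data: job 1 has 15 runs, job 2 has 4 runs,
# and the newest run of each job is still 'running'.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE job_runs (
        run_id     INTEGER NOT NULL,
        job_id     INTEGER NOT NULL,
        run_number INTEGER NOT NULL,
        status     TEXT    NOT NULL,
        PRIMARY KEY (run_id)
    )
""")
rows = []
for job_id, n_runs in [(1, 15), (2, 4)]:
    for run_number in range(1, n_runs + 1):
        status = "running" if run_number == n_runs else "completed"
        rows.append((len(rows) + 1, job_id, run_number, status))
conn.executemany("INSERT INTO job_runs VALUES (?, ?, ?, ?)", rows)

# The query from the question: a correlated IN subquery for every outer row.
QUESTION_QUERY = """
    SELECT * FROM job_runs AS JR1
    WHERE JR1.run_number IN (
        SELECT JR2.run_number FROM job_runs AS JR2
        WHERE JR2.job_id = JR1.job_id AND JR2.status != 'running'
        ORDER BY JR2.run_number DESC
        LIMIT 10
    )
"""

# Keep only (job_id, run_number) and sort for easy inspection.
picked = sorted((job_id, run_number)
                for _, job_id, run_number, _ in conn.execute(QUESTION_QUERY))
print(picked)
```

With this data, job 1 yields runs 5 to 14 (run 15 is excluded because it is still running) and job 2 yields its three finished runs.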

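It does what I need, but it is slow even though there is a multi-column index on the job_id and run_number fields of the job_runs table: the query scans the whole job_runs table and runs the subquery for every row. The index makes each subquery execution fast, but the fact that the outer query scans the entire table still hurts performance. So how can I tune this query?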
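One way to see this in SQLite itself is EXPLAIN QUERY PLAN. The sketch below (empty in-memory table, hypothetical index name `job_runs_job_run`) collects the plan for the question's query. The exact wording differs between SQLite versions, but it should contain a SCAN of the outer table and a CORRELATED subquery, i.e. one that is re-evaluated for each outer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE job_runs (
        run_id INTEGER NOT NULL, job_id INTEGER NOT NULL,
        run_number INTEGER NOT NULL, status TEXT NOT NULL,
        PRIMARY KEY (run_id)
    )
""")
# Multi-column index like the one described in the question (name is hypothetical).
conn.execute("CREATE INDEX job_runs_job_run ON job_runs(job_id, run_number)")

QUESTION_QUERY = """
    SELECT * FROM job_runs AS JR1
    WHERE JR1.run_number IN (
        SELECT JR2.run_number FROM job_runs AS JR2
        WHERE JR2.job_id = JR1.job_id AND JR2.status != 'running'
        ORDER BY JR2.run_number DESC
        LIMIT 10
    )
"""

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUESTION_QUERY)]
for detail in details:
    print(detail)
```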

Some thoughts:

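The number of jobs (distinct job_ids) is small. If SQLite had a FOR loop, it would be easy to iterate over all distinct job_ids, run the subquery with each job ID in place of JR1.job_id, and then UNION ALL the results.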
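For comparison, this is roughly what that loop would expand to if the job IDs were written out by hand. It is a sketch under the assumption that the only job IDs are 1 and 2 (matching the made-up sample data); the hard-coding is exactly why this form does not fit the "pure SQL" requirement once job IDs change. Note that SQLite does not allow ORDER BY or LIMIT on individual members of a compound SELECT, so each per-job query is wrapped in a subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE job_runs (
        run_id INTEGER NOT NULL, job_id INTEGER NOT NULL,
        run_number INTEGER NOT NULL, status TEXT NOT NULL,
        PRIMARY KEY (run_id)
    )
""")
# Made-up sample data: the newest run of each job is still 'running'.
rows = []
for job_id, n_runs in [(1, 15), (2, 4)]:
    for run_number in range(1, n_runs + 1):
        status = "running" if run_number == n_runs else "completed"
        rows.append((len(rows) + 1, job_id, run_number, status))
conn.executemany("INSERT INTO job_runs VALUES (?, ?, ?, ?)", rows)

# Hard-coded job IDs (hypothetical): one LIMIT 10 query per job, glued with UNION ALL.
UNION_QUERY = """
    SELECT * FROM (SELECT * FROM job_runs
                   WHERE job_id = 1 AND status != 'running'
                   ORDER BY run_number DESC LIMIT 10)
    UNION ALL
    SELECT * FROM (SELECT * FROM job_runs
                   WHERE job_id = 2 AND status != 'running'
                   ORDER BY run_number DESC LIMIT 10)
"""

picked = sorted((job_id, run_number)
                for _, job_id, run_number, _ in conn.execute(UNION_QUERY))
print(picked)
```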

Important:

Please don't suggest running a loop in my application's source code. I need a pure SQL solution.

1 answer:

Answer 0 (score: 1)

You can further improve the subquery's performance by creating a covering index for it:

CREATE INDEX xxx ON job_runs(job_id, run_number, status);
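A quick way to confirm that the index really covers the subquery is to compare its plan before and after creating the index. The sketch uses a literal job ID (1) in place of the correlated JR1.job_id and keeps the placeholder index name xxx. Plan wording varies by SQLite version, but once the index exists the plan should mention a COVERING INDEX, meaning the subquery is answered from the index alone without reading table rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE job_runs (
        run_id INTEGER NOT NULL, job_id INTEGER NOT NULL,
        run_number INTEGER NOT NULL, status TEXT NOT NULL,
        PRIMARY KEY (run_id)
    )
""")

# The per-job subquery, with a literal job ID standing in for JR1.job_id.
SUBQUERY = """
    SELECT run_number FROM job_runs
    WHERE job_id = 1 AND status != 'running'
    ORDER BY run_number DESC
    LIMIT 10
"""

def plan(sql):
    """Return the detail text of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(SUBQUERY)
conn.execute("CREATE INDEX xxx ON job_runs(job_id, run_number, status)")
after = plan(SUBQUERY)
print(before)
print(after)
```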

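But the biggest performance problem is that the subquery is executed for every row, although you only need to run it once for each unique job ID.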

First, get only the unique job IDs:

SELECT DISTINCT job_id
FROM job_runs

Then, for each ID, determine the tenth-largest run number among its runs that are not running:

SELECT job_id,
       (SELECT run_number
        FROM job_runs
        WHERE job_id = job_ids.job_id
          AND status != 'running'
        ORDER BY run_number DESC
        LIMIT 1 OFFSET 9
       ) AS first_run_number
FROM (SELECT DISTINCT job_id
      FROM job_runs) AS job_ids

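But if a job has fewer than 10 such runs, the subquery returns NULL, so let's replace it with a small number (-1 works as long as run numbers are never negative) so that the comparison below (run_number >= first_run_number) works: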

SELECT job_id,
       IFNULL((SELECT run_number
               FROM job_runs
               WHERE job_id = job_ids.job_id
                 AND status != 'running'
               ORDER BY run_number DESC
               LIMIT 1 OFFSET 9
              ), -1) AS first_run_number
FROM (SELECT DISTINCT job_id
      FROM job_runs) AS job_ids

So now we have the first interesting run number for each job. Finally, join these values back to the original table:

SELECT job_runs.*
FROM job_runs
JOIN (SELECT job_id,
             IFNULL((SELECT run_number
                     FROM job_runs
                     WHERE job_id = job_ids.job_id
                       AND status != 'running'
                     ORDER BY run_number DESC
                     LIMIT 1 OFFSET 9
                    ), -1) AS first_run_number
      FROM (SELECT DISTINCT job_id
            FROM job_runs) AS job_ids
     ) AS firsts
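  ON job_runs.job_id = firsts.job_id
 AND job_runs.run_number >= firsts.first_run_number
-- repeat the status filter: the join alone would also match runs still in progress
WHERE job_runs.status != 'running';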
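The sketch below runs the final query against made-up sample data and compares it with the question's original query. It also shows why the outer query needs `status != 'running'`: the join condition alone keeps every run at or after the tenth finished one, including runs that are still in progress (and, through the -1 fallback, every run of a job with fewer than ten finished ones).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE job_runs (
        run_id INTEGER NOT NULL, job_id INTEGER NOT NULL,
        run_number INTEGER NOT NULL, status TEXT NOT NULL,
        PRIMARY KEY (run_id)
    )
""")
conn.execute("CREATE INDEX xxx ON job_runs(job_id, run_number, status)")
# Made-up sample data: the newest run of each job is still 'running'.
rows = []
for job_id, n_runs in [(1, 15), (2, 4)]:
    for run_number in range(1, n_runs + 1):
        status = "running" if run_number == n_runs else "completed"
        rows.append((len(rows) + 1, job_id, run_number, status))
conn.executemany("INSERT INTO job_runs VALUES (?, ?, ?, ?)", rows)

QUESTION_QUERY = """
    SELECT * FROM job_runs AS JR1
    WHERE JR1.run_number IN (
        SELECT JR2.run_number FROM job_runs AS JR2
        WHERE JR2.job_id = JR1.job_id AND JR2.status != 'running'
        ORDER BY JR2.run_number DESC LIMIT 10)
"""
# The answer's join: the scalar subquery runs once per distinct job_id.
ANSWER_QUERY = """
    SELECT job_runs.*
    FROM job_runs
    JOIN (SELECT job_id,
                 IFNULL((SELECT run_number FROM job_runs
                         WHERE job_id = job_ids.job_id AND status != 'running'
                         ORDER BY run_number DESC
                         LIMIT 1 OFFSET 9), -1) AS first_run_number
          FROM (SELECT DISTINCT job_id FROM job_runs) AS job_ids
         ) AS firsts
      ON job_runs.job_id = firsts.job_id
     AND job_runs.run_number >= firsts.first_run_number
"""
STATUS_FILTER = "WHERE job_runs.status != 'running'"

def picked(sql):
    """Sorted (job_id, run_number) pairs returned by a query."""
    return sorted((job_id, run_number)
                  for _, job_id, run_number, _ in conn.execute(sql))

question = picked(QUESTION_QUERY)
unfiltered = picked(ANSWER_QUERY)
filtered = picked(ANSWER_QUERY + STATUS_FILTER)
print(filtered == question)
print(sorted(set(unfiltered) - set(filtered)))
```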