Improving the performance of a query that uses a generated series, lists, and lateral cross joins

Asked: 2016-02-29 17:52:15

Tags: postgresql

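I have two related tables: people and shifts. My goal is to retrieve the shifts for every day in a time range, for a list of people. That means I want a result row for a date even when there is no shift on that date; in that case person_id and the remaining columns may be null.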

people table:
id | name | department_id
---|------|--------------
1  | max  | 1
2  | tim  | 1

shifts table:
id | date_of_shift | person_id
---|---------------|----------
1  | 2016-03-22    | 1
2  | 2016-03-23    | 1
3  | 2016-03-24    | 1
4  | 2016-03-21    | 2
5  | 2016-03-23    | 2
6  | 2016-03-25    | 2
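
For anyone who wants to reproduce the sample data locally, the two tables above could be set up roughly as follows. This is only a sketch: the column types and constraints are assumptions, since the question shows just the column names and a few sample values.

```sql
-- Assumed schema: types and constraints are guesses based on the sample rows.
CREATE TABLE people (
    id            integer PRIMARY KEY,
    name          text    NOT NULL,
    department_id integer NOT NULL
);

CREATE TABLE shifts (
    id            integer PRIMARY KEY,
    date_of_shift date    NOT NULL,
    person_id     integer NOT NULL REFERENCES people (id)
);

-- Sample rows copied from the tables in the question.
INSERT INTO people (id, name, department_id) VALUES
    (1, 'max', 1),
    (2, 'tim', 1);

INSERT INTO shifts (id, date_of_shift, person_id) VALUES
    (1, '2016-03-22', 1),
    (2, '2016-03-23', 1),
    (3, '2016-03-24', 1),
    (4, '2016-03-21', 2),
    (5, '2016-03-23', 2),
    (6, '2016-03-25', 2);
```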

This is the query I managed to write:

SELECT p.id,
          p.name,
          json_agg(DISTINCT(shi)) as shifts

     FROM people as p
     JOIN LATERAL
            (SELECT d.date_of_shift,
                    pe.id as person_id,
                    sh.shift_id
               FROM generate_series('2016-03-21'::date, '2016-03-25', interval '1 day') AS d(date_of_shift)
CROSS JOIN LATERAL(
                    SELECT people.id
                      FROM people
                    ) AS pe
         LEFT JOIN( SELECT shifts.id as shift_id,
                           shifts.person_id,
                           shifts.date_of_shift
                     FROM shifts
                     ) as sh
                 ON d.date_of_shift = sh.date_of_shift AND sh.person_id = pe.id
           ) AS shi
       ON p.id = shi.person_id

    WHERE p.id IN (SELECT people.id
                        FROM people
                       WHERE people.department_id = 1
                    ORDER BY people.id ASC)
 GROUP BY p.id, p.name;

The result I expect looks like this:

 id |   name   |                                      shifts                                      
----+----------+----------------------------------------------------------------------------------
  2 | person0  | [{"date_of_shift":"2016-03-21T00:00:00+01:00","person_id":2,"shift_id":null},   +
    |          |  {"date_of_shift":"2016-03-22T00:00:00+01:00","person_id":2,"shift_id":1027},   +
    |          |  {"date_of_shift":"2016-03-23T00:00:00+01:00","person_id":2,"shift_id":1028},   +
    |          |  {"date_of_shift":"2016-03-24T00:00:00+01:00","person_id":2,"shift_id":1029},   +
    |          |  {"date_of_shift":"2016-03-25T00:00:00+01:00","person_id":2,"shift_id":1030}]
  3 | person1  | [{"date_of_shift":"2016-03-21T00:00:00+01:00","person_id":3,"shift_id":1781},   +
    |          |  {"date_of_shift":"2016-03-22T00:00:00+01:00","person_id":3,"shift_id":1782},   +
    |          |  {"date_of_shift":"2016-03-23T00:00:00+01:00","person_id":3,"shift_id":1783},   +
    |          |  {"date_of_shift":"2016-03-24T00:00:00+01:00","person_id":3,"shift_id":1784},   +
    |          |  {"date_of_shift":"2016-03-25T00:00:00+01:00","person_id":3,"shift_id":1785}]

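I ran into several requirements that make this query very slow:

  1. I want a row for every day, even when the db has no shift for that day.
  2. I have a list of people (everyone in a given department).

I created a fiddle to show the problem. With the amount of data in my dev db, the query takes 1 second to run. One second is not viable for a query that loads all the relevant data for the main page of my SPA. It is more of a private project, but I would really like to know how to solve this more efficiently.

This would be the result of EXPLAIN ANALYZE.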

2 answers:

Answer 0 (score: 0)

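I had a breakthrough while I was still lying in bed.

The second join against the people table was completely unnecessary: I can LEFT JOIN shifts directly onto the generated date series and reference the outer person row from inside the lateral subquery.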

            SELECT p.id,
                   p.name,
                   json_agg(s) as shifts
              FROM people as p
CROSS JOIN LATERAL (SELECT d.date_of_shift, sh.id as shift_id
                      FROM generate_series('2016-03-21'::date, '2016-03-25', interval '1 day') AS d(date_of_shift)
         LEFT JOIN LATERAL (SELECT shifts.id, 
                                   shifts.person_id, 
                                   shifts.date_of_shift
                              FROM shifts
                           ) as sh
                        ON d.date_of_shift = sh.date_of_shift AND sh.person_id = p.id) as s
             WHERE p.id IN (SELECT people.id
                              FROM people
                             WHERE people.department_id = 1)
             GROUP BY p.id, p.name

The query now takes 25 ms instead of 980 ms.
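
If the query is still slow on a larger data set, one thing that might be worth trying is an index covering the two columns used in the join condition, then checking the plan again. The sketch below is not part of the original solution: the index name is made up, and whether the planner actually uses the index depends on the data volume and statistics. It also replaces the `IN` subquery with a direct filter, which should be equivalent as long as `people.id` is unique.

```sql
-- Hypothetical index name; the columns match the join condition
-- sh.person_id = p.id AND sh.date_of_shift = d.date_of_shift.
CREATE INDEX shifts_person_date_idx ON shifts (person_id, date_of_shift);

-- Re-run the plan to see whether the index is picked up.
EXPLAIN ANALYZE
SELECT p.id,
       p.name,
       json_agg(s) AS shifts
  FROM people AS p
 CROSS JOIN LATERAL (
        SELECT d.date_of_shift, sh.id AS shift_id
          FROM generate_series('2016-03-21'::date, '2016-03-25', interval '1 day') AS d(date_of_shift)
          LEFT JOIN shifts AS sh
            ON sh.date_of_shift = d.date_of_shift
           AND sh.person_id = p.id
       ) AS s
 WHERE p.department_id = 1   -- assumes people.id is unique, so this equals the IN subquery
 GROUP BY p.id, p.name;
```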

Answer 1 (score: 0)

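I would try splitting the query up with a common table expression (WITH clause), since that really improves the readability of the query. Can you check whether this query runs faster?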

WITH shifts_per_person AS (
  SELECT d.date_of_shift, p.id AS person_id, shifts.id AS shift_id
  FROM people AS p
  CROSS JOIN generate_series('2016-03-21'::date, '2016-03-25', interval '1 day') AS d(date_of_shift)
  LEFT OUTER JOIN shifts ON shifts.person_id = p.id AND shifts.date_of_shift = d.date_of_shift
  ORDER BY p.id, d.date_of_shift)

SELECT p.id,
       p.name,
       json_agg(row_to_json(s.*)) as shifts
FROM people AS p
JOIN shifts_per_person AS s ON p.id = s.person_id
GROUP BY p.id, p.name
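
Two details of the CTE version may be worth adjusting, shown here as a hedged variant rather than a tested fix. First, an `ORDER BY` inside the CTE does not guarantee the order of elements produced by `json_agg`; an `ORDER BY` inside the aggregate call does. Second, the query above has no department filter, so the variant adds the `department_id = 1` condition from the question. The column alias `day` is introduced only for this sketch, and the series is cast to `date` so it compares against `shifts.date_of_shift` without a timestamp conversion.

```sql
WITH shifts_per_person AS (
    SELECT d.day::date AS date_of_shift,
           p.id        AS person_id,
           sh.id       AS shift_id
      FROM people AS p
     CROSS JOIN generate_series('2016-03-21'::timestamp,
                                '2016-03-25'::timestamp,
                                interval '1 day') AS d(day)
      LEFT JOIN shifts AS sh
        ON sh.person_id = p.id
       AND sh.date_of_shift = d.day::date
     WHERE p.department_id = 1   -- filter taken from the question
)
SELECT p.id,
       p.name,
       -- ordering inside the aggregate fixes the order of the JSON array
       json_agg(row_to_json(s) ORDER BY s.date_of_shift) AS shifts
  FROM people AS p
  JOIN shifts_per_person AS s ON s.person_id = p.id
 GROUP BY p.id, p.name;
```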