How to calculate the total duration of overlapping date ranges (more than 2) using PostgreSQL?

Time: 2017-02-23 09:45:02

Tags: sql ruby-on-rails database postgresql

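I'm trying to calculate the total (global) duration of many date ranges.

In my database I have candidates and experiences.

A candidate can have many experiences, and each experience has a start date and may have an end_date.

Experience date ranges can overlap, and this is where I'm stuck: how do I calculate the duration?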
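To pin down what "duration" should mean when ranges overlap, here is a brute-force reference in Python (not SQL, so it can run without a database). It assumes both ends of an experience are worked days and that a missing end_date means "until today"; the function names and sample data are hypothetical. Summing each experience on its own counts the overlapping months twice, while counting distinct covered days does not.

```python
from datetime import date, timedelta

def covered_days(experiences, today):
    """Set of distinct calendar days covered by at least one experience.
    Assumption: both ends are inclusive; end=None means ongoing until `today`."""
    days = set()
    for start, end in experiences:
        end = end or today
        d = start
        while d <= end:
            days.add(d)
            d += timedelta(days=1)
    return days

def naive_total(experiences, today):
    """Sum of each experience's inclusive length; overlapping days count twice."""
    return sum(((end or today) - start).days + 1 for start, end in experiences)

exps = [
    (date(2015, 1, 1), date(2015, 12, 31)),
    (date(2015, 7, 1), date(2016, 6, 30)),
]
today = date(2017, 2, 23)
print(naive_total(exps, today))        # 731
print(len(covered_days(exps, today)))  # 547 (the 184 shared days counted once)
```

This is only useful as a correctness check on small data; the queries below compute the same kind of total by merging ranges instead of enumerating days.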

This is how my models are associated:

[Image: My models]

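I want to retrieve a list of candidates by querying experiences and skills. I have 2 inputs: a range and a skill name. For example, I want all candidates with the "Ruby" skill who have 5 years of experience in total, summed across their experiences.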
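The queries in this thread compare a number of days against a threshold, so "5 years" has to become a day count first. A minimal Python sketch, assuming the window is "the same calendar day N years ago until today" and that Feb 29 falls back to Feb 28 (a hypothetical choice; the helper names are illustrative). In PostgreSQL a comparable threshold could be written as `current_date - (current_date - interval '5 years')::date`.

```python
from datetime import date

def years_ago(d, years):
    """Same calendar day `years` years before d; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:  # d is Feb 29 and the target year is not a leap year
        return d.replace(year=d.year - years, day=28)

def min_days_for_years(years, today):
    """Number of days in the `years`-year window ending at `today`."""
    return (today - years_ago(today, years)).days

print(min_days_for_years(5, date(2017, 2, 23)))  # 1827 (two leap days in the window)
```

The result depends on how many leap days fall in the window, which is why a fixed `5 * 365` can be off by a day or two.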

Edit

Current solution:

      SELECT * FROM (
            WITH cte AS (
                SELECT
                  experiences.candidate_id AS candidate_id,
                  experiences.id AS e_id,
                  experiences.start_at AS start_at,
                  experiences.end_at AS end_at,
                  LAG(experiences.start_at, 1, start_at)
                  OVER (PARTITION BY experiences.candidate_id ORDER BY experiences.start_at) AS prev_start_at,
                  LAG(experiences.end_at, 1, start_at)
                  OVER (PARTITION BY experiences.candidate_id ORDER BY experiences.start_at) AS prev_end_at,
                  LEAD(experiences.start_at)
                  OVER (PARTITION BY experiences.candidate_id ORDER BY experiences.start_at) AS next_start_at,
                  LEAD(experiences.end_at, 1, current_date)
                  OVER (PARTITION BY experiences.candidate_id ORDER BY experiences.start_at) AS next_end_at
                FROM experiences
                  INNER JOIN experiences_skills ON experiences_skills.experience_id = experiences.id
                  INNER JOIN skills ON skills.id = experiences_skills.skill_id
                WHERE skills.name = 'Ruby'
            )
            SELECT
              SUM(CASE
                  WHEN (cte.prev_end_at > cte.end_at AND cte.prev_end_at < cte.next_start_at)
                    THEN cte.prev_end_at
                  WHEN (cte.prev_end_at > cte.end_at AND cte.prev_end_at > cte.next_start_at)
                    THEN cte.next_start_at
                  WHEN cte.end_at > cte.next_start_at
                    THEN cte.next_start_at
                  ELSE cte.end_at
                  END
                  -
                  cte.start_at
              ) AS duration_day,
              candidates.*
            FROM cte
              INNER JOIN candidates ON candidates.id = cte.candidate_id
            GROUP BY candidates.id
          ) AS candidates
      WHERE duration_day > 0 AND duration_day < 1000;
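Comparing each experience only with the row immediately before and after it (which is all the LAG/LEAD columns above expose) cannot tell when a row is covered by an experience that started further back. The usual fix, often called the "gaps and islands" technique, is to sort by start date and track the furthest end date seen so far. A minimal Python sketch of that sort-and-sweep merge, assuming inclusive [start, end] dates with open-ended experiences already replaced by today's date (function names are hypothetical):

```python
from datetime import date

def merge_ranges(ranges):
    """Merge inclusive [start, end] date ranges into disjoint ranges.
    Sort by start, then sweep while tracking the furthest end of the
    current island; a new island starts only after that end."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps the current island: extend it to the larger end date.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]

def total_days(ranges):
    """Inclusive day count of the union of the ranges."""
    return sum((end - start).days + 1 for start, end in merge_ranges(ranges))

# A long experience containing two short ones: the third row's predecessor
# (by start date) ends before it starts, yet the first row still covers it.
exps = [
    (date(2010, 1, 1), date(2014, 12, 31)),
    (date(2011, 1, 1), date(2011, 6, 30)),
    (date(2012, 1, 1), date(2012, 6, 30)),
]
print(total_days(exps))  # 1826
```

In SQL the "furthest end so far" corresponds to a running window such as `MAX(end_at) OVER (PARTITION BY candidate_id ORDER BY start_at ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)`, rather than a single LAG.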

1 Answer:

Answer 0 (score: 1)

I don't think you can solve this without defining your own aggregate, for example:

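-- State transition function: adds one range to an array of disjoint ranges
-- and unions overlapping ranges until none of them overlap.
CREATE OR REPLACE FUNCTION range_array_merge(s anyarray, v anynonarray)
    RETURNS anyarray
    LANGUAGE SQL
    IMMUTABLE
AS $func$
  WITH RECURSIVE arrays(r) AS (
      -- start from the current state plus the new range
      SELECT s || v
    UNION ALL
      -- each step: the unions of all overlapping pairs from the previous step;
      -- the recursion stops once no pair overlaps (HAVING yields no row)
      SELECT array_agg(DISTINCT u)
      FROM   (SELECT a + b u
              FROM   arrays,
                     unnest(r) a
              JOIN   unnest(r) b ON a <> b AND a && b) u
      HAVING COUNT(u) > 0
  ),
  ranges(r) AS (
      -- every range produced at any step
      SELECT unnest(r)
      FROM   arrays
  )
  -- keep only the ranges that are not contained in another range
  SELECT    array_agg(DISTINCT r.r)
  FROM      ranges r
  LEFT JOIN ranges c ON c.r <> r.r AND c.r @> r.r
  WHERE     c.r IS NULL
$func$;

CREATE AGGREGATE range_array_merge_agg(anynonarray) (
  STYPE    = anyarray,
  SFUNC    = range_array_merge,  -- function name only; argument types follow from the aggregate
  INITCOND = '{}'
);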

This collects ranges of any type and accumulates them into a single array that contains only disjoint ranges (overlapping ranges are unioned together).
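The aggregate's behaviour can be modelled in a few lines of Python, which may make it easier to see what the state array holds after each row. This is a sketch of the result, not of the recursive-CTE mechanics: it assumes half-open (lower, upper) pairs of integers standing in for canonical `[)` dateranges, models `&&` as strict overlap and `+` as union, and folds the state function with `functools.reduce` starting from an empty list (the `'{}'` INITCOND).

```python
from functools import reduce

def overlaps(a, b):
    """Model of the && operator for non-empty half-open ranges."""
    return a[0] < b[1] and b[0] < a[1]

def find_overlap(ranges):
    """Indexes of the first overlapping pair, or None."""
    for i in range(len(ranges)):
        for j in range(i + 1, len(ranges)):
            if overlaps(ranges[i], ranges[j]):
                return i, j
    return None

def range_array_merge(state, value):
    """Model of the state function: append `value`, then replace overlapping
    pairs by their union until all ranges are disjoint."""
    ranges = list(state) + [value]
    pair = find_overlap(ranges)
    while pair is not None:
        i, j = pair
        union = (min(ranges[i][0], ranges[j][0]), max(ranges[i][1], ranges[j][1]))
        ranges = [r for k, r in enumerate(ranges) if k not in (i, j)] + [union]
        pair = find_overlap(ranges)
    return sorted(ranges)

def range_array_merge_agg(values):
    """Model of the aggregate: fold the state function from an empty state."""
    return reduce(range_array_merge, values, [])

print(range_array_merge_agg([(1, 5), (3, 8), (10, 12), (7, 11)]))  # [(1, 12)]
print(range_array_merge_agg([(1, 3), (3, 5)]))                     # [(1, 3), (3, 5)]
```

The first call shows a later range, (7, 11), bridging two islands that were separate until then. The second shows that, like `&&`, ranges that merely touch are not merged.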

With this, your query becomes as "simple" as:

SELECT   e.candidate_id,
         SUM(upper(r.r) - lower(r.r) + 1) total_days
FROM     (SELECT   e.candidate_id,
                   range_array_merge_agg(daterange(e.start_at, COALESCE(e.end_at, current_date))) r
          FROM     experiences e
          JOIN     experiences_skills es ON es.experience_id = e.id
          WHERE    es.skill_id = 42 --> search for a specific skill
          GROUP BY e.candidate_id) e,
         unnest(e.r) r
GROUP BY e.candidate_id;

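This assumes that start_at and end_at are of type date. With timestamp [with time zone] columns things get messier (subtracting two timestamps yields an interval, not an integer number of days), but I doubt you need that precision.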

You can filter the query above with HAVING SUM(upper(r.r) - lower(r.r) + 1) > 1000 to keep only candidates with more than 1000 days of experience.
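For checking results against a handful of rows, the grouped query plus the HAVING filter can be mirrored in Python. A sketch under stated assumptions: the rows are hypothetical `(candidate_id, start_at, end_at)` tuples that already passed the skill filter, ranges are built like `daterange(start_at, COALESCE(end_at, current_date))` with the default `[)` bounds, and each merged range contributes `upper - lower + 1` days.

```python
from collections import defaultdict
from datetime import date

def merge_half_open(ranges):
    """Union of half-open [lo, hi) ranges; like &&, touching ranges stay separate."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo < merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return merged

def candidates_with_min_days(rows, min_days, today):
    """Per-candidate totals of merged ranges, keeping totals above min_days."""
    by_candidate = defaultdict(list)
    for candidate_id, start_at, end_at in rows:
        end = end_at or today  # COALESCE(e.end_at, current_date)
        if start_at < end:     # daterange(d, d) is empty and adds nothing to the SUM
            by_candidate[candidate_id].append((start_at, end))
    totals = {
        cid: sum((hi - lo).days + 1 for lo, hi in merge_half_open(rs))  # upper - lower + 1
        for cid, rs in by_candidate.items()
    }
    return {cid: t for cid, t in totals.items() if t > min_days}  # HAVING ... > min_days

rows = [
    (1, date(2010, 1, 1), date(2013, 1, 1)),
    (1, date(2012, 1, 1), None),
    (2, date(2016, 1, 1), date(2016, 12, 31)),
]
print(candidates_with_min_days(rows, 1000, today=date(2017, 2, 23)))  # {1: 2611}
```

Candidate 2 totals 366 days and is filtered out, just as the HAVING clause would drop it.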

http://rextester.com/DNSWS30622

Edit: to retrieve the full candidate rows:

SELECT     c.*
FROM       (SELECT   e.candidate_id,
                     range_array_merge_agg(daterange(e.start_at, COALESCE(e.end_at, current_date))) r
            FROM     experiences e
            JOIN     experiences_skills es ON es.experience_id = e.id
            WHERE    es.skill_id = 42 --> search for a specific skill
            GROUP BY e.candidate_id) e
CROSS JOIN unnest(e.r) r -- explicit join (not a comma) so that e is visible in the ON clause below
JOIN       candidates c ON e.candidate_id = c.id
GROUP BY   c.id
HAVING     SUM(upper(r.r) - lower(r.r) + 1) > 1000; --> search for minimum number of total days

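Note: if you are wondering why the sum has + 1, the reason is simple: date '2017-01-01' - date '2017-01-01' is 0 (zero), whereas I consider that 1 day of experience (that day being 2017-01-01). That is why the sum needs + 1. You can also express this in the daterange constructor, e.g. daterange(e.start_at, COALESCE(e.end_at, current_date), '[]'). Because date is a discrete type, the range [2017-01-01,2017-01-02] is normalized to the form [2017-01-01,2017-01-03). With that form there is no need to add 1 to the sum, because normalization has already "extended" its upper() bound.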
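The two options in the note can be compared with a small Python model, assuming `daterange(start, end, '[]')` is canonicalized to `[start, end + 1 day)` as described above and that `&&` is strict overlap of half-open ranges (helper names are hypothetical). It also shows a difference the note does not spell out: with the default `[)` bounds, an experience ending on the day the next one starts does not overlap it, so the two ranges are not merged and the shared day is counted twice once `+ 1` is added per range; with `'[]'` they overlap and merge.

```python
from datetime import date, timedelta

def closed_to_half_open(start, end):
    """Model of daterange(start, end, '[]'): canonical form [start, end + 1 day)."""
    return (start, end + timedelta(days=1))

def overlaps(a, b):
    """Model of && for non-empty half-open (lower, upper) pairs."""
    return a[0] < b[1] and b[0] < a[1]

def days(r):
    """upper(r) - lower(r) for a half-open date range."""
    return (r[1] - r[0]).days

def total_two(r1, r2, extra):
    """Total days for two ranges: merged if they overlap, otherwise summed;
    `extra` is added once per resulting range (the "+ 1" in the SUM)."""
    if overlaps(r1, r2):
        return days((min(r1[0], r2[0]), max(r1[1], r2[1]))) + extra
    return days(r1) + extra + days(r2) + extra

one_day = (date(2017, 1, 1), date(2017, 1, 1))
print(days(one_day))                        # 0 (date - date)
print(days(closed_to_half_open(*one_day)))  # 1 ('[]' bounds)

# Two experiences sharing their boundary day, 2017-01-03.
a = (date(2017, 1, 1), date(2017, 1, 3))
b = (date(2017, 1, 3), date(2017, 1, 5))
print(total_two(a, b, extra=1))                                              # 6
print(total_two(closed_to_half_open(*a), closed_to_half_open(*b), extra=0))  # 5
```

The distinct days actually covered are 2017-01-01 through 2017-01-05, i.e. 5, so the `'[]'` form is the one that matches an inclusive reading of end_at.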