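I am trying to calculate the overall duration covered by many date ranges.
In my database, I have candidates and experiences.
A candidate can have many experiences, and each experience has a start date and may have an end date.
The date ranges of experiences can overlap, and this is where I am stuck: how do I calculate the total duration?
This is how my models are joined:
I want to retrieve a list of candidates by querying on experience and skill. I have 2 inputs: a range and a skill name. For example, I want all candidates with the "Ruby" skill who have 5 years of experience in total across all of their experiences.
Current solution: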
SELECT * FROM (
WITH cte AS (
SELECT
experiences.candidate_id AS candidate_id,
experiences.id AS e_id,
experiences.start_at AS start_at,
experiences.end_at AS end_at,
LAG(experiences.start_at, 1, start_at)
OVER (PARTITION BY experiences.candidate_id ORDER BY experiences.start_at) AS prev_start_at,
LAG(experiences.end_at, 1, start_at)
OVER (PARTITION BY experiences.candidate_id ORDER BY experiences.start_at) AS prev_end_at,
LEAD(experiences.start_at)
OVER (PARTITION BY experiences.candidate_id ORDER BY experiences.start_at) AS next_start_at,
LEAD(experiences.end_at, 1, current_date)
OVER (PARTITION BY experiences.candidate_id ORDER BY experiences.start_at) AS next_end_at
FROM experiences
INNER JOIN experiences_skills ON experiences_skills.experience_id = experiences.id
INNER JOIN skills ON skills.id = experiences_skills.skill_id
WHERE skills.name = 'Ruby'
)
SELECT
SUM(CASE
WHEN (cte.prev_end_at > cte.end_at AND cte.prev_end_at < cte.next_start_at)
THEN cte.prev_end_at
WHEN (cte.prev_end_at > cte.end_at AND cte.prev_end_at > cte.next_start_at)
THEN cte.next_start_at
WHEN cte.end_at > cte.next_start_at
THEN cte.next_start_at
ELSE cte.end_at
END
-
cte.start_at
) AS duration_day,
candidates.*
FROM cte
INNER JOIN candidates ON candidates.id = cte.candidate_id
GROUP BY candidates.id
) AS candidates
WHERE duration_day > 0 AND duration_day < 1000;
Answer 0 (score: 1)
I don't think you can solve this without defining your own aggregate, for example:
CREATE OR REPLACE FUNCTION range_array_merge(s anyarray, v anynonarray)
RETURNS anyarray
LANGUAGE SQL
IMMUTABLE
AS $func$
WITH RECURSIVE arrays(r) AS (
SELECT s || v
UNION ALL
SELECT array_agg(DISTINCT u)
FROM (SELECT a + b u
FROM arrays,
unnest(r) a
JOIN unnest(r) b ON a <> b AND a && b) u
HAVING COUNT(u) > 0
),
ranges(r) AS (
SELECT unnest(r)
FROM arrays
)
SELECT array_agg(DISTINCT r.r)
FROM ranges r
LEFT JOIN ranges c ON c.r <> r.r AND c.r @> r.r
WHERE c.r IS NULL
$func$;
CREATE AGGREGATE range_array_merge_agg(anynonarray) (
STYPE = anyarray,
SFUNC = range_array_merge(anyarray, anynonarray),
INITCOND = '{}'
);
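This can collect ranges of any type and accumulate them into a single array that contains only disjoint ranges (overlapping ranges are unioned together).
With this, your query becomes as "simple" as: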
SELECT e.candidate_id,
SUM(upper(r.r) - lower(r.r) + 1) total_days
FROM (SELECT e.candidate_id,
range_array_merge_agg(daterange(e.start_at, COALESCE(e.end_at, current_date))) r
FROM experiences e
JOIN experiences_skills es ON es.experience_id = e.id
WHERE es.skill_id = 42 --> search for a specific skill
GROUP BY e.candidate_id) e,
unnest(e.r) r
GROUP BY e.candidate_id;
This assumes that start_at and end_at are of type date. With timestamp [with time zone] types things get messier, but I doubt you need that precision.
You can filter the query above with HAVING SUM(upper(r.r) - lower(r.r) + 1) > 1000.
http://rextester.com/DNSWS30622
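For intuition about what range_array_merge_agg computes, here is a minimal Python sketch of the same idea outside the database. It is not the answer's implementation: merge_ranges and total_days are hypothetical helper names, ranges are assumed to be half-open [start, end) pairs like the default daterange form, and the + 1 adjustment from the query is left out. Like the && condition in the function, it merges only ranges that actually overlap, not ranges that merely touch.

```python
from datetime import date


def merge_ranges(ranges):
    """Merge overlapping half-open [start, end) date ranges into disjoint ones."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start < merged[-1][1]:
            # Overlaps the previous range: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_days(ranges):
    """Sum the lengths (in days) of the merged, disjoint ranges."""
    return sum((end - start).days for start, end in merge_ranges(ranges))


# Made-up sample data for one candidate.
experiences = [
    (date(2015, 1, 1), date(2016, 1, 1)),
    (date(2015, 6, 1), date(2016, 6, 1)),  # overlaps the first range
    (date(2017, 1, 1), date(2017, 3, 1)),  # disjoint from the others
]

print(merge_ranges(experiences))
print(total_days(experiences))
```

The first two sample ranges collapse into one, so the overlapping months are counted only once.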
Edit: to retrieve the full candidate rows:
SELECT c.*
FROM (SELECT e.candidate_id,
range_array_merge_agg(daterange(e.start_at, COALESCE(e.end_at, current_date))) r
FROM experiences e
JOIN experiences_skills es ON es.experience_id = e.id
WHERE es.skill_id = 42 --> search for a specific skill
GROUP BY e.candidate_id) e
CROSS JOIN LATERAL unnest(e.r) r -- explicit joins, so that e is visible in the ON clause below
JOIN candidates c ON e.candidate_id = c.id
GROUP BY c.id
HAVING SUM(upper(r.r) - lower(r.r) + 1) > 1000; --> search for minimum number of total days
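Note: if you are wondering why the sum has + 1, the reason is simple. date '2017-01-01' - date '2017-01-01' is 0 (zero), whereas I consider that to be 1 day of experience (that day being 2017-01-01). That is why the sum needs + 1. You can also express this in the daterange constructor, e.g. daterange(e.start_at, COALESCE(e.end_at, current_date), '[]'). Because date is discrete, a range such as [2017-01-01,2017-01-02] is normalized to the form [2017-01-01,2017-01-03). That way there is no need to add 1 to the sum, because the normalization has already "extended" the upper() bound.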
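The off-by-one reasoning in the note can be checked with a small Python sketch. The helper names inclusive_days and normalize_inclusive are made up for illustration; the second one only mimics how a discrete inclusive range is rewritten into half-open form and is not a PostgreSQL API.

```python
from datetime import date, timedelta


def inclusive_days(start, end):
    """Count days when both start and end are part of the range."""
    return (end - start).days + 1


def normalize_inclusive(start, end):
    """Rewrite the inclusive range [start, end] as the half-open [start, end + 1 day)."""
    return start, end + timedelta(days=1)


day = date(2017, 1, 1)
print((day - day).days)          # plain subtraction of a day from itself
print(inclusive_days(day, day))  # the same single day counted inclusively

lower, upper = normalize_inclusive(date(2017, 1, 1), date(2017, 1, 2))
print(lower, upper, (upper - lower).days)
```

After normalization, a plain upper minus lower already counts both end days, which is why the + 1 becomes unnecessary with the '[]' bounds.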