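Create date intervals from start and end dates, and list all values for each interval, comma-separated

Time: 2019-07-19 08:33:20

Tags: sql oracle plsql

I have the following tables: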

CREATE TABLE PERSONS (
    PERSON_UID    NUMBER PRIMARY KEY,
    PERSON_NAME   VARCHAR2(100)
);

CREATE TABLE SKILLS (
    SKILL_UID    NUMBER PRIMARY KEY,
    SKILL_NAME   VARCHAR2(100)
);

CREATE TABLE PERSON_SKILLS (
    PERSON_SKILLS_UID   NUMBER,
    PERSON_FK           NUMBER,
    SKILL_FK            NUMBER,
    VALID_START         DATE,
    VAID_END            DATE
);

Table data:

PERSONS table data

PERSON_UID | PERSON_NAME
---------: | :----------
         1 | P1         
         2 | P2         
         3 | P3         

SKILLS table data

SKILL_UID | SKILL_NAME
--------: | :---------
        1 | SKILL1    
        2 | SKILL2    
        3 | SKILL3    
        4 | SKILL4    
        5 | SKILL5    
        6 | SKILL6    
        7 | SKILL7    
        8 | SKILL8    
        9 | SKILL9    
       10 | SKILL10   

PERSON_SKILLS table data

PERSON_SKILLS_UID | PERSON_FK | SKILL_FK | VALID_START | VAID_END   
----------------: | --------: | -------: | :---------- | :----------
                1 |         1 |        1 | 01-JAN-1990 | null       
                2 |         1 |        2 | 01-JAN-1990 | 25-SEP-2001
                4 |         1 |        6 | 01-JAN-1990 | 01-JAN-2010
                5 |         1 |        7 | 01-JAN-1990 | null       
                3 |         1 |        3 | 01-JUL-1990 | null       
                6 |         1 |        9 | 31-DEC-2018 | null       
                7 |         2 |        2 | 01-JAN-1990 | null       
                9 |         2 |        8 | 01-JAN-1990 | 01-JAN-2001
                8 |         2 |        3 | 01-JAN-1995 | 20-OCT-1998
               10 |         3 |        9 | 01-JAN-1990 | null       
               11 |         3 |        4 | 01-JAN-1990 | null       
               12 |         3 |        5 | 01-JAN-1991 | null       
               13 |         3 |        7 | 01-JAN-2005 | null       
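
For anyone who wants to reproduce the sample rows, INSERT statements along the following lines should work. This is a sketch transcribed from the tables above, not the original fiddle script; the ANSI `DATE` literals are an assumption about how the dates were entered.

```sql
-- PERSONS
INSERT INTO persons (person_uid, person_name) VALUES (1, 'P1');
INSERT INTO persons (person_uid, person_name) VALUES (2, 'P2');
INSERT INTO persons (person_uid, person_name) VALUES (3, 'P3');

-- SKILLS: SKILL1 .. SKILL10, generated instead of typed out
INSERT INTO skills (skill_uid, skill_name)
SELECT LEVEL, 'SKILL' || LEVEL FROM dual CONNECT BY LEVEL <= 10;

-- PERSON_SKILLS (note the column really is spelled VAID_END in the DDL)
INSERT INTO person_skills VALUES ( 1, 1, 1, DATE '1990-01-01', NULL);
INSERT INTO person_skills VALUES ( 2, 1, 2, DATE '1990-01-01', DATE '2001-09-25');
INSERT INTO person_skills VALUES ( 4, 1, 6, DATE '1990-01-01', DATE '2010-01-01');
INSERT INTO person_skills VALUES ( 5, 1, 7, DATE '1990-01-01', NULL);
INSERT INTO person_skills VALUES ( 3, 1, 3, DATE '1990-07-01', NULL);
INSERT INTO person_skills VALUES ( 6, 1, 9, DATE '2018-12-31', NULL);
INSERT INTO person_skills VALUES ( 7, 2, 2, DATE '1990-01-01', NULL);
INSERT INTO person_skills VALUES ( 9, 2, 8, DATE '1990-01-01', DATE '2001-01-01');
INSERT INTO person_skills VALUES ( 8, 2, 3, DATE '1995-01-01', DATE '1998-10-20');
INSERT INTO person_skills VALUES (10, 3, 9, DATE '1990-01-01', NULL);
INSERT INTO person_skills VALUES (11, 3, 4, DATE '1990-01-01', NULL);
INSERT INTO person_skills VALUES (12, 3, 5, DATE '1991-01-01', NULL);
INSERT INTO person_skills VALUES (13, 3, 7, DATE '2005-01-01', NULL);
```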

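The PERSON_SKILLS table holds each person's skills, with a valid start date and a valid end date. (A NULL valid end date means the skill is currently active.)

I want to build date intervals with a start and end date and, for each interval, the comma-separated list of all skills the employee held during it.

Take the second person as an example (I need the output for all employees in a single query):

Expected output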

PERSON_NAME | VALID_START | VALID_END   | SKILLS_OF_EMP         
:---------- | :---------- | :---------- | :---------------------
P2          | 01-JAN-1990 | 31-DEC-1994 | SKILL2, SKILL8        
P2          | 01-JAN-1995 | 20-OCT-1998 | SKILL2, SKILL3, SKILL8
P2          | 21-OCT-1998 | 01-JAN-2001 | SKILL2, SKILL8        
P2          | 02-JAN-2001 | 31-DEC-4712 | SKILL2                

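I have created a db<>fiddle with all the table DDL, the data, and the expected output.

I am hoping for a query that performs well, because I have about 18,000 people with an average of 15-16 skills each.

Note: 31-DEC-4712 is the end of time.

2 Answers:

Answer 0: (score: 1)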

with ranges as (
  -- d1 = a boundary date; d2 = the day before the next boundary,
  -- or the end of time for the last boundary of each person
  select per, dt d1, nvl(lead(dt) over (partition by per order by dt) - 1, date '4712-12-31') d2
    -- union (not union all) keeps each boundary date once per person
    from (select person_fk per, valid_start dt from person_skills union
          select person_fk, vaid_end from person_skills)
    where dt is not null)
-- order by skill_name so the list order is deterministic (d1 is constant within a group)
select per, d1, d2, listagg(skill_name, ', ') within group (order by skill_name) list
  from person_skills ps
  join ranges r on (d1 < vaid_end or vaid_end is null) and valid_start <= d2 and ps.person_fk = per
  join persons p on per = p.person_uid
  join skills s on s.skill_uid = ps.skill_fk
  group by per, d1, d2

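db<>fiddle

The main problem is building the time ranges for each person. I combined each person's start and end dates with UNION (not UNION ALL, because we need distinct values), then used lead() over those dates, in order, to create the periods.

A table prepared this way can be joined to your data in the usual way and aggregated, and listagg() does the rest.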
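
The query above treats an end date as the first day of the next period, whereas the expected output in the question treats it as the last day the skill applies (P2's second interval ends on 20-OCT-1998 and the next one starts on 21-OCT-1998). A possible variant of the same idea is sketched below. It rests on two assumptions: the dates carry no time-of-day component, and a skill stops applying on the day after `vaid_end`. The CTE names `boundaries` and `ranges` are illustrative.

```sql
WITH boundaries AS (
  -- every date on which a person's skill set can change
  SELECT person_fk, valid_start AS dt
  FROM   person_skills
  UNION
  SELECT person_fk, vaid_end + 1          -- first day the skill no longer applies
  FROM   person_skills
  WHERE  vaid_end < DATE '4712-12-31'     -- also excludes NULL (still active)
),
ranges AS (
  SELECT person_fk,
         dt AS range_start,
         NVL(LEAD(dt) OVER (PARTITION BY person_fk ORDER BY dt) - 1,
             DATE '4712-12-31') AS range_end
  FROM   boundaries
)
SELECT p.person_name,
       r.range_start AS valid_start,
       r.range_end   AS valid_end,
       LISTAGG(s.skill_name, ', ') WITHIN GROUP (ORDER BY s.skill_name) AS skills_of_emp
FROM   ranges r
JOIN   person_skills ps
       ON  ps.person_fk   = r.person_fk
       AND ps.valid_start <= r.range_start
       AND NVL(ps.vaid_end, DATE '4712-12-31') >= r.range_end
JOIN   persons p ON p.person_uid = r.person_fk
JOIN   skills  s ON s.skill_uid  = ps.skill_fk
GROUP  BY p.person_name, r.person_fk, r.range_start, r.range_end
ORDER  BY p.person_name, r.range_start;
```

Traced by hand against P2's three sample rows, this gives the four intervals listed in the expected output; it has not been run against the original fiddle. Because every start date and every day after an end date is a boundary, a skill either covers a whole range or none of it, which is why the join only needs to compare the range's two endpoints. Periods in which a person has no active skill drop out through the inner join.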

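Answer 1: (score: 1)

Use UNPIVOT INCLUDE NULLS to put the start and end of each date range on separate rows, then use the LEAD analytic function to find consecutive boundary dates for each person. The result can then be joined back to the main table and aggregated.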
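
To see what the unpivot step produces on its own, something like the following could be run for a single person. It is only a sketch: the column is spelled `vaid_end` as in the question's DDL, and the `boundary_type` name and the 'START' / 'END' labels are illustrative choices rather than part of the answer's query.

```sql
-- Each PERSON_SKILLS row becomes two rows: one for its start date, one for its end date.
-- INCLUDE NULLS keeps the row for an open-ended skill, with date_time NULL.
SELECT person_fk, skill_fk, boundary_type, date_time
FROM   person_skills
UNPIVOT INCLUDE NULLS (
         date_time FOR boundary_type IN ( valid_start AS 'START', vaid_end AS 'END' )
       )
WHERE  person_fk = 2
ORDER  BY skill_fk, boundary_type DESC;
```

For P2's three skill rows this returns six rows, one of which has a NULL `date_time` for the still-active skill; the full query replaces that NULL with the end-of-time date before applying LEAD.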

Query

SELECT p.person_name,
       r.range_start AS valid_start,
       r.range_end AS valid_end,
       LISTAGG( s.skill_name, ',' ) WITHIN GROUP ( ORDER BY s.skill_name ) AS skills_of_emp
FROM   (
  SELECT PERSON_FK,
         date_time AS range_start,
         LEAD( date_time ) OVER ( PARTITION BY PERSON_FK ORDER BY date_time )
           AS range_end
  FROM   (
    SELECT DISTINCT
           PERSON_FK,
           COALESCE( date_time, DATE '4712-12-31' ) AS date_time
    FROM   person_skills
    UNPIVOT INCLUDE NULLS ( date_time FOR value IN ( valid_start AS 1, vaid_end AS -1 ) )  -- the DDL above spells this column VAID_END
  )
) r
INNER JOIN person_skills ps
ON (   ps.valid_start <= r.range_start
   AND r.range_end   <= COALESCE( ps.vaid_end, DATE '4712-12-31' )
   AND ps.person_fk   = r.person_fk )
INNER JOIN skills s
ON ( ps.skill_fk = s.skill_uid )
INNER JOIN persons p  -- the DDL above names this table PERSONS
ON ( ps.person_fk = p.person_uid )
GROUP BY r.person_fk,
         p.person_name,
         r.range_start,
         r.range_end

Output

PERSON_NAME | VALID_START | VALID_END  | SKILLS_OF_EMP                     
:---------- | :---------- | :--------- | :---------------------------------
P1          | 1990-01-01  | 1990-07-01 | SKILL1,SKILL2,SKILL6,SKILL7       
P1          | 1990-07-01  | 2001-09-25 | SKILL1,SKILL2,SKILL3,SKILL6,SKILL7
P1          | 2001-09-25  | 2010-01-01 | SKILL1,SKILL3,SKILL6,SKILL7       
P1          | 2010-01-01  | 2018-12-31 | SKILL1,SKILL3,SKILL7              
P1          | 2018-12-31  | 4712-12-31 | SKILL1,SKILL3,SKILL7,SKILL9       
P2          | 1990-01-01  | 1995-01-01 | SKILL2,SKILL8                     
P2          | 1995-01-01  | 1998-10-20 | SKILL2,SKILL3,SKILL8              
P2          | 1998-10-20  | 2001-01-01 | SKILL2,SKILL8                     
P2          | 2001-01-01  | 4712-12-31 | SKILL2                            
P3          | 1990-01-01  | 1991-01-01 | SKILL4,SKILL9                     
P3          | 1991-01-01  | 2005-01-01 | SKILL4,SKILL5,SKILL9              
P3          | 2005-01-01  | 4712-12-31 | SKILL4,SKILL5,SKILL7,SKILL9       

db<>fiddle here