Counting rows with a constraint in hierarchical data

Time: 2014-07-01 01:07:57

Tags: sql oracle hierarchical-data gaps-and-islands

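I have hierarchical data in which instances of an entity are linked through DATE_FROM and DATE_TO: an instance continues the previous one when its DATE_FROM equals the previous instance's DATE_TO.

Please see the sqlfiddle.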
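The fiddle's contents are not reproduced in this post. As a stand-in, here is a hypothetical setup: the table and column names come from the queries in this thread, but the column types and every row are assumptions made up for illustration. Entity 1 gets four contiguous rows that each span more than 183 days; entity 2 gets five contiguous rows, of which only the last is short.

```sql
-- Hypothetical schema: the column types are assumed, not taken from the fiddle.
CREATE TABLE TEST (
  ENTITY_ID NUMBER NOT NULL,
  DATE_FROM DATE   NOT NULL,
  DATE_TO   DATE   NOT NULL
);

-- Entity 1 (made-up rows): four contiguous instances, each longer than 183 days.
INSERT INTO TEST VALUES (1, DATE '2011-01-01', DATE '2012-01-01');
INSERT INTO TEST VALUES (1, DATE '2012-01-01', DATE '2013-01-01');
INSERT INTO TEST VALUES (1, DATE '2013-01-01', DATE '2014-01-01');
INSERT INTO TEST VALUES (1, DATE '2014-01-01', DATE '2015-01-01');

-- Entity 2 (made-up rows): five contiguous instances; the last spans only 59 days.
INSERT INTO TEST VALUES (2, DATE '2010-01-01', DATE '2011-01-01');
INSERT INTO TEST VALUES (2, DATE '2011-01-01', DATE '2012-01-01');
INSERT INTO TEST VALUES (2, DATE '2012-01-01', DATE '2013-01-01');
INSERT INTO TEST VALUES (2, DATE '2013-01-01', DATE '2014-01-01');
INSERT INTO TEST VALUES (2, DATE '2014-01-01', DATE '2014-03-01');

COMMIT;
```

Each row links to its predecessor through DATE_FROM = previous DATE_TO, which is the relationship the CONNECT BY conditions in this thread rely on.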

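Using CONNECT BY I can determine the number of contiguous instances of each entity, i.e. the "length of the island", which is mostly what I want. For example, this gives the expected island length for each entity that has a DATE_FROM in 2014: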

-- QUERY 1
SELECT 
  T.ENTITY_ID,
  MAX(LEVEL) MAX_LEVEL
FROM TEST T
WHERE EXTRACT(YEAR FROM T.DATE_FROM) = 2014
CONNECT BY 
  T.ENTITY_ID = PRIOR T.ENTITY_ID
  AND T.DATE_FROM = PRIOR T.DATE_TO
GROUP BY T.ENTITY_ID

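However, what I actually want is to count the rows in the island whose DATE_FROM to DATE_TO span exceeds a minimum number of days. I don't want to break up the island's hierarchy when I do this.

So I tried the following, but it is wrong: the result is not always what I want.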

-- QUERY 2
SELECT 
  T.ENTITY_ID,
  MAX(LEVEL) MAX_LEVEL,
  SUM(
    CASE WHEN PRIOR T.DATE_TO - PRIOR T.DATE_FROM > 183 
    THEN 1 
    ELSE 0 
    END
  ) LONG_TERM_COUNT
FROM TEST T
WHERE EXTRACT(YEAR FROM T.DATE_FROM) = 2014
CONNECT BY 
  T.ENTITY_ID = PRIOR T.ENTITY_ID
  AND T.DATE_FROM = PRIOR T.DATE_TO
GROUP BY T.ENTITY_ID

which gives

+-----------+-----------+-----------------+
| ENTITY_ID | MAX_LEVEL | LONG_TERM_COUNT |
+-----------+-----------+-----------------+
|         1 |         4 |               3 |
|         2 |         5 |               4 |
+-----------+-----------+-----------------+

but I am looking for

+-----------+-----------+-----------------+
| ENTITY_ID | MAX_LEVEL | LONG_TERM_COUNT |
+-----------+-----------+-----------------+
|         1 |         4 |               4 |
|         2 |         5 |               4 |
+-----------+-----------+-----------------+

I need an Oracle solution. Thanks for reading.

3 answers:

Answer 0 (score: 1)

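The WHERE condition is evaluated after CONNECT BY, so your query does not start from the 2014 rows. It builds a hierarchy for every row in the table, which is easy to see once you remove the WHERE and the aggregation: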

SELECT 
  T.ENTITY_ID,
  LEVEL,
  T.DATE_TO,  
  T.DATE_FROM,
  prior T.DATE_TO,
  prior T.DATE_FROM
FROM TEST T
CONNECT BY 
  T.ENTITY_ID = PRIOR T.ENTITY_ID
  AND T.DATE_TO = PRIOR T.DATE_FROM
order by 1,2
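One way to make this visible is to label each output row with the row its hierarchy started from. The sketch below is an addition, not part of the original answer; it uses Oracle's CONNECT_BY_ROOT operator, and the aliases ROOT_DATE_FROM and LVL are illustrative names.

```sql
-- Without START WITH, every row in TEST is treated as a root.
-- CONNECT_BY_ROOT shows which root each output row belongs to.
SELECT
  T.ENTITY_ID,
  CONNECT_BY_ROOT T.DATE_FROM AS ROOT_DATE_FROM, -- DATE_FROM of the starting row
  LEVEL AS LVL,                                  -- depth below that root
  T.DATE_FROM,
  T.DATE_TO
FROM TEST T
CONNECT BY
  T.ENTITY_ID = PRIOR T.ENTITY_ID
  AND T.DATE_TO = PRIOR T.DATE_FROM
ORDER BY T.ENTITY_ID, ROOT_DATE_FROM, LVL;
```

For an island of four contiguous rows this returns 4 + 3 + 2 + 1 = 10 rows instead of 4, because each row starts its own walk back through its predecessors. Any aggregate computed over that output counts the earlier rows several times.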

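You need to use START WITH instead of the WHERE condition. Note that the CONNECT BY condition here walks backwards, from each starting row to its predecessors: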

SELECT 
  T.ENTITY_ID,
  LEVEL,
  T.DATE_TO,  
  T.DATE_FROM,
  prior T.DATE_TO,
  prior T.DATE_FROM
FROM TEST T
START WITH EXTRACT(YEAR FROM T.DATE_FROM) = 2014
CONNECT BY 
   T.ENTITY_ID = PRIOR T.ENTITY_ID
   AND T.DATE_TO = PRIOR T.DATE_FROM

So in the end it is:

SELECT 
  T.ENTITY_ID,
  MAX(LEVEL) MAX_LEVEL, -- or COUNT(*)
  SUM(
    CASE WHEN  T.DATE_TO -  T.DATE_FROM > 183 
    THEN 1 
    ELSE 0 
    END
  ) LONG_TERM_COUNT
FROM TEST T
CONNECT BY 
  T.ENTITY_ID = PRIOR T.ENTITY_ID
  AND T.DATE_TO = PRIOR T.DATE_FROM
START WITH EXTRACT(YEAR FROM T.DATE_FROM) = 2014
GROUP BY T.ENTITY_ID

If an entity has two rows in 2014 you may get a wrong result, so you need to start from the latest 2014 row:

SELECT 
  T.ENTITY_ID,
  MAX(LEVEL) MAX_LEVEL,
  SUM(
    CASE WHEN  T.DATE_TO -  T.DATE_FROM > 183 
    THEN 1 
    ELSE 0 
    END
  ) LONG_TERM_COUNT
FROM TEST T
CONNECT BY 
  T.ENTITY_ID = PRIOR T.ENTITY_ID
  AND T.DATE_TO = PRIOR T.DATE_FROM
START WITH T.DATE_FROM = 
  (
    SELECT MAX(T2.DATE_FROM) 
    FROM TEST T2 
    WHERE T.ENTITY_ID = T2.ENTITY_ID
      AND T2.DATE_FROM >= DATE '2014-01-01'
      AND T2.DATE_FROM <= DATE '2014-12-31'
  )
GROUP BY T.ENTITY_ID

Fiddle

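Answer 1 (score: 0)

Your SQL statement is correct. However, there is one case to consider: when the CASE WHEN T.DATE_TO - PRIOR T.DATE_FROM > 183 expression evaluates to null, the row is not counted. PRIOR returns NULL for the root row of each hierarchy, so the comparison can never be true there.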

INSERT INTO TEST 
 VALUES (1,TO_DATE('20130101','YYYYMMDD'),TO_DATE('20140101','YYYYMMDD'));
INSERT INTO TEST 
 VALUES (1,TO_DATE('20140101','YYYYMMDD'),TO_DATE('20150101','YYYYMMDD'));

From your sample data, the equivalent case is

CASE WHEN 
      TO_DATE('20140101','YYYYMMDD') - PRIOR TO_DATE('20140101','YYYYMMDD') > 183

This gives a null value.

Answer 2 (score: 0)
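To see the NULL directly, it can help to list the hierarchy row by row before aggregating. The query below is only a sketch and not part of this answer: it borrows the START WITH form from the first answer, and the column aliases are made up for illustration.

```sql
-- Compare a test on the PRIOR (parent) row with a test on the current row.
SELECT
  T.ENTITY_ID,
  LEVEL AS LVL,
  T.DATE_FROM,
  T.DATE_TO,
  PRIOR T.DATE_FROM AS PRIOR_DATE_FROM, -- NULL when LVL = 1
  PRIOR T.DATE_TO   AS PRIOR_DATE_TO,   -- NULL when LVL = 1
  -- NULL - NULL > 183 is not true, so the root row falls through to ELSE 0
  CASE WHEN PRIOR T.DATE_TO - PRIOR T.DATE_FROM > 183
       THEN 1 ELSE 0 END AS PRIOR_ROW_IS_LONG,
  -- testing the current row involves no NULL, assuming both dates are populated
  CASE WHEN T.DATE_TO - T.DATE_FROM > 183
       THEN 1 ELSE 0 END AS THIS_ROW_IS_LONG
FROM TEST T
START WITH EXTRACT(YEAR FROM T.DATE_FROM) = 2014
CONNECT BY
  T.ENTITY_ID = PRIOR T.ENTITY_ID
  AND T.DATE_TO = PRIOR T.DATE_FROM
ORDER BY T.ENTITY_ID, LVL;
```

Summing PRIOR_ROW_IS_LONG always skips the last row visited in each walk, because that row is never the parent of another row. Summing THIS_ROW_IS_LONG counts every row in the island once.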

I am not familiar with Oracle, but a good approach might be to use the RANK analytic function. For example:

SELECT 
 T.ENTITY_ID,
 T.DATE_FROM,
 RANK() OVER (PARTITION BY ENTITY_ID
 ORDER BY T.DATE_TO DESC) "Rank"
FROM TEST T
WHERE EXTRACT(YEAR FROM T.DATE_FROM) <= 2014 

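Joining on T.ENTITY_ID = PRIOR T.ENTITY_ID and Rank = (PRIOR.Rank + 1) might lead to a solution. As I said, this is only a suggestion for how to approach it.

I experimented a bit, and this is my solution using a subquery (SQL Fiddle):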
SELECT 
 T.ENTITY_ID,
 MAX(LEVEL) MAX_LEVEL,
 (Select MAX("Rank") FROM
  (
    SELECT T2.ENTITY_ID AS ID, RANK() OVER (PARTITION BY T2.ENTITY_ID
    ORDER BY T2.DATE_TO DESC) "Rank"
    FROM TEST T2
    WHERE EXTRACT(YEAR FROM T2.DATE_FROM) < 2014 
  ) SubQ
  WHERE ID = T.ENTITY_ID
 ) "LONG_TERM_COUNT"
FROM TEST T
WHERE EXTRACT(YEAR FROM T.DATE_FROM) = 2014
CONNECT BY 
  T.ENTITY_ID = PRIOR T.ENTITY_ID
  AND T.DATE_FROM = PRIOR T.DATE_TO
GROUP BY T.ENTITY_ID
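The ranking idea is close to the usual analytic-function treatment of gaps-and-islands, so here is a sketch of that technique for comparison. It is a different approach from the query above, not a correction of it. It assumes that the rows of one entity do not overlap and that DATE_FROM is unique per entity; the names FLAGGED, NUMBERED, IS_ISLAND_START, ISLAND_NO and ISLAND_LENGTH are made up for illustration.

```sql
WITH FLAGGED AS (
  -- Mark a row as the start of a new island unless it continues the previous row.
  SELECT
    T.ENTITY_ID,
    T.DATE_FROM,
    T.DATE_TO,
    CASE
      WHEN T.DATE_FROM = LAG(T.DATE_TO) OVER (PARTITION BY T.ENTITY_ID
                                              ORDER BY T.DATE_FROM)
      THEN 0
      ELSE 1 -- also taken for the first row, where LAG returns NULL
    END AS IS_ISLAND_START
  FROM TEST T
),
NUMBERED AS (
  -- A running sum of the start flags gives each island its own number per entity.
  SELECT
    F.ENTITY_ID,
    F.DATE_FROM,
    F.DATE_TO,
    SUM(F.IS_ISLAND_START) OVER (PARTITION BY F.ENTITY_ID
                                 ORDER BY F.DATE_FROM) AS ISLAND_NO
  FROM FLAGGED F
)
SELECT
  N.ENTITY_ID,
  N.ISLAND_NO,
  COUNT(*) AS ISLAND_LENGTH,
  SUM(CASE WHEN N.DATE_TO - N.DATE_FROM > 183 THEN 1 ELSE 0 END) AS LONG_TERM_COUNT
FROM NUMBERED N
GROUP BY N.ENTITY_ID, N.ISLAND_NO
-- keep only islands that contain at least one row starting in 2014
HAVING MAX(CASE WHEN EXTRACT(YEAR FROM N.DATE_FROM) = 2014 THEN 1 ELSE 0 END) = 1
ORDER BY N.ENTITY_ID, N.ISLAND_NO;
```

Unlike the CONNECT BY queries, this counts the whole island, including any rows that follow the 2014 row. If only the rows up to the 2014 row should count, an extra filter on DATE_FROM would be needed.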