I am trying to group some data based on the dates between rows, using only a single query. Let me give an example:
Data
IDE DATE
------ ----------
AA1111 23-05-2016
AA1111 25-05-2016
AA1111 25-05-2016
AA1111 13-09-2016
AA1111 02-11-2016
AA1111 23-11-2016
AA1111 06-02-2017
AA1111 06-06-2017
AA1111 01-09-2017
AA1111 12-10-2017
AA1111 17-04-2018
AA1111 25-05-2018
AA1111 05-06-2018
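I want to group dates that are less than 16 days apart. I have calculated the difference between each date and the next one using the following: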
SELECT T.IDE,
T.DATE,
MAX(T.DATE) OVER (ORDER BY DATE ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING ) - T.DATE AS DIF
FROM TESTPAT1 T ;
Output 1
IDE DATE DIF
------ ---------- ---
AA1111 23-05-2016 2
AA1111 25-05-2016 0
AA1111 25-05-2016 111
AA1111 13-09-2016 50
AA1111 02-11-2016 21
AA1111 23-11-2016 75
AA1111 06-02-2017 120
AA1111 06-06-2017 87
AA1111 01-09-2017 41
AA1111 12-10-2017 187
AA1111 17-04-2018 38
AA1111 25-05-2018 11
AA1111 05-06-2018 0
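For readers who want to check the DIF column outside the database, here is a minimal Python sketch of the same calculation. The function name `diffs_to_next` and the shortened sample list are illustrative assumptions, not part of the original query. As in the query above, the last row is compared with itself and gets 0.

```python
from datetime import date

def diffs_to_next(dates):
    """Days from each date to the following one; the last row gets 0."""
    ordered = sorted(dates)
    # Pair each date with its successor; the last date is paired with itself.
    following = ordered[1:] + ordered[-1:]
    return [(nxt - cur).days for cur, nxt in zip(ordered, following)]

# First four dates from the sample data
sample = [date(2016, 5, 23), date(2016, 5, 25), date(2016, 5, 25), date(2016, 9, 13)]
print(diffs_to_next(sample))  # [2, 0, 111, 0]
```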
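I can use the row-to-row differences here, but the 16-day window is my problem: every date in a group must fall within that window, measured from the first date of the window.
Some notes: the dates are sorted in ascending order, and my expected output is:
Expected output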
IDE DATE GROUP
AA1111 23-05-2016 1
AA1111 25-05-2016 1
AA1111 25-05-2016 1
AA1111 13-09-2016 2
AA1111 02-11-2016 3
AA1111 23-11-2016 4
AA1111 06-02-2017 5
AA1111 06-06-2017 6
AA1111 01-09-2017 7
AA1111 12-10-2017 8
AA1111 17-04-2018 9
AA1111 25-05-2018 10
AA1111 05-06-2018 10
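Note: these are not the actual variable names.
Answer 0: (score: 1)
Look at the previous row and check whether the date difference is greater than or equal to 16 days. If it is, that row starts a new group. The group identifier is then the cumulative sum of these "start of group" flags.
In SQL: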
select t.*,
sum(case when prev_date > date - interval '16' day then 0 else 1 end) over (partition by ide order by date) as grp
from (select t.*,
lag(date) over (partition by ide order by date) as prev_date
from TESTPAT1 T
) t;
Note: this assumes you actually want separate groups for each ide. If that is not the case, remove the partition by clauses.
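The logic of this query can be sketched in Python to make its behavior easier to see. This is only an illustration under assumptions: the function name `group_by_gap` and the three-date `chain` list are made up for the example. Because each row is compared with the previous row rather than with the first date of its group, a run of dates spaced less than 16 days apart stays in one group even when the run as a whole spans more than 16 days. On the question's sample data this makes no difference, but it may matter for other data.

```python
from datetime import date

def group_by_gap(dates, limit=16):
    """Start a new group whenever the gap to the previous date is >= limit days."""
    groups, current, prev = [], 0, None
    for d in sorted(dates):
        # The first row, or a gap of at least `limit` days, opens a new group.
        if prev is None or (d - prev).days >= limit:
            current += 1
        groups.append(current)
        prev = d
    return groups

# Hypothetical dates 10 days apart: every gap is under 16 days,
# so all three land in one group although they span 20 days.
chain = [date(2016, 5, 1), date(2016, 5, 11), date(2016, 5, 21)]
print(group_by_gap(chain))  # [1, 1, 1]
```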
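Answer 1: (score: 1)
This is what is known as a "bin fitting" problem. In your case, you are trying to fit the data into groups, where each group can hold at most 16 days of data.
There are several well-known ways to solve bin fitting problems in SQL. MATCH_RECOGNIZE is as good as any of them: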
with test_data (IDE, "DATE") AS (
SELECT 'AA1111', TO_DATE('23-05-2016','DD-MM-YYYY') FROM DUAL UNION ALL
SELECT 'AA1111', TO_DATE('25-05-2016','DD-MM-YYYY') FROM DUAL UNION ALL
SELECT 'AA1111', TO_DATE('25-05-2016','DD-MM-YYYY') FROM DUAL UNION ALL
SELECT 'AA1111', TO_DATE('13-09-2016','DD-MM-YYYY') FROM DUAL UNION ALL
SELECT 'AA1111', TO_DATE('02-11-2016','DD-MM-YYYY') FROM DUAL UNION ALL
SELECT 'AA1111', TO_DATE('23-11-2016','DD-MM-YYYY') FROM DUAL UNION ALL
SELECT 'AA1111', TO_DATE('06-02-2017','DD-MM-YYYY') FROM DUAL UNION ALL
SELECT 'AA1111', TO_DATE('06-06-2017','DD-MM-YYYY') FROM DUAL UNION ALL
SELECT 'AA1111', TO_DATE('01-09-2017','DD-MM-YYYY') FROM DUAL UNION ALL
SELECT 'AA1111', TO_DATE('12-10-2017','DD-MM-YYYY') FROM DUAL UNION ALL
SELECT 'AA1111', TO_DATE('17-04-2018','DD-MM-YYYY') FROM DUAL UNION ALL
SELECT 'AA1111', TO_DATE('25-05-2018','DD-MM-YYYY') FROM DUAL UNION ALL
SELECT 'AA1111', TO_DATE('05-06-2018','DD-MM-YYYY') FROM DUAL )
SELECT ide, "DATE", mno as "GROUP"
FROM test_data
match_recognize (
partition by ide
order by "DATE"
measures
match_number() as mno,
"DATE" - FIRST(GRP."DATE") as dif
all rows per match
pattern ( grp* )
define
grp AS "DATE" - FIRST("DATE") < 16
);
+--------+-----------+-------+
| IDE    | DATE      | GROUP |
+--------+-----------+-------+
| AA1111 | 23-MAY-16 |     1 |
| AA1111 | 25-MAY-16 |     1 |
| AA1111 | 25-MAY-16 |     1 |
| AA1111 | 13-SEP-16 |     2 |
| AA1111 | 02-NOV-16 |     3 |
| AA1111 | 23-NOV-16 |     4 |
| AA1111 | 06-FEB-17 |     5 |
| AA1111 | 06-JUN-17 |     6 |
| AA1111 | 01-SEP-17 |     7 |
| AA1111 | 12-OCT-17 |     8 |
| AA1111 | 17-APR-18 |     9 |
| AA1111 | 25-MAY-18 |    10 |
| AA1111 | 05-JUN-18 |    10 |
+--------+-----------+-------+
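The DEFINE condition above compares each row with the first date of the current match, not with the previous row. A minimal Python sketch of that rule, assuming a hypothetical helper named `group_by_window`, shows the group numbers it produces for the sample dates and for a hypothetical run of dates spaced 10 days apart.

```python
from datetime import date

def group_by_window(dates, limit=16):
    """Keep a date in the current group only while it is < limit days
    after the first date of that group; otherwise open a new group."""
    groups, current, first = [], 0, None
    for d in sorted(dates):
        if first is None or (d - first).days >= limit:
            current += 1
            first = d  # this date anchors the new window
        groups.append(current)
    return groups

sample = [
    date(2016, 5, 23), date(2016, 5, 25), date(2016, 5, 25), date(2016, 9, 13),
    date(2016, 11, 2), date(2016, 11, 23), date(2017, 2, 6), date(2017, 6, 6),
    date(2017, 9, 1), date(2017, 10, 12), date(2018, 4, 17), date(2018, 5, 25),
    date(2018, 6, 5),
]
print(group_by_window(sample))  # [1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10]

# Hypothetical dates 10 days apart: the third is 20 days after the first,
# so it falls outside the window and starts group 2.
chain = [date(2016, 5, 1), date(2016, 5, 11), date(2016, 5, 21)]
print(group_by_window(chain))  # [1, 1, 2]
```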
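MODEL clause
Update for 11g users: this query should work on 11g to solve your bin fitting problem. It gives the same result as above, just with a different method.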
with
-- First, sort the input data because we need to be able to refer
-- to the prior row and `lag` doesn't really work in `MODEL`, afaik.
sorted_inputs ( ide, sort_order, "DATE", first_date_in_group, grp, diff) as
( SELECT ide,
row_number() over ( partition by ide order by "DATE" ) sort_order,
"DATE",
-- These columns are place holders for the MODEL clause to update
CAST(NULL AS DATE) first_date_in_group,
0 grp,
0 diff
FROM test_data )
SELECT ide, "DATE", grp "GROUP"
from sorted_inputs
model
partition by (ide)
dimension by (sort_order)
measures ( "DATE", grp, first_date_in_group, diff )
rules update automatic order
( grp[1] = 1,
first_date_in_group[1] = "DATE"[1],
diff[ANY] = "DATE"[CV()] - first_date_in_group[CV()-1],
-- diff >= 16 opens a new group, matching the "< 16" test in the MATCH_RECOGNIZE version
grp[sort_order>1] = grp[cv()-1] + CASE WHEN diff[CV()] >= 16 THEN 1 ELSE 0 END,
first_date_in_group[sort_order>1] = CASE WHEN diff[CV()] >= 16 THEN "DATE"[CV()] ELSE first_date_in_group[CV()-1] END
)