Question

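I'm preparing some data for a research project and have run into the following challenge. If possible, I'd like to do everything in SQL or PL/SQL (although I'm a real novice with PL/SQL).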

Suppose we have the following table (note that Period_ID is the desired column that I want to create):

+-------+-----------+--------------+--------------+-----------+
| Row # | Person_ID |     Code     |     Date     | Period_ID |
+-------+-----------+--------------+--------------+-----------+
|     1 |         1 | Start_period | Jan 1st      |         1 |
|     2 |         1 | End_period   | Jan 15th     |         1 |
|     3 |         1 | Random_code1 | Feb 15th     |         1 |
|     4 |         1 | Random_code2 | Feb 28th     |         1 |
|     5 |         1 | End_period   | March 31st   |         1 |
|     6 |         1 | Start_period | May 31st     |         2 |
|     7 |         1 | End_period   | June 11th    |         2 |
|     8 |         1 | End_period   | October 28th |         2 |
+-------+-----------+--------------+--------------+-----------+

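Columns and challenges:

Person_ID: The data above is for a single person (the data is at the transaction level).
Code: This can be Start_period, End_period, or any random code. Every Start_period code should have one corresponding End_period code. The challenge is to identify all start/end pairs in order to create the Period_ID column. An important nuance: an End_period code is INVALID if it falls within 28 days of the Start_period code. For example, the End_period code in row 2 is invalid because it falls on Jan 15th, which is 14 days after Jan 1st. The valid End_period code is instead the one in row 5, because it is more than 28 days after the start.
Date: The date of the transaction.
Period_ID: The desired column. This information is not currently in the table.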
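To make the 28-day rule concrete, here is a minimal Python sketch of the validity check. The function name `is_valid_end` and the year 2023 are assumptions for illustration only (the question gives just month and day), and treating exactly 28 days as invalid follows the reading used in Answer 2.

```python
from datetime import date

# From the question: an End_period within 28 days of the start is invalid.
MIN_GAP_DAYS = 28


def is_valid_end(start: date, end: date) -> bool:
    """Return True when the End_period falls more than 28 days after the start."""
    return (end - start).days > MIN_GAP_DAYS


# The year 2023 is an assumption; the question only gives month and day.
start = date(2023, 1, 1)
print(is_valid_end(start, date(2023, 1, 15)))  # row 2: 14 days after the start
print(is_valid_end(start, date(2023, 3, 31)))  # row 5: 89 days after the start
```

Row 2 fails the check and row 5 passes it, which is why period 1 runs through row 5.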

Answer 1

Simply count the number of Start_period rows up to and including each row:

select t.*,
       sum(case when code = 'Start_period' then 1 else 0 end) over (partition by person_id order by date) as period_id
from t;

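This works for the data you have provided. It does not formally incorporate the other rules, such as the 28-day condition on End_period codes.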
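The running count can be sketched outside the database as well. The following Python snippet is only an illustration of the same idea on the sample codes, not a replacement for the query; the variable names are made up for this example.

```python
from itertools import accumulate

# Codes from the sample table, already ordered by date for one person.
codes = [
    "Start_period", "End_period", "Random_code1", "Random_code2",
    "End_period", "Start_period", "End_period", "End_period",
]

# Running total of Start_period rows, the analogue of SUM(...) OVER (ORDER BY ...).
period_ids = list(accumulate(1 if c == "Start_period" else 0 for c in codes))
print(period_ids)  # [1, 1, 1, 1, 1, 2, 2, 2]
```

Because only Start_period rows move the counter, a Start_period that arrives before a valid End_period would still open a new period under this approach.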

Answer 2

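Here is an answer using the always-fun MATCH_RECOGNIZE. Note that you really shouldn't name columns "Code" or "Date", because they clash with keywords (DATE is a reserved word in Oracle).

MATCH_RECOGNIZE operates over multiple rows and tries to match a given pattern. In your case, you are trying to match a pattern of one start code, followed by zero or more invalid end codes or other codes, followed by one valid end code.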

WITH test_vals AS (
    SELECT 1 as person_ID,'Start_period' as my_code,to_date('Jan 1','mon dd') as my_date FROM DUAL
    UNION ALL SELECT 1,'End_period',to_date('Jan 15','mon dd') FROM DUAL
    UNION ALL SELECT 1,'Random_code1',to_date('Feb 15','mon dd') FROM DUAL
    UNION ALL SELECT 1,'Random_code2',to_date('Feb 28','mon dd') FROM DUAL
    UNION ALL SELECT 1,'End_period',to_date('March 31','mon dd') FROM DUAL
    UNION ALL SELECT 1,'Start_period',to_date('May 31','mon dd') FROM DUAL
    UNION ALL SELECT 1,'End_period',to_date('June 11','mon dd') FROM DUAL
    UNION ALL SELECT 1,'End_period',to_date('October 28','mon dd') FROM DUAL
)

SELECT m.person_id,
       m.my_code,
       m.my_date,
       m.period_id
FROM test_vals t
match_recognize(
    PARTITION BY person_id
    ORDER BY my_date
    MEASURES
        match_number() AS period_id /* Return the match number as the period ID */
    ALL ROWS PER match
    pattern (
        start_code /* Match a single start code */
        (invalid_end_code | other_code)* /* Match zero or more invalid end codes or other codes */
        valid_end_code /* Match a single end code */
    )
    define
        start_code AS my_code = 'Start_period', /* Start codes are always valid */
        valid_end_code AS my_code = 'End_period' AND (my_date - FIRST(my_date)) > 28, /* End codes are only valid if they come more than 28 days after the start of the pattern match */
        invalid_end_code AS my_code = 'End_period' AND (my_date - FIRST(my_date)) <= 28,
        other_code AS my_code NOT IN ('Start_period', 'End_period')
) m

Answer 3

I would use a recursive CTE, like this:

with c(row_, code, date_, st_date, period, chg) as (
    select row_, code, date_, date_, 1, 0 from t where row_ = 1
    union all
    select t.row_, t.code, t.date_, 
           case when chg = 1 then t.date_ else st_date end, 
           case when chg = 1 then period + 1 else period end, 
           case when t.code = 'End_period' and t.date_ - c.st_date > 28 then 1 else 0 end
      from t join c on t.row_ = c.row_ + 1
    )
select row_, code, date_, period from c


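The logic is to use the chg column to steer period changes. chg is set to 1 when the code is End_period and the date is more than 28 days after the previously remembered start date. In the next step the period is incremented, chg is reset to zero, and a new start date is set.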
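The same row-by-row logic can be traced in a short Python sketch. This is an illustration under stated assumptions rather than the answer's own code: the function name `assign_periods` is hypothetical, and the year 2023 is assumed because the question gives only month and day.

```python
from datetime import date

# Sample rows as (code, date); the year 2023 is an assumption.
rows = [
    ("Start_period", date(2023, 1, 1)),
    ("End_period", date(2023, 1, 15)),
    ("Random_code1", date(2023, 2, 15)),
    ("Random_code2", date(2023, 2, 28)),
    ("End_period", date(2023, 3, 31)),
    ("Start_period", date(2023, 5, 31)),
    ("End_period", date(2023, 6, 11)),
    ("End_period", date(2023, 10, 28)),
]


def assign_periods(rows):
    """Carry st_date, period and chg from row to row, like the recursive CTE."""
    st_date, period, chg = rows[0][1], 1, 0
    result = [period]
    for code, day in rows[1:]:
        # As in the CTE, the new flag is computed against the previous start date.
        new_chg = 1 if code == "End_period" and (day - st_date).days > 28 else 0
        if chg == 1:
            # The previous row closed a period, so this row opens the next one.
            st_date, period = day, period + 1
        chg = new_chg
        result.append(period)
    return result


print(assign_periods(rows))  # [1, 1, 1, 1, 1, 2, 2, 2]
```

Like the CTE, this sketch assumes the row after a valid End_period is the start of the next period.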

Identifying periods with start and end dates using SQL
