Oracle SQL - Select users between two dates, by month

Time: 2019-07-30 21:41:06

Tags: sql oracle

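I am learning SQL and would like to know how to select active users by month, based on the start and end dates of their active period (both are timestamp(6)). My table looks like this: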

Cust_Num | Start_Date | End_Date
       1 | 2018-01-01 | 2019-01-01
       2 | 2018-01-01 | NULL
       3 | 2019-01-01 | 2019-06-01
       4 | 2017-01-01 | 2019-03-01

So, counting active users by month, I should get output like this:

As of      | Count
2018-06-01 | 3
...
2019-02-01 | 3
2019-07-01 | 1

So far I have been doing this manually, writing one query per month:

Select
    201906,
    count(distinct a.cust_num)
From
    active_users a
Where
    to_date('20190630', 'yyyymmdd') between a.start_date and nvl(a.end_date, date '9999-12-31')

union all

Select
    201905,
    count(distinct a.cust_num)
From
    active_users a
Where
    to_date('20190531', 'yyyymmdd') between a.start_date and nvl(a.end_date, date '9999-12-31')

union all
...

If I want to go back 10 years, that means 120 of these blocks, so this approach is neither efficient nor maintainable.

Any help is welcome. Thank you very much!

2 answers:

Answer 0: (score: 1)

This query shows the number of active users as of the end of each month.

How it works:

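  1. Convert each input row (with its StartDate and EndDate values) into two rows, each representing a point in time at which the active-user count changes: it is incremented on StartDate and decremented on EndDate. NULL must be converted to a far-future date, because in SQL Server NULL values sort before non-NULL values rather than after them: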

    This makes your data look like this:

    OnThisDate   Change
    2018-01-01        1
    2019-01-01       -1
    2018-01-01        1
    9999-12-31       -1
    2019-01-01        1
    2019-06-01       -1
    2017-01-01        1
    2019-03-01       -1
    
  2. Then we simply SUM OVER the Change values (after sorting) to get the number of active users on each particular date:

    So, first, sort by OnThisDate:

    OnThisDate   Change
    2017-01-01        1
    2018-01-01        1
    2018-01-01        1
    2019-01-01        1
    2019-01-01       -1
    2019-03-01       -1
    2019-06-01       -1
    9999-12-31       -1
    

    Then SUM OVER:

    OnThisDate   ActiveCount
    2017-01-01             1
    2018-01-01             2
    2018-01-01             3
    2019-01-01             4
    2019-01-01             3
    2019-03-01             2
    2019-06-01             1
    9999-12-31             0
    
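  3. Then we PARTITION (not group!) the rows by month and sort them by date, so that we can identify the last ActiveCount row in each month (this actually happens in the WHERE of the outermost query, using ROW_NUMBER() and COUNT() over each month's PARTITION):

    OnThisDate   ActiveCount   IsLastInMonth
    2017-01-01             1               1
    2018-01-01             2               0
    2018-01-01             3               1
    2019-01-01             4               0
    2019-01-01             3               1
    2019-03-01             2               1
    2019-06-01             1               1
    9999-12-31             0               1

  4. Then filter on IsLastInMonth = 1 (in practice, ROW_NUMBER() = COUNT(*) inside each PARTITION) to give the final output data:

    At-end-of-month   Active-count
    2017-01                      1
    2018-01                      3
    2019-01                      3
    2019-03                      2
    2019-06                      1
    9999-12                      0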

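This does leave "gaps" in the result set, because the At-end-of-month column only contains rows where the Active-count value actually changed, rather than every possible calendar month. That is ideal as far as I am concerned, because it excludes redundant data. The gaps can be filled in application code by simply repeating the output row for each additional month until the next At-end-of-month value is reached.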
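If the gaps should be filled in the database rather than in application code, one option is to join the sparse result to a month calendar and carry the last known count forward. The following is a minimal Oracle sketch, not part of the original answer: the CTE names (sparse, bounds, calendar) are made up for illustration, the sparse rows are typed in from the final output table above, and the 9999-12 sentinel row is left out.

```sql
-- Sparse month-end counts copied from the final output table above
-- (the 9999-12 sentinel row is deliberately omitted).
WITH sparse AS
(
    SELECT DATE '2017-01-01' AS month_start, 1 AS active_count FROM DUAL UNION ALL
    SELECT DATE '2018-01-01', 3 FROM DUAL UNION ALL
    SELECT DATE '2019-01-01', 3 FROM DUAL UNION ALL
    SELECT DATE '2019-03-01', 2 FROM DUAL UNION ALL
    SELECT DATE '2019-06-01', 1 FROM DUAL
),
-- First and last month present in the sparse data.
bounds AS
(
    SELECT MIN( month_start ) AS first_month, MAX( month_start ) AS last_month
    FROM sparse
),
-- One row per calendar month between those two bounds.
calendar AS
(
    SELECT ADD_MONTHS( first_month, LEVEL - 1 ) AS month_start
    FROM bounds
    CONNECT BY LEVEL <= MONTHS_BETWEEN( last_month, first_month ) + 1
)
SELECT
    c.month_start,
    -- Carry the most recent non-NULL count forward into months without a row.
    LAST_VALUE( s.active_count IGNORE NULLS )
        OVER ( ORDER BY c.month_start ) AS active_count
FROM
    calendar c
    LEFT JOIN sparse s ON s.month_start = c.month_start
ORDER BY
    c.month_start;
```

In a real query the sparse CTE would be replaced by the month-end query itself. The carry-forward relies on the default window frame, which runs from the first row up to the current row.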

Here is the query in T-SQL on SQL Server (I do not have access to Oracle right now). This is the SQLFiddle I used to work it out: http://sqlfiddle.com/#!18/ad68b7/24

SELECT
    OtdYear,
    OtdMonth,
    ActiveCount
FROM
(
    -- This query adds columns to indicate which row is the last-row-in-month ( where RowInMonth == RowsInMonth )
    SELECT
        OnThisDate,
        OtdYear,
        OtdMonth,
        ROW_NUMBER() OVER ( PARTITION BY OtdYear, OtdMonth ORDER BY OnThisDate ) AS RowInMonth,
        COUNT(*) OVER ( PARTITION BY OtdYear, OtdMonth ) AS RowsInMonth,
        ActiveCount
    FROM
    (
        SELECT
            OnThisDate,
            YEAR( OnThisDate ) AS OtdYear,
            MONTH( OnThisDate ) AS OtdMonth,
            SUM( [Change] ) OVER ( ORDER BY OnThisDate ASC ) AS ActiveCount
        FROM
        (
            SELECT StartDate AS [OnThisDate], 1 AS [Change] FROM tbl
            UNION ALL
            SELECT ISNULL( EndDate, DATEFROMPARTS( 9999, 12, 31 ) ) AS [OnThisDate], -1 AS [Change] FROM tbl
        ) AS sq1
    ) AS sq2
) AS sq3
WHERE
    RowInMonth = RowsInMonth
ORDER BY
    OtdYear,
    OtdMonth

This query could be flattened into fewer nested queries by using the aggregate and window functions directly instead of through aliases (such as OtdYear, ActiveCount, etc.), but that would make the query harder to understand.
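Since the question is about Oracle, here is a hedged sketch of how the same idea might be written there. It is an untested translation, not the answerer's code. It assumes the table and column names from the question (active_users, start_date, end_date), and the aliases on_this_date, delta, otd_month, row_in_month and rows_in_month are invented for this example. NVL stands in for ISNULL, a DATE literal for DATEFROMPARTS, and TRUNC(..., 'MM') for the YEAR/MONTH pair.

```sql
SELECT
    otd_month,
    active_count
FROM
(
    -- Flag the last row of each month: the row where row_in_month = rows_in_month.
    SELECT
        on_this_date,
        TRUNC( on_this_date, 'MM' ) AS otd_month,
        active_count,
        ROW_NUMBER() OVER ( PARTITION BY TRUNC( on_this_date, 'MM' ) ORDER BY on_this_date ) AS row_in_month,
        COUNT(*)     OVER ( PARTITION BY TRUNC( on_this_date, 'MM' ) ) AS rows_in_month
    FROM
    (
        -- Running total of +1 / -1 events gives the active count on each date.
        SELECT
            on_this_date,
            SUM( delta ) OVER ( ORDER BY on_this_date ) AS active_count
        FROM
        (
            -- One +1 event per start date, one -1 event per end date.
            -- A NULL end date becomes a far-future sentinel date.
            SELECT CAST( start_date AS DATE ) AS on_this_date, 1 AS delta
            FROM active_users
            UNION ALL
            SELECT CAST( NVL( end_date, DATE '9999-12-31' ) AS DATE ), -1
            FROM active_users
        )
    )
)
WHERE
    row_in_month = rows_in_month
ORDER BY
    otd_month;
```

With the four sample rows from the question, this should return the same six month-end rows as the final output table in step 4, including the 9999-12 sentinel row. That row can be dropped with an extra condition in the outer WHERE.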

Answer 1: (score: 1)

I wrote a query that returns a result for every month from the minimum start date to the maximum end date in the table.

You can change that range by adding a condition to the WHERE clause.

-- table creation

CREATE TABLE ACTIVE_USERS (CUST_NUM NUMBER, START_DATE DATE, END_DATE DATE);

-- data creation

INSERT INTO ACTIVE_USERS
SELECT * FROM 
(
SELECT 1, DATE '2018-01-01', DATE '2019-01-01' FROM DUAL UNION ALL
SELECT 2, DATE '2018-01-01', NULL              FROM DUAL UNION ALL
SELECT 3, DATE '2019-01-01', DATE '2019-06-01' FROM DUAL UNION ALL
SELECT 4, DATE '2017-01-01', DATE '2019-03-01' FROM DUAL
);

-- data in the actual table

SELECT * FROM ACTIVE_USERS ORDER BY CUST_NUM;

  CUST_NUM START_DATE END_DATE  
---------- ---------- ----------
         1 2018-01-01 2019-01-01
         2 2018-01-01           
         3 2019-01-01 2019-06-01
         4 2017-01-01 2019-03-01

-- query to get the desired result

WITH CTE ( START_DATE, END_DATE ) AS 
(
    SELECT
        ADD_MONTHS( START_DATE, LEVEL - 1 ),
        ADD_MONTHS( START_DATE, LEVEL ) - 1
    FROM
    (
        SELECT
            MIN( START_DATE ) AS START_DATE,
            MAX( END_DATE   ) AS END_DATE
        FROM
            ACTIVE_USERS
    )
    CONNECT BY LEVEL <= CEIL( MONTHS_BETWEEN( END_DATE, START_DATE ) ) + 1
) 
--
--
SELECT    
    C.START_DATE,
    COUNT(1) AS CNT
FROM
    CTE C
    JOIN ACTIVE_USERS D ON
    (
        C.END_DATE BETWEEN
            D.START_DATE
            AND
            CASE
                WHEN D.END_DATE IS NOT NULL THEN D.END_DATE
                ELSE C.END_DATE
            END
    )
GROUP BY
    C.START_DATE
ORDER BY
    C.START_DATE;

-- output

START_DATE        CNT
---------- ----------
2017-01-01          1
2017-02-01          1
2017-03-01          1
2017-04-01          1
2017-05-01          1
2017-06-01          1
2017-07-01          1
2017-08-01          1
2017-09-01          1
2017-10-01          1
2017-11-01          1
2017-12-01          1
2018-01-01          3
2018-02-01          3
2018-03-01          3
2018-04-01          3
2018-05-01          3
2018-06-01          3
2018-07-01          3
2018-08-01          3
2018-09-01          3
2018-10-01          3
2018-11-01          3
2018-12-01          3
2019-01-01          3
2019-02-01          3
2019-03-01          2
2019-04-01          2
2019-05-01          2
2019-06-01          1

30 rows selected. 
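The output above stops at the latest END_DATE in the table, while the question asks for a fixed look-back of about 120 months. A possible variant, sketched here under assumptions rather than taken from the answer, builds the calendar from the current month backwards. The 120-month window, the use of SYSDATE, and the names months, month_start and month_end are illustrative choices. It keeps the month-end rule of the query above: a user counts if the last day of the month falls between the start and end dates.

```sql
-- Calendar of the 120 months ending with the current month.
-- The window length and SYSDATE are assumptions for illustration.
WITH months AS
(
    SELECT
        ADD_MONTHS( TRUNC( SYSDATE, 'MM' ), 1 - LEVEL )     AS month_start,
        ADD_MONTHS( TRUNC( SYSDATE, 'MM' ), 2 - LEVEL ) - 1 AS month_end
    FROM DUAL
    CONNECT BY LEVEL <= 120
)
SELECT
    m.month_start,
    COUNT( u.cust_num ) AS cnt   -- yields 0 for months with no active user
FROM
    months m
    LEFT JOIN active_users u
        ON  u.start_date <= m.month_end
        AND ( u.end_date IS NULL OR u.end_date >= m.month_end )
GROUP BY
    m.month_start
ORDER BY
    m.month_start;
```

The LEFT JOIN keeps months in which nobody was active, and users with a NULL end date stay counted through the current month. If cust_num can repeat in the table, COUNT( DISTINCT u.cust_num ) matches the original manual queries more closely.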

Cheers!