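I am learning SQL and would like to know how to select active users by month, based on the start and end dates of their active period (both are timestamp(6)). My table looks like this: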
Cust_Num | Start_Date | End_Date
1 | 2018-01-01 | 2019-01-01
2 | 2018-01-01 | NULL
3 | 2019-01-01 | 2019-06-01
4 | 2017-01-01 | 2019-03-01
So, counting active users by month, I should get output like this:
As of | Count
2018-06-01 | 3
...
2019-02-01 | 3
2019-07-01 | 1
So far I have been doing this manually, with one query per month:
Select
201906,
count(distinct a.cust_num)
From
active_users a
Where
to_date('20190630','yyyymmdd') between a.start_date and nvl(a.end_date, to_date('99991231','yyyymmdd'))
union all
Select
201905,
count(distinct a.cust_num)
From
active_users a
Where
to_date('20190531','yyyymmdd') between a.start_date and nvl(a.end_date, to_date('99991231','yyyymmdd'))
union all
...
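This is neither efficient nor maintainable if I want to go back 10 years, which would mean 120 months.
Any help is welcome. Thank you very much!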
Answer 0 (score: 1)
This query shows the number of active users as of the end of each month.
How it works:
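Convert each input row (with its StartDate and EndDate values) into two rows that represent the points in time at which the active-user count changes: it is incremented on StartDate and decremented on EndDate. We need to convert NULL to a far-future date value because NULL values sort before non-NULL values rather than after them.
This makes your data look like this: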
OnThisDate Change
2018-01-01 1
2019-01-01 -1
2018-01-01 1
9999-12-31 -1
2019-01-01 1
2019-06-01 -1
2017-01-01 1
2019-03-01 -1
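The row-to-event conversion above can be sketched outside SQL as follows. This is only an illustration in Python, not the answer's query: the names ROWS, FAR_FUTURE and to_events are made up for the example, and None stands in for NULL.

```python
from datetime import date

# Sample rows from the question: (Cust_Num, Start_Date, End_Date).
# None stands in for a NULL End_Date.
ROWS = [
    (1, date(2018, 1, 1), date(2019, 1, 1)),
    (2, date(2018, 1, 1), None),
    (3, date(2019, 1, 1), date(2019, 6, 1)),
    (4, date(2017, 1, 1), date(2019, 3, 1)),
]

# Hypothetical stand-in for an open-ended End_Date, as in the answer.
FAR_FUTURE = date(9999, 12, 31)


def to_events(rows):
    """Turn each row into a +1 event on its start and a -1 event on its end."""
    events = []
    for _cust, start, end in rows:
        events.append((start, 1))
        events.append((end if end is not None else FAR_FUTURE, -1))
    return events


events = to_events(ROWS)
for on_this_date, change in events:
    print(on_this_date, change)
```

Every customer contributes exactly one +1 and one -1, so the changes always sum to zero once the far-future date is reached.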
Then we simply SUM OVER the Change values (after sorting) to get the active-user count as of each specific date.
So, first, sort by OnThisDate:
OnThisDate Change
2017-01-01 1
2018-01-01 1
2018-01-01 1
2019-01-01 1
2019-01-01 -1
2019-03-01 -1
2019-06-01 -1
9999-12-31 -1
Then SUM OVER:
OnThisDate ActiveCount
2017-01-01 1
2018-01-01 2
2018-01-01 3
2019-01-01 4
2019-01-01 3
2019-03-01 2
2019-06-01 1
9999-12-31 0
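The sort-then-accumulate step can be sketched in Python as below. The names EVENTS and running_counts are hypothetical, and the tie-break (a +1 before a -1 on the same date) is an assumption chosen only to reproduce the row order of the table above. In SQL, a windowed SUM with only an ORDER BY may treat rows sharing a date as peers and give them the same total; the value on the last row of each date is the same either way.

```python
from datetime import date
from itertools import accumulate

# The (OnThisDate, Change) events derived from the sample data.
EVENTS = [
    (date(2018, 1, 1), 1), (date(2019, 1, 1), -1),
    (date(2018, 1, 1), 1), (date(9999, 12, 31), -1),
    (date(2019, 1, 1), 1), (date(2019, 6, 1), -1),
    (date(2017, 1, 1), 1), (date(2019, 3, 1), -1),
]


def running_counts(events):
    """Sort events by date and attach the running total of Change."""
    # Assumed tie-break: on the same date, +1 sorts before -1.
    ordered = sorted(events, key=lambda e: (e[0], -e[1]))
    totals = accumulate(change for _d, change in ordered)
    return [(d, total) for (d, _change), total in zip(ordered, totals)]


for on_this_date, active in running_counts(EVENTS):
    print(on_this_date, active)
```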
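Then we PARTITION (not group!) the rows by month and sort them by date so that we can identify the last ActiveCount row for each month (this actually happens in the WHERE of the outermost query, using ROW_NUMBER() and COUNT() for each month PARTITION):
OnThisDate ActiveCount IsLastInMonth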
2017-01-01 1 1
2018-01-01 2 0
2018-01-01 3 1
2019-01-01 4 0
2019-01-01 3 1
2019-03-01 2 1
2019-06-01 1 1
9999-12-31 0 1
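Picking the last row of each month, as flagged by IsLastInMonth in the table above, can be sketched like this. It is a simplified Python stand-in for the ROW_NUMBER() / COUNT(*) comparison, not a translation of it; COUNTS and last_per_month are hypothetical names, and the input is assumed to be in date order already.

```python
from datetime import date

# (OnThisDate, ActiveCount) rows, already sorted by date.
COUNTS = [
    (date(2017, 1, 1), 1),
    (date(2018, 1, 1), 2),
    (date(2018, 1, 1), 3),
    (date(2019, 1, 1), 4),
    (date(2019, 1, 1), 3),
    (date(2019, 3, 1), 2),
    (date(2019, 6, 1), 1),
    (date(9999, 12, 31), 0),
]


def last_per_month(counts):
    """Keep only the final running count seen in each (year, month)."""
    last = {}
    for d, active in counts:
        # Later rows in the same month overwrite earlier ones.
        last[(d.year, d.month)] = active
    return sorted(last.items())


for (year, month), active in last_per_month(COUNTS):
    print(f"{year:04d}-{month:02d}", active)
```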
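Then filter to the rows where IsLastInMonth = 1 (in practice, where ROW_NUMBER() = COUNT(*) inside each PARTITION) to give the final output data:
At-end-of-month Active-count
2017-01 1
2018-01 3
2019-01 3
2019-03 2
2019-06 1
9999-12 0
This does result in "gaps" in the result set, because the At-end-of-month column only shows rows where the Active-count value actually changed, rather than every possible calendar month. That is ideal (as far as I'm concerned) because it excludes redundant data. The gaps can be filled in application code by repeating the output row for each following month until the next At-end-of-month value is reached.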
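One way the gap filling in application code might look is sketched below. This is an assumption about what such code could be, not something taken from the answer: SPARSE and fill_months are hypothetical names, months are represented as (year, month) tuples, and a month before the first change point is assumed to have zero active users.

```python
# Sparse end-of-month results: only the months where the count changed.
SPARSE = {
    (2017, 1): 1,
    (2018, 1): 3,
    (2019, 1): 3,
    (2019, 3): 2,
    (2019, 6): 1,
    (9999, 12): 0,
}


def fill_months(sparse, first, last):
    """Return a count for every month from first to last, both inclusive."""
    # Start from the most recent change point before the requested range.
    earlier = [key for key in sparse if key < first]
    current = sparse[max(earlier)] if earlier else 0
    filled = []
    year, month = first
    while (year, month) <= last:
        # Carry the previous value forward unless this month has its own row.
        current = sparse.get((year, month), current)
        filled.append(((year, month), current))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return filled


for (year, month), active in fill_months(SPARSE, (2018, 6), (2019, 7)):
    print(f"{year:04d}-{month:02d}", active)
```

For the sample data this reproduces the counts the question expects for 2018-06, 2019-02 and 2019-07.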
This is the query using T-SQL on SQL Server (I don't have access to Oracle right now). Here is the SQLFiddle I used to work out the solution: http://sqlfiddle.com/#!18/ad68b7/24
SELECT
    OtdYear,
    OtdMonth,
    ActiveCount
FROM
(
    -- This query adds columns to indicate which row is the last-row-in-month ( where RowInMonth == RowsInMonth )
    SELECT
        OnThisDate,
        OtdYear,
        OtdMonth,
        ROW_NUMBER() OVER ( PARTITION BY OtdYear, OtdMonth ORDER BY OnThisDate ) AS RowInMonth,
        COUNT(*) OVER ( PARTITION BY OtdYear, OtdMonth ) AS RowsInMonth,
        ActiveCount
    FROM
    (
        -- Running total of the +1 / -1 changes, ordered by date
        SELECT
            OnThisDate,
            YEAR( OnThisDate ) AS OtdYear,
            MONTH( OnThisDate ) AS OtdMonth,
            SUM( [Change] ) OVER ( ORDER BY OnThisDate ASC ) AS ActiveCount
        FROM
        (
            -- One +1 row per StartDate and one -1 row per EndDate ( NULL becomes 9999-12-31 )
            SELECT
                StartDate AS [OnThisDate],
                1 AS [Change]
            FROM
                tbl
            UNION ALL
            SELECT
                ISNULL( EndDate, DATEFROMPARTS( 9999, 12, 31 ) ) AS [OnThisDate],
                -1 AS [Change]
            FROM
                tbl
        ) AS sq1
    ) AS sq2
) AS sq3
WHERE
    RowInMonth = RowsInMonth
ORDER BY
    OtdYear,
    OtdMonth
This query could be flattened into fewer nested queries by using the aggregate and window functions directly instead of aliases (such as OtdYear and the like), but that would make the query much harder to understand.
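Answer 1 (score: 1)
I created a query that returns a result for every month from the minimum start date to the maximum end date in the table.
You can change that range by adding a condition to the WHERE clause.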
-- table creation
CREATE TABLE ACTIVE_USERS (CUST_NUM NUMBER, START_DATE DATE, END_DATE DATE);
-- data creation
INSERT INTO ACTIVE_USERS
SELECT * FROM
(
SELECT 1, DATE '2018-01-01', DATE '2019-01-01' FROM DUAL UNION ALL
SELECT 2, DATE '2018-01-01', NULL FROM DUAL UNION ALL
SELECT 3, DATE '2019-01-01', DATE '2019-06-01' FROM DUAL UNION ALL
SELECT 4, DATE '2017-01-01', DATE '2019-03-01' FROM DUAL
);
-- data in the actual table
SELECT * FROM ACTIVE_USERS ORDER BY CUST_NUM;
CUST_NUM START_DATE END_DATE
---------- ---------- ----------
1 2018-01-01 2019-01-01
2 2018-01-01
3 2019-01-01 2019-06-01
4 2017-01-01 2019-03-01
Query to get the desired result:
WITH CTE ( START_DATE, END_DATE ) AS
(
SELECT
ADD_MONTHS( START_DATE, LEVEL - 1 ),
ADD_MONTHS( START_DATE, LEVEL ) - 1
FROM
(
SELECT
MIN( START_DATE ) AS START_DATE,
MAX( END_DATE ) AS END_DATE
FROM
ACTIVE_USERS
)
CONNECT BY LEVEL <= CEIL( MONTHS_BETWEEN( END_DATE, START_DATE ) ) + 1
)
--
--
SELECT
C.START_DATE,
COUNT(1) AS CNT
FROM
CTE C
JOIN ACTIVE_USERS D ON
(
C.END_DATE BETWEEN
D.START_DATE
AND
CASE
WHEN D.END_DATE IS NOT NULL THEN D.END_DATE
ELSE C.END_DATE
END
)
GROUP BY
C.START_DATE
ORDER BY
C.START_DATE;
-- output --
START_DATE CNT
---------- ----------
2017-01-01 1
2017-02-01 1
2017-03-01 1
2017-04-01 1
2017-05-01 1
2017-06-01 1
2017-07-01 1
2017-08-01 1
2017-09-01 1
2017-10-01 1
2017-11-01 1
2017-12-01 1
2018-01-01 3
2018-02-01 3
2018-03-01 3
2018-04-01 3
2018-05-01 3
2018-06-01 3
2018-07-01 3
2018-08-01 3
2018-09-01 3
2018-10-01 3
2018-11-01 3
2018-12-01 3
2019-01-01 3
2019-02-01 3
2019-03-01 2
2019-04-01 2
2019-05-01 2
2019-06-01 1
30 rows selected.
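For readers who want to check the logic of the query above without an Oracle instance, here is a rough Python sketch of the same idea: build one row per month, then count the customers whose active period covers that month's last day. The names ROWS, month_spans and active_by_month are invented for this example. It assumes the earliest start date falls on the first of a month, as in the sample data, and it skips months with no active customers, as the inner join does.

```python
import calendar
from datetime import date

# Sample rows: (CUST_NUM, START_DATE, END_DATE); None stands in for NULL.
ROWS = [
    (1, date(2018, 1, 1), date(2019, 1, 1)),
    (2, date(2018, 1, 1), None),
    (3, date(2019, 1, 1), date(2019, 6, 1)),
    (4, date(2017, 1, 1), date(2019, 3, 1)),
]


def month_spans(first, last):
    """Yield (month_start, month_end) for each month from first to last."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        last_day = calendar.monthrange(year, month)[1]
        yield date(year, month, 1), date(year, month, last_day)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def active_by_month(rows):
    """Count customers active on the last day of each month."""
    first = min(start for _c, start, _e in rows)
    last = max(end for _c, _s, end in rows if end is not None)
    result = []
    for month_start, month_end in month_spans(first, last):
        active = sum(
            1
            for _c, start, end in rows
            # A missing end date means the customer is still active.
            if start <= month_end and (end is None or month_end <= end)
        )
        if active:  # months with no matching customer are omitted
            result.append((month_start, active))
    return result


for month_start, cnt in active_by_month(ROWS):
    print(month_start, cnt)
```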
Cheers!