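Our data warehouse has two tables, c_customers and h_customers, which hold current and historical customer records.
Both tables have 'DWH_FROM' and 'DWH_TO' columns; every record in c_customers has 'DWH_TO' = null.
The PK of c_customers is CUST_NR, while for h_customers it is CUST_NR, DWH_FROM and DWH_TO.
When customer data changes, the new record is inserted into c_customers with an empty DWH_TO, and the old record is moved to h_customers with DWH_TO set to the date on which the change took place.
How can I get a list of how many customers (distinct CUST_NR) had STATUS = 'active' on the first day of each month of 2016 (or on every date in 2016)?
The ideal output would look like this: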
Date | Count
-----------+------
01.01.2016 | 22385
01.02.2016 | 23187
... |
01.12.2016 | 25109
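I have already produced a combined data set:
SELECT *
  FROM (SELECT CUST_NR,
               STATUS,
               DWH_FROM,
               DWH_TO
          FROM C_CUSTOMER C
        UNION ALL
        SELECT CUST_NR,
               STATUS,
               DWH_FROM,
               DWH_TO
          FROM H_CUSTOMER H
       );
...but I am not sure how to count the customers for a specific date, let alone for multiple dates.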
Answer 0: (score: 0)
This is a brute-force approach. For a single date, you can do the following:
select c.cnt + h.cnt
from (select count(*) as cnt
      from c_customer c
      -- current rows have dwh_to = null, so only the start date is checked
      where c.dwh_from <= date '2016-01-01'
     ) c cross join
     (select count(*) as cnt
      from h_customer h
      where date '2016-01-01' between h.dwh_from and h.dwh_to
     ) h;
You can adapt this to several dates by using correlated subqueries:
select d.dte,
       ( (select count(*) as cnt
          from c_customer c
          where c.dwh_from <= d.dte
         ) +
         (select count(*) as cnt
          from h_customer h
          where d.dte between h.dwh_from and h.dwh_to
         )
       ) as cnt
from (select date '2016-01-01' as dte from dual union all
      select date '2016-02-01' as dte from dual union all
      select date '2016-03-01' as dte from dual union all
      . . .
     ) d;
This is not the only way to solve the problem, but for a small number of dates it should be fine in terms of performance.
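As a sanity check of the single-date counting rule, here is a minimal sketch that runs the same two-part count against a handful of made-up rows. It uses Python's built-in sqlite3 module as a stand-in for the real database, so dates are stored as ISO text and the SQL is SQLite dialect rather than Oracle. The sample rows and the helper name customers_on are hypothetical. It assumes, as the question states, that current rows carry a NULL DWH_TO, so they are matched on DWH_FROM alone, while history rows are matched on the closed interval.

```python
import sqlite3

# Hypothetical sample rows; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE c_customer (cust_nr INTEGER, dwh_from TEXT, dwh_to TEXT);
    CREATE TABLE h_customer (cust_nr INTEGER, dwh_from TEXT, dwh_to TEXT);
    INSERT INTO c_customer VALUES (1, '2015-05-01', NULL),
                                  (2, '2016-02-10', NULL);
    INSERT INTO h_customer VALUES (3, '2015-01-01', '2016-01-15'),
                                  (4, '2014-01-01', '2016-01-20');
""")

def customers_on(day: str) -> int:
    """Count rows whose validity interval covers `day` ('YYYY-MM-DD' text)."""
    query = """
        SELECT (SELECT COUNT(*) FROM c_customer
                WHERE dwh_from <= :d)
             + (SELECT COUNT(*) FROM h_customer
                WHERE :d BETWEEN dwh_from AND dwh_to)
    """
    return conn.execute(query, {"d": day}).fetchone()[0]

print(customers_on("2016-01-01"))  # one current row plus two history rows
print(customers_on("2016-03-01"))  # two current rows, no history rows
```

ISO-formatted text compares in date order, which is why plain string comparison is enough in this sketch; against real DATE columns the same predicates apply unchanged.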
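Answer 1: (score: 0)
Performance really is the main concern with this problem. If you have a dates table, you can join against it (here, a cross join filtered in the WHERE clause) and use a query like the following: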
WITH dates AS
  (SELECT '2016-01-01' AS dateid
   UNION ALL SELECT '2016-02-01'
   UNION ALL SELECT '2016-03-01'
   UNION ALL SELECT '2016-04-01'
   UNION ALL SELECT '2016-05-01'
   UNION ALL SELECT '2016-06-01'
   UNION ALL SELECT '2016-07-01'
   UNION ALL SELECT '2016-08-01'
   UNION ALL SELECT '2016-09-01'
   UNION ALL SELECT '2016-10-01'
   UNION ALL SELECT '2016-11-01'
   UNION ALL SELECT '2016-12-01'
  )
,c_cust AS
  (SELECT 1 AS CustNr, 'a' AS name, '2014-01-01' AS DWH_FROM, NULL AS DWH_TO
   UNION ALL SELECT 2,'b', '2015-01-01', NULL
   UNION ALL SELECT 3,'c', '2016-01-01', NULL
   UNION ALL SELECT 5,'e', '2016-04-01', NULL
   UNION ALL SELECT 6,'f', '2016-06-01', NULL
  )
,h_cust AS
  (SELECT 10 AS CustNr, 'j' AS name, '2010-01-01' AS DWH_FROM, '2010-12-31' AS DWH_TO
   UNION ALL SELECT 12,'k', '2015-01-01', '2016-12-31'
   UNION ALL SELECT 15,'m', '2016-01-01', '2016-06-30'
   UNION ALL SELECT 20,'p', '2014-01-01', '2016-03-31'
   UNION ALL SELECT 26,'r', '2015-01-01', '2015-12-31'
  )
,all_cust AS
  (
   SELECT * FROM c_cust c
   UNION ALL SELECT * FROM h_cust h
  )
SELECT d.dateid, COUNT(*) AS ActiveUsers
FROM all_cust c
    ,dates d
-- Strict comparisons: a record that starts exactly on the date is not counted.
-- ISNULL is SQL Server syntax; in Oracle use NVL or COALESCE instead.
WHERE d.dateid > c.DWH_FROM AND d.dateid < ISNULL(c.DWH_TO, '9999-12-31')
GROUP BY d.dateid
This gives the result:
dateid ActiveUsers
2016-01-01 4
2016-02-01 6
2016-03-01 6
2016-04-01 5
2016-05-01 6
2016-06-01 6
2016-07-01 6
2016-08-01 6
2016-09-01 6
2016-10-01 6
2016-11-01 6
2016-12-01 6
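Whether a record that starts or ends exactly on the reporting date should be counted is a modelling decision, and it changes the numbers. The sketch below reruns this answer's sample rows with strict and with inclusive comparisons so the difference is visible. It uses Python's built-in sqlite3 as a stand-in engine (COALESCE replaces ISNULL, dates are ISO text); the helper name count_on is hypothetical, and customer 15 is given an end date of 2016-06-30.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE all_cust (custnr INTEGER, dwh_from TEXT, dwh_to TEXT);
    -- current rows from the sample data (open-ended)
    INSERT INTO all_cust VALUES
        (1, '2014-01-01', NULL), (2, '2015-01-01', NULL),
        (3, '2016-01-01', NULL), (5, '2016-04-01', NULL),
        (6, '2016-06-01', NULL);
    -- history rows from the sample data
    INSERT INTO all_cust VALUES
        (10, '2010-01-01', '2010-12-31'), (12, '2015-01-01', '2016-12-31'),
        (15, '2016-01-01', '2016-06-30'), (20, '2014-01-01', '2016-03-31'),
        (26, '2015-01-01', '2015-12-31');
""")

def count_on(day: str, inclusive: bool) -> int:
    """Count rows valid on `day`, with strict or inclusive interval bounds."""
    op = "<=" if inclusive else "<"
    sql = f"""
        SELECT COUNT(*) FROM all_cust
        WHERE dwh_from {op} ? AND ? {op} COALESCE(dwh_to, '9999-12-31')
    """
    return conn.execute(sql, (day, day)).fetchone()[0]

for day in ("2016-01-01", "2016-04-01"):
    print(day, count_on(day, inclusive=False), count_on(day, inclusive=True))
```

With strict bounds the counts match the table above (4 and 5); with inclusive bounds both dates give 6, because customers 3 and 15 start on 2016-01-01 and customer 5 starts on 2016-04-01.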
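Answer 2: (score: 0)
Here is an efficient way to solve this problem.
Somewhere, you need to generate all the dates required for the report (the first day of each month of 2016). I do this in a hierarchical (sub)query, which I named mth in my solution.
In the code below, I create the test data in the with clause; that data is not part of the solution (it should be removed before running against the real tables). I didn't use your exact table names; I only created the columns relevant to this exercise.
Declaring the column names in the subquery declaration, as I did in the with clause, is a feature introduced in Oracle 11.2. If you are on an older version, you will need to move the column names into each subquery definition. If needed, that is a trivial change.
The strategy is to join the "months" or "calendar" table (the one holding the 12 first-of-month dates) to each of the "current" and "history" customer tables, using the appropriate join condition for each. The results are collected with UNION ALL (this is possible because, in each join, all we need to keep is the "calendar" date, the first of the month, whenever it matches a row in either customer table). After that, it is simply a matter of grouping by date and counting.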
with
     curr_cust ( custnr, dwh_from ) as (
       select 101, date '2013-10-15' from dual union all
       select 102, date '2016-03-11' from dual union all
       select 105, date '2015-04-02' from dual union all
       select 113, date '2016-12-15' from dual
     ),
     hist_cust ( custnr, dwh_from, dwh_to ) as (
       select 100, date '2014-12-01', date '2015-12-20' from dual union all
       select 102, date '2015-11-15', date '2016-02-08' from dual union all
       select 108, date '2016-03-01', date '2016-08-03' from dual union all
       select 108, date '2016-10-15', date '2016-12-15' from dual
     ),
     mth ( dt ) as (
       select add_months(date '2016-01-01', level - 1) from dual
       connect by level <= 12
     )
select   to_char(dt, 'yyyy-mm-dd') as dt, count(*) as cust_count
from     ( select dt
           from   mth m join curr_cust c on m.dt >= c.dwh_from
           union all
           select dt
           from   mth m join hist_cust h on m.dt between h.dwh_from and h.dwh_to
         )
group by dt
order by dt          -- if needed
;
Output (for the test data included in the query):
DT CUST_COUNT
---------- ----------
2016-01-01 3
2016-02-01 3
2016-03-01 3
2016-04-01 4
2016-05-01 4
2016-06-01 4
2016-07-01 4
2016-08-01 4
2016-09-01 3
2016-10-01 3
2016-11-01 4
2016-12-01 4
12 rows selected.
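The queries above count every matching row, whereas the question also asks for STATUS = 'active' and for distinct CUST_NR. The following is a minimal sketch of how those two requirements might be layered onto the same calendar-join idea. It runs on Python's built-in sqlite3 as a stand-in engine, so the month list comes from a recursive CTE instead of connect by and dates are ISO text. The sample rows are hypothetical; customer 1 changes on 2016-03-01, so on that date both its history row and its current row match, and COUNT(DISTINCT ...) keeps it from being counted twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE c_customer (cust_nr INTEGER, status TEXT, dwh_from TEXT);
    CREATE TABLE h_customer (cust_nr INTEGER, status TEXT,
                             dwh_from TEXT, dwh_to TEXT);
    -- Hypothetical rows, not taken from the thread.
    INSERT INTO c_customer VALUES (1, 'active',   '2016-03-01'),
                                  (2, 'inactive', '2016-05-10'),
                                  (3, 'active',   '2015-01-01');
    INSERT INTO h_customer VALUES (1, 'active', '2015-06-01', '2016-03-01'),
                                  (2, 'active', '2015-02-01', '2016-05-10');
""")

query = """
WITH RECURSIVE mth(dt) AS (           -- first day of each month of 2016
    SELECT '2016-01-01'
    UNION ALL
    SELECT date(dt, '+1 month') FROM mth WHERE dt < '2016-12-01'
),
all_cust AS (                         -- current rows get an open-ended dwh_to
    SELECT cust_nr, status, dwh_from, '9999-12-31' AS dwh_to FROM c_customer
    UNION ALL
    SELECT cust_nr, status, dwh_from, dwh_to FROM h_customer
)
SELECT m.dt, COUNT(DISTINCT a.cust_nr) AS cnt
FROM mth m
LEFT JOIN all_cust a
       ON m.dt BETWEEN a.dwh_from AND a.dwh_to
      AND a.status = 'active'
GROUP BY m.dt
ORDER BY m.dt
"""

rows = conn.execute(query).fetchall()
counts = dict(rows)
for dt, cnt in rows:
    print(dt, cnt)
```

The LEFT JOIN keeps months with no active customers in the output with a count of 0. In Oracle, the same shape should work with the mth subquery from the answer above in place of the recursive CTE.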