Question

Thanks in advance for your expertise. Here is my problem: we have a table of monthly snapshots containing an ID and a snapshot date:

    ID,Snapshot_Date
    1,Dec-17
    2,Dec-17
    3,Dec-17
    1,Jan-18
    3,Jan-18
    4,Jan-18
    3,Feb-18
    5,Feb-18
    6,Feb-18

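The requirement is to be able to report, for any given month (usually the most recent one), that in [some month and year] there were X IDs, we lost Y IDs, and we gained Z IDs.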

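One possible logic for the SQL query is to return every month, every ID, and its status: 0 = carried over, +1 = new, -1 = dropped. Note how ID = 2 no longer appears from Feb-18 onward, because it was already reported as dropped in the Jan-18 results: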

    Snapshot_Date,ID,Status
    Dec-17,1,+1
    Dec-17,2,+1
    Dec-17,3,+1
    Jan-18,1,0
    Jan-18,2,-1
    Jan-18,3,0
    Jan-18,4,+1
    Feb-18,1,-1
    Feb-18,3,0
    Feb-18,4,0
    Feb-18,5,+1

Answer 1

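The data model you need to build to answer this kind of question is based on all possible combinations of IDs and time units. In your case, that means one record per ID per month, stating whether the ID was present in that month and whether it was present in the previous month. A proof of concept follows: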

    with
     ids as (
        -- every ID that appears in any snapshot
        select distinct id from your_table
    )
    ,months as (
        -- every snapshot month present in the data
        select distinct snapshot_date from your_table
    )
    ,id_month_pairs as (
        -- one row per ID per month, whether or not the ID was present
        select ids.id, months.snapshot_date
        from ids
        cross join months
    )
    ,id_months as (
        select
         p.id
        ,p.snapshot_date
        ,(t.id is not null) as is_present_this_month
        -- lag() returns null for the first month, so treat "no previous month" as not present
        ,coalesce(lag(t.id is not null) over (partition by p.id order by to_date(p.snapshot_date,'Mon-YY')), false) as is_present_prev_month
        from id_month_pairs p
        left join your_table t
        on t.id = p.id and t.snapshot_date = p.snapshot_date
    )
    select
     snapshot_date as month
    ,sum(case when is_present_this_month and is_present_prev_month then 1 else 0 end) as carried_over
    ,sum(case when is_present_this_month and not is_present_prev_month then 1 else 0 end) as gained
    ,sum(case when not is_present_this_month and is_present_prev_month then 1 else 0 end) as lost
    from id_months
    group by 1
    order by to_date(snapshot_date,'Mon-YY')
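The question asks for one row per month and ID with a status, rather than monthly totals. The same two presence flags can be mapped to that status directly. The following is only a sketch under a few assumptions: the sample rows from the question are inlined as `your_table` so the query runs on its own, each (id, snapshot_date) pair occurs at most once, and the intermediate name `presence` is made up for illustration.

```sql
-- Sample rows from the question, inlined so the sketch is self-contained.
with your_table as (
    select 1 as id, 'Dec-17' as snapshot_date
    union all select 2, 'Dec-17'
    union all select 3, 'Dec-17'
    union all select 1, 'Jan-18'
    union all select 3, 'Jan-18'
    union all select 4, 'Jan-18'
    union all select 3, 'Feb-18'
    union all select 5, 'Feb-18'
    union all select 6, 'Feb-18'
)
,id_month_pairs as (
    -- one row per ID per month
    select i.id, m.snapshot_date
    from (select distinct id from your_table) i
    cross join (select distinct snapshot_date from your_table) m
)
,presence as (
    -- hypothetical helper step: parse the month once and flag presence
    select
     p.id
    ,p.snapshot_date
    ,to_date(p.snapshot_date,'Mon-YY') as month_start
    ,(t.id is not null) as is_present_this_month
    from id_month_pairs p
    left join your_table t
    on t.id = p.id and t.snapshot_date = p.snapshot_date
)
,id_months as (
    select
     id
    ,snapshot_date
    ,month_start
    ,is_present_this_month
    -- no previous row in the first month: treat it as "not present"
    ,coalesce(lag(is_present_this_month) over (partition by id order by month_start), false) as is_present_prev_month
    from presence
)
select
 snapshot_date
,id
,case
    when is_present_this_month and is_present_prev_month then 0      -- carried over
    when is_present_this_month and not is_present_prev_month then 1  -- new
    else -1                                                          -- dropped
 end as status
from id_months
-- keep only months where the ID is present or has just been dropped
where is_present_this_month or is_present_prev_month
order by month_start, id
```

With these rows, ID 2 is reported as -1 in Jan-18 and produces no rows after that, which is the behavior the question describes. An ID that disappears and later returns would be reported as +1 again in the month it comes back.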

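In practice, you can materialize the id_months step in a new table (doing ELT inside Redshift), converting the text months to date months and adding any other useful dimensions, and then run the aggregation step separately against that table.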
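The materialization described above might look roughly like this. It is a sketch, not a tested pipeline: it assumes `your_table` already exists with the columns from the question, reuses `id_months` as the name of the new table, and introduces a `month_start` date column that is not in the original.

```sql
-- Step 1: materialize one row per ID per month, with the month stored as a real date.
create table id_months as
select
 id
,month_start
,is_present_this_month
-- no previous row in the first month: treat it as "not present"
,coalesce(lag(is_present_this_month) over (partition by id order by month_start), false) as is_present_prev_month
from (
    select
     i.id
    ,to_date(m.snapshot_date,'Mon-YY') as month_start
    ,(t.id is not null) as is_present_this_month
    from (select distinct id from your_table) i
    cross join (select distinct snapshot_date from your_table) m
    left join your_table t
    on t.id = i.id and t.snapshot_date = m.snapshot_date
) presence;

-- Step 2: run the aggregation separately against the materialized table.
select
 month_start as month
,sum(case when is_present_this_month and is_present_prev_month then 1 else 0 end) as carried_over
,sum(case when is_present_this_month and not is_present_prev_month then 1 else 0 end) as gained
,sum(case when not is_present_this_month and is_present_prev_month then 1 else 0 end) as lost
from id_months
group by month_start
order by month_start;
```

Storing the month as a date means the reporting query no longer has to parse 'Mon-YY' strings to sort or filter, and extra dimensions can be added to the table without touching the aggregation.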

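P.S. Using select distinct and a cross join to build id_months is not optimal; it is only there to show the general idea. You could use a table of month dates and join it to the list of IDs so that records start only from the first month in which an ID appears, rather than covering every month/ID pair. But I'll leave that to you :)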
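One way that refinement could be sketched is shown below. The `months` CTE is derived from the data here purely as a stand-in for a real calendar table of month dates, and `first_seen` is a made-up helper name; `your_table` is assumed to exist with the columns from the question.

```sql
with months as (
    -- stand-in for a calendar table: one row per month as a date
    select distinct to_date(snapshot_date,'Mon-YY') as month_start
    from your_table
)
,first_seen as (
    -- the first month in which each ID appears
    select id, min(to_date(snapshot_date,'Mon-YY')) as first_month
    from your_table
    group by id
)
,id_month_pairs as (
    -- pair each ID only with months on or after its first appearance
    select f.id, m.month_start
    from first_seen f
    join months m
    on m.month_start >= f.first_month
)
select
 p.id
,p.month_start
,(t.id is not null) as is_present_this_month
from id_month_pairs p
left join your_table t
on t.id = p.id and to_date(t.snapshot_date,'Mon-YY') = p.month_start
order by p.id, p.month_start
```

Because each ID's first row is now the month it first appears, a missing previous row can be read as "new" without generating rows for the months before the ID existed.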

Comparing IDs over time (new, existing, dropped)
