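I'm trying to analyze a large volume of transaction data and have set up a series of different ranks to help me. The one I can't figure out is the beneficiary rank. I want it to start a new group wherever the beneficiary changes chronologically (rather than alphabetically).
If the same beneficiary is paid from January to March and then again in June, I want June to be counted as a separate "session".
I'm using Teradata SQL, if that helps.
I thought the solution would be DENSE_RANK, but if I PARTITION BY (CustomerID, Beneficiary) ORDER BY SystemDate, it just counts the months. If I PARTITION BY (CustomerID) ORDER BY Beneficiary, the result is not chronological, and I need the highest rank to go to the most recent Beneficiary.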
    SELECT CustomerID, Beneficiary, Amount, SystemDate, Month
        ,RANK() OVER (PARTITION BY CustomerID ORDER BY SystemDate ASC) AS PaymentRank
        ,RANK() OVER (PARTITION BY CustomerID ORDER BY Month ASC) AS MonthRank
        ,RANK() OVER (PARTITION BY CustomerID, Beneficiary ORDER BY SystemDate ASC) AS BeneficiaryRank
        ,RANK() OVER (PARTITION BY CustomerID, Beneficiary, ROUND(Amount, 0) ORDER BY SystemDate ASC) AS TransactionRank
    FROM table
    ORDER BY CustomerID, PaymentRank
    CustomerID  Beneficiary  Amount  DateStamp  Month  PaymentRank  MonthRank  BeneficiaryRank  TransactionRank
    a           aa           10                 Jan    1            1          1                1
    a           aa           20                 Feb    2            2          2                1
    a           aa           20                 Mar    3            3          3                2
    a           aa           20                 Apr    4            4          4                3
    a           bb           20                 May    5            5          1                1
    a           bb           30                 Jun    6            6          2                1
    a           aa           30                 Jul    7            7          5                2
    a           aa           30                 Aug    8            8          6                1
    a           cc           5                  Sep    9            9          1                1
    a           cc           5                  Oct    10           10         2                2
    a           cc           5                  Nov    11           11         3                3
    b           cc           5                  Dec    1            1          1                1
That is what I have so far. I'd like one more column next to it, as shown below:
    CustomerID  Beneficiary  Amount  DateStamp  Month  NewRank
    a           aa           10                 Jan    1
    a           aa           20                 Feb    1
    a           aa           20                 Mar    1
    a           aa           20                 Apr    1
    a           bb           20                 May    2
    a           bb           30                 Jun    2
    a           aa           30                 Jul    3
    a           aa           30                 Aug    3
    a           cc           5                  Sep    4
    a           cc           5                  Oct    4
    a           cc           5                  Nov    4
    b           cc           5                  Dec    1
Answer 0 (score: 0)
This is a type of gaps-and-islands problem. I would suggest lag() and a cumulative sum:
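    select t.*,
           -- a new group starts when the previous payment to this beneficiary is more than a month old
           sum(case when prev_systemdate > systemdate - interval '1' month then 0 else 1 end)
               over (partition by customerid, beneficiary
                     order by systemdate
                     rows unbounded preceding) as grp  -- explicit frame so the sum is cumulative in Teradata
    from (select t.*,
                 lag(systemdate) over (partition by customerid, beneficiary order by systemdate) as prev_systemdate
          from t
         ) t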
Answer 1 (score: 0)
    SELECT dt.*,
        -- now do a cumulative sum over those 0/1 flags
        SUM(flag)
        OVER (PARTITION BY CustomerID
              ORDER BY SystemDate ASC
                      ,flag DESC -- needed if the ORDER BY columns are not unique
              ROWS UNBOUNDED PRECEDING) AS NewRank
    FROM
     (
       SELECT CustomerID, Beneficiary, Amount, SystemDate, Month
           ,RANK() OVER (PARTITION BY CustomerID ORDER BY SystemDate ASC) AS PaymentRank
           ,RANK() OVER (PARTITION BY CustomerID ORDER BY Month ASC) AS MonthRank
           ,RANK() OVER (PARTITION BY CustomerID, Beneficiary ORDER BY SystemDate ASC) AS BeneficiaryRank
           ,RANK() OVER (PARTITION BY CustomerID, Beneficiary, ROUND(Amount, 0) ORDER BY SystemDate ASC) AS TransactionRank
           -- assign a 0 if the current and previous Beneficiary are the same, otherwise 1
           ,CASE WHEN Beneficiary = LAG(Beneficiary) OVER (PARTITION BY CustomerID ORDER BY SystemDate) THEN 0 ELSE 1 END AS flag
       FROM table
     ) AS dt
    ORDER BY CustomerID, PaymentRank
The problem you had with Gordon's query is probably caused by your Teradata version: LAG is only supported in 16.10+. But there is a simple workaround:
    LAG(Beneficiary) OVER (PARTITION BY CustomerID ORDER BY SystemDate)
    -- is equivalent to
    MIN(Beneficiary) OVER (PARTITION BY CustomerID ORDER BY SystemDate
                           ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
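To see why that workaround behaves like LAG, here is a small sketch in plain Python (not Teradata), assuming the rows of one customer are already sorted by SystemDate. The function names `lag_via_frame` and `lag` are made up for illustration. A frame of `ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING` holds at most one row, so MIN over it simply returns that row's value, and the first row gets an empty frame, i.e. NULL.

```python
def lag_via_frame(values):
    """Emulate MIN(v) OVER (... ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)."""
    result = []
    for i in range(len(values)):
        frame = values[max(0, i - 1):i]  # at most one row: the one just before row i
        result.append(min(frame) if frame else None)  # empty frame -> NULL (None)
    return result


def lag(values):
    """Reference behaviour of LAG(v): previous row's value, None for the first row."""
    return [None] + values[:-1] if values else []


# Beneficiaries of customer "a" from the question, in date order.
beneficiaries = ["aa", "aa", "aa", "aa", "bb", "bb", "aa", "aa", "cc", "cc", "cc"]
print(lag_via_frame(beneficiaries) == lag(beneficiaries))  # True
```

The MIN is arbitrary here; with a single-row frame, MAX would give the same result.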
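Answer 2 (score: 0)
@Gordon and @dnoeth provided the ideas and code that put me on the right track, so credit and thanks to both.
The query below is mostly lifted from their answers, but I needed to add ROWS UNBOUNDED PRECEDING for the aggregation to work correctly. Without it, the sum just showed the total for the whole partition. I also changed SystemDate to PaymentRank in the ORDER BY, because I had to deal with some duplicate entries within a single day.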
    SELECT dt.*,
        -- now do a cumulative sum over those 0/1 flags
        SUM(Flag) OVER (PARTITION BY CustomerID ORDER BY PaymentRank ASC ROWS UNBOUNDED PRECEDING) AS NewRank
    FROM
     (
       SELECT r.*
           -- assign a 0 if the current and previous Beneficiary are the same, otherwise 1
           ,CASE WHEN Beneficiary = MIN(Beneficiary) OVER (PARTITION BY CustomerID ORDER BY PaymentRank ASC ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) THEN 0 ELSE 1 END AS Flag
       FROM
        (
          -- PaymentRank as defined in the question
          SELECT CustomerID, Beneficiary, Amount, SystemDate, Month
              ,RANK() OVER (PARTITION BY CustomerID ORDER BY SystemDate ASC) AS PaymentRank
          FROM table
        ) AS r
     ) AS dt
    ORDER BY CustomerID, PaymentRank
The inner query sets a flag whenever the beneficiary changes. The outer query then adds those flags up cumulatively.
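For anyone who wants to check the flag-then-running-sum logic outside the database, here is a rough sketch in plain Python using the sample rows from the question. It is only an emulation under assumptions: rows are reduced to (CustomerID, Beneficiary, PaymentRank) tuples, and `new_rank` is a made-up helper name, not anything Teradata provides.

```python
from itertools import groupby
from operator import itemgetter

# (CustomerID, Beneficiary, PaymentRank) taken from the sample data in the question.
rows = [
    ("a", "aa", 1), ("a", "aa", 2), ("a", "aa", 3), ("a", "aa", 4),
    ("a", "bb", 5), ("a", "bb", 6),
    ("a", "aa", 7), ("a", "aa", 8),
    ("a", "cc", 9), ("a", "cc", 10), ("a", "cc", 11),
    ("b", "cc", 1),
]


def new_rank(rows):
    """Append NewRank: a running count of beneficiary changes per customer."""
    out = []
    ordered = sorted(rows, key=itemgetter(0, 2))  # PARTITION BY CustomerID ORDER BY PaymentRank
    for _, partition in groupby(ordered, key=itemgetter(0)):
        previous = None  # no previous row yet, like a NULL from LAG
        running = 0
        for customer, beneficiary, rank in partition:
            # Inner query: 0 if same beneficiary as the previous row, otherwise 1.
            flag = 0 if previous is not None and beneficiary == previous else 1
            # Outer query: cumulative sum of the flags (ROWS UNBOUNDED PRECEDING).
            running += flag
            out.append((customer, beneficiary, rank, running))
            previous = beneficiary
    return out


for row in new_rank(rows):
    print(row)
```

For customer a this gives NewRank 1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4, and customer b starts again at 1, which matches the column asked for in the question.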
I wasn't sure what the PRECEDING part was doing; @dnoeth has a good explanation here. The following is taken from that explanation:
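• UNBOUNDED PRECEDING, all rows before the current row -> fixed
• UNBOUNDED FOLLOWING, all rows after the current row -> fixed
• x PRECEDING, x rows before the current row -> relative
• y FOLLOWING, y rows after the current row -> relative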
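As a rough illustration of those frame bounds, here is a plain Python sketch that treats one partition as an ordered list. The helper `frame_sum` and its parameters are hypothetical, with `None` standing in for UNBOUNDED. A fixed bound stays at the partition edge for every row, while a relative bound moves along with the current row.

```python
def frame_sum(values, i, preceding=None, following=0):
    """Sum the window frame around row i.

    preceding / following are row counts; None means UNBOUNDED.
    The defaults correspond to ROWS UNBOUNDED PRECEDING (up to the current row).
    """
    start = 0 if preceding is None else max(0, i - preceding)
    end = len(values) if following is None else min(len(values), i + following + 1)
    return sum(values[start:end])


# The 0/1 change flags for customer "a" in the question's sample data.
flags = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0]
n = len(flags)

# ROWS UNBOUNDED PRECEDING: fixed start, so the sum grows row by row.
running = [frame_sum(flags, i) for i in range(n)]

# ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: every row sees the whole partition.
total = [frame_sum(flags, i, preceding=None, following=None) for i in range(n)]

# ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING: a relative frame that slides with the row.
sliding = [frame_sum(flags, i, preceding=1, following=1) for i in range(n)]

print(running)  # [1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4]
print(total)    # [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
print(sliding)
```

The `total` line is the "just shows the partition total" behaviour mentioned above, and `running` is what the NewRank column needs.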