Unable to work out a rank over a partition with multiple variables

Time: 2019-08-22 16:08:22

Tags: sql teradata partitioning

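I am trying to analyse a large amount of transaction data and have set up a series of different ranks to help me. The one I can't get my head around is the beneficiary rank. I want it to start a new partition wherever the beneficiary changes chronologically, rather than alphabetically.

If the same beneficiary is paid from January to March and then paid again in June, I want June to be classed as a separate "session".

I'm using Teradata SQL, if that helps.

I thought the solution would be DENSE_RANK, but if I PARTITION BY (CustomerID, Beneficiary) ORDER BY SystemDate, it just counts the months. If I PARTITION BY (CustomerID) ORDER BY Beneficiary, the result isn't chronological, and I need the highest rank to be the most recent beneficiary.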

SELECT CustomerID, Beneficiary, Amount, SystemDate, Month
  ,RANK() OVER(PARTITION BY CustomerID ORDER BY SystemDate ASC) AS PaymentRank
  ,RANK() OVER(PARTITION BY CustomerID ORDER BY PaymentMonth ASC) AS MonthRank
  ,RANK() OVER(PARTITION BY CustomerID , Beneficiary ORDER BY SystemDate ASC) AS BeneficiaryRank
  ,RANK() OVER(PARTITION BY CustomerID , Beneficiary, ROUND(TRNSCN_AMOUNT, 0) ORDER BY SYSTEM_DATE ASC) AS TransRank
FROM table ORDER BY CustomerID, PaymentRank

CustomerID  Beneficiary  Amount  DateStamp  Month  PaymentRank  MonthRank  BeneficiaryRank  TransactionRank
a           aa           10                 Jan    1            1          1                1
a           aa           20                 Feb    2            2          2                1
a           aa           20                 Mar    3            3          3                2
a           aa           20                 Apr    4            4          4                3
a           bb           20                 May    5            5          1                1
a           bb           30                 Jun    6            6          2                1
a           aa           30                 Jul    7            7          5                2
a           aa           30                 Aug    8            8          6                1
a           cc           5                  Sep    9            9          1                1
a           cc           5                  Oct    10           10         2                2
a           cc           5                  Nov    11           11         3                3
b           cc           5                  Dec    1            1          1                1

So far this does what I want. I would like one more column alongside it, as shown below:

CustomerID  Beneficiary  Amount  DateStamp  Month  NewRank
a           aa           10                 Jan    1
a           aa           20                 Feb    1
a           aa           20                 Mar    1
a           aa           20                 Apr    1
a           bb           20                 May    2
a           bb           30                 Jun    2
a           aa           30                 Jul    3
a           aa           30                 Aug    3
a           cc           5                  Sep    4
a           cc           5                  Oct    4
a           cc           5                  Nov    4
b           cc           5                  Dec    1

3 Answers:

Answer 0: (score: 0)

This is a type of gaps-and-islands problem. I would suggest lag() and a cumulative sum:

select t.*,
       sum(case when prev_systemdate > systemdate - interval '1' month then 0 else 1 end)
           over (partition by customerid, beneficiary order by systemdate)
from (select t.*,
             lag(systemdate) over (partition by customerid, beneficiary order by systemdate) as prev_systemdate
      from t
     ) t

Answer 1: (score: 0)

SELECT dt.*,
   -- now do a Cumulative Sum over those 0/1
   SUM(flag)
   OVER(PARTITION BY CustomerID
        ORDER BY SystemDate ASC
                ,flag DESC -- needed if the order by columns are not unique
        ROWS UNBOUNDED PRECEDING) AS NewRank
FROM
 ( 
    SELECT CustomerID, Beneficiary, Amount, SystemDate, Month
      ,RANK() OVER(PARTITION BY CustomerID ORDER BY SystemDate ASC) AS PaymentRank
      ,RANK() OVER(PARTITION BY CustomerID ORDER BY PaymentMonth ASC) AS MonthRank
      ,RANK() OVER(PARTITION BY CustomerID , Beneficiary ORDER BY SystemDate ASC) AS BeneficiaryRank
      ,RANK() OVER(PARTITION BY CustomerID , Beneficiary, ROUND(TRNSCN_AMOUNT, 0) ORDER BY SYSTEM_DATE ASC) AS TransRank
      -- assign a 0 if current & previous Beneficiary are the same, otherwise 1 
      ,CASE WHEN Beneficiary = LAG(Beneficiary) OVER(PARTITION BY CustomerID ORDER BY SystemDate) THEN 0 ELSE 1 END AS flag
    FROM table 
 ) AS dt
ORDER BY CustomerID, PaymentRank

Your problem with Gordon's query is probably caused by your Teradata version: LAG is only supported in 16.10+. But there is a simple workaround:

LAG(Beneficiary) OVER(PARTITION BY CustomerID ORDER BY SystemDate)

--is equivalent to 
MIN(Beneficiary) OVER(PARTITION BY CustomerID ORDER BY SystemDate
                      ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
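
For anyone who wants to check that equivalence without a Teradata system to hand, here is a minimal sketch that runs both expressions side by side. It uses Python's built-in sqlite3 module as a stand-in engine (an assumption: SQLite is not Teradata, but it accepts the same window syntax for these two expressions). The `payments` table, its rows and the function name are invented for the illustration.

```python
import sqlite3


def previous_beneficiary_both_ways():
    """Return (SystemDate, LAG result, MIN-over-one-row-frame result) per payment."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments (CustomerID TEXT, Beneficiary TEXT, SystemDate TEXT)"
    )
    # Hypothetical sample rows, invented for this illustration.
    conn.executemany(
        "INSERT INTO payments VALUES (?, ?, ?)",
        [
            ("a", "aa", "2019-01-15"),
            ("a", "aa", "2019-02-15"),
            ("a", "bb", "2019-03-15"),
            ("a", "aa", "2019-04-15"),
            ("b", "cc", "2019-05-15"),
        ],
    )
    rows = conn.execute(
        """
        SELECT SystemDate,
               LAG(Beneficiary) OVER (PARTITION BY CustomerID
                                      ORDER BY SystemDate) AS via_lag,
               MIN(Beneficiary) OVER (PARTITION BY CustomerID
                                      ORDER BY SystemDate
                                      ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS via_min
        FROM payments
        ORDER BY CustomerID, SystemDate
        """
    ).fetchall()
    conn.close()
    return rows


for system_date, via_lag, via_min in previous_beneficiary_both_ways():
    print(system_date, via_lag, via_min)
```

The one-row frame contains exactly the previous row, so MIN has only a single value to choose from. On the first row of each customer the frame is empty and both columns come back NULL (None in Python).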

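Answer 2: (score: 0)

@Gordon and @dnoeth provided the ideas and code that put me on the right track, so credit goes to them.

The following is mostly lifted from the answers above, but it needed ROWS UNBOUNDED PRECEDING added for the sum to aggregate correctly. Without it, the sum just showed the total for the partition. I also changed SystemDate to PaymentRank, because I had to deal with duplicate entries within a single day.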

SELECT dt.*,
   -- now do a Cumulative Sum over those 0/1
   SUM(flag) OVER(PARTITION BY CustomerID ORDER BY PaymentRank ASC ROWS UNBOUNDED PRECEDING) AS NewRank
FROM
 ( 
    SELECT CustomerID, Beneficiary, Amount, SystemDate, Month, PaymentRank
      -- assign a 0 if current & previous Beneficiary are the same, otherwise 1 
      ,CASE WHEN Beneficiary = MIN(Beneficiary) OVER (PARTITION BY CustomerID ORDER BY PaymentRank ASC ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) THEN 0 ELSE 1 END AS Flag
    -- assumption: the FROM clause was missing from the post; the source must already expose PaymentRank
    FROM table
 ) AS dt
ORDER BY CustomerID, PaymentRank

The inner query sets a flag whenever the beneficiary changes. The outer query then takes a running sum of those flags.
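
As a rough check of that flag-then-running-sum idea, the sketch below replays it on the twelve sample rows from the question. It runs through Python's built-in sqlite3 module rather than Teradata, orders by date instead of PaymentRank, and assumes one payment per month in 2019 (the question only shows month names), so treat it as an illustration, not the production query. The `payments` table and `new_rank_rows` function are names made up for the example.

```python
import sqlite3

# Sample rows from the question; the exact dates are assumed, because the
# question only shows the month of each payment.
SAMPLE = [
    ("a", "aa", "2019-01-15"), ("a", "aa", "2019-02-15"),
    ("a", "aa", "2019-03-15"), ("a", "aa", "2019-04-15"),
    ("a", "bb", "2019-05-15"), ("a", "bb", "2019-06-15"),
    ("a", "aa", "2019-07-15"), ("a", "aa", "2019-08-15"),
    ("a", "cc", "2019-09-15"), ("a", "cc", "2019-10-15"),
    ("a", "cc", "2019-11-15"), ("b", "cc", "2019-12-15"),
]


def new_rank_rows():
    """Return (CustomerID, Beneficiary, SystemDate, NewRank) for every payment."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments (CustomerID TEXT, Beneficiary TEXT, SystemDate TEXT)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", SAMPLE)
    rows = conn.execute(
        """
        SELECT dt.CustomerID, dt.Beneficiary, dt.SystemDate,
               -- outer query: running total of the 0/1 flags
               SUM(dt.flag) OVER (PARTITION BY dt.CustomerID
                                  ORDER BY dt.SystemDate
                                  ROWS UNBOUNDED PRECEDING) AS NewRank
        FROM (
            SELECT CustomerID, Beneficiary, SystemDate,
                   -- inner query: 0 if the previous row has the same
                   -- beneficiary, otherwise 1 (including the first row)
                   CASE WHEN Beneficiary = MIN(Beneficiary) OVER (
                                 PARTITION BY CustomerID
                                 ORDER BY SystemDate
                                 ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
                        THEN 0 ELSE 1 END AS flag
            FROM payments
        ) AS dt
        ORDER BY dt.CustomerID, dt.SystemDate
        """
    ).fetchall()
    conn.close()
    return rows


for row in new_rank_rows():
    print(*row)
```

For customer a the NewRank column comes out as 1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4, and customer b starts again at 1, which matches the NewRank column asked for in the question.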

I wasn't sure what the PRECEDING part was doing. @dnoeth has given a good explanation of it, and the following is taken from that explanation.

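• UNBOUNDED PRECEDING: all rows before the current row -> fixed
• UNBOUNDED FOLLOWING: all rows after the current row -> fixed
• x PRECEDING: x rows before the current row -> relative
• y FOLLOWING: y rows after the current row -> relative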
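
To make those four frame bounds concrete, here is a small sketch that sums the same 0/1 column over four different frames. It runs on Python's built-in sqlite3 module as a stand-in for Teradata. Every frame is written out explicitly, because the frame an engine uses when ROWS is omitted differs between engines (SQLite defaults to a running frame, whereas the answer above saw the partition total on Teradata). The `flags` table and `frame_sums` function are invented for the illustration.

```python
import sqlite3


def frame_sums():
    """Return one row per input flag with SUM(flag) computed over four frames."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flags (pos INTEGER, flag INTEGER)")
    # Hypothetical flag sequence: a new "island" starts at positions 1 and 4.
    conn.executemany(
        "INSERT INTO flags VALUES (?, ?)",
        [(1, 1), (2, 0), (3, 0), (4, 1), (5, 0)],
    )
    rows = conn.execute(
        """
        SELECT flag,
               -- fixed start, ends at the current row: a running total
               SUM(flag) OVER (ORDER BY pos
                               ROWS UNBOUNDED PRECEDING) AS running_total,
               -- fixed start and fixed end: the whole partition
               SUM(flag) OVER (ORDER BY pos
                               ROWS BETWEEN UNBOUNDED PRECEDING
                                        AND UNBOUNDED FOLLOWING) AS partition_total,
               -- relative: exactly the one row before the current row
               SUM(flag) OVER (ORDER BY pos
                               ROWS BETWEEN 1 PRECEDING
                                        AND 1 PRECEDING) AS previous_row,
               -- relative: the current row plus the one after it
               SUM(flag) OVER (ORDER BY pos
                               ROWS BETWEEN CURRENT ROW
                                        AND 1 FOLLOWING) AS current_and_next
        FROM flags
        ORDER BY pos
        """
    ).fetchall()
    conn.close()
    return rows


for row in frame_sums():
    print(row)
```

The running_total column climbs 1, 1, 1, 2, 2, while partition_total is 2 on every row. That is the difference between getting a session number and getting "the total for the partition".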