Traversing gaps in sequential data

Question

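I have a table, [ContactCallDetail], that stores call data for each leg of a call from our phone system. The data is stored with a 4-part primary key: ([SessionID], [SessionSeqNum], [NodeID], [ProfileID]). [NodeID], [ProfileID], and [SessionID] together identify a call, and [SessionSeqNum] identifies each leg of the call as the caller is transferred from one department/rep to the next.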

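I need to look at each leg of a call and, if a transfer occurred, find the next leg so that I can report on where transferred calls went. The problems I'm facing are: 1) the session sequence doesn't always start at the same number, 2) there can be gaps in the sequence numbers, and 3) the table has 15,000,000 rows and grows every night through a data import, so I need a solution that doesn't use cursors.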

Sample data

| sessionid    | sessionseqnum | nodeid | profileid |
| 170000459184 | 0             | 1      | 1         |
| 170000459184 | 1             | 1      | 1         |
| 170000459184 | 3             | 1      | 1         |
| 170000229594 | 1             | 1      | 1         |
| 170000229594 | 2             | 1      | 1         |
| 170000229598 | 0             | 1      | 1         |
| 170000229598 | 2             | 1      | 1         |
| 170000229600 | 0             | 1      | 1         |
| 170000229600 | 1             | 1      | 1         |
| 170000229600 | 3             | 1      | 1         |
| 170000229600 | 5             | 1      | 1         |

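I think what I need to do is create a lookup table, using an identity column or rownum() or something similar, to get a new gap-free sequence number for the call legs. How would I do that? Or, if there is a different best-practice solution, it would be great if you could point me to it.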

Answer 1

You can use the lead() analytic function to identify the next session sequence number.

SELECT  sessionid ,
        nodeid ,
        profileid ,
        sessionseqnum ,
        lead(sessionseqnum) OVER ( PARTITION BY sessionid, nodeid, profileid ORDER BY sessionseqnum ) AS next_seq_num
FROM    ContactCallDetail
ORDER BY sessionid ,
        nodeid ,
        profileid ,
        sessionseqnum;

sessionid         nodeid  profileid  sessionseqnum next_seq_num
---------------------------------------------------------------
170000229594      1       1          1             2
170000229594      1       1          2
170000229598      1       1          0             2
170000229598      1       1          2
170000229600      1       1          0             1
170000229600      1       1          1             3
170000229600      1       1          3             5
170000229600      1       1          5
170000459184      1       1          0             1
170000459184      1       1          1             3
170000459184      1       1          3

The ORDER BY clause isn't strictly necessary; it just makes the output easier for humans to read.

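Now you can join back to the original table to produce rows that show each related pair of legs. There are several ways to express this in standard SQL. Here, I'm using a common table expression.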

WITH    next_seq_nums
          AS ( SELECT   * ,
                        lead(sessionseqnum) OVER ( PARTITION BY sessionid, nodeid, profileid ORDER BY sessionseqnum ) AS next_seq_num
               FROM     ContactCallDetail
             )
    SELECT  t1.sessionid ,
            t1.nodeid ,
            t1.profileid ,
            t1.sessionseqnum ,
            t2.sessionseqnum next_sessionseqnum ,
            t2.nodeid next_nodeid ,
            t2.profileid next_profileid
    FROM    next_seq_nums t1
            LEFT JOIN ContactCallDetail t2 ON t1.sessionid = t2.sessionid
                                              AND t1.nodeid = t2.nodeid
                                              AND t1.profileid = t2.profileid
                                              AND t1.next_seq_num = t2.sessionseqnum
    ORDER BY t1.sessionid ,
            t1.nodeid ,
            t1.profileid ,
            t1.sessionseqnum;

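The LEFT JOIN leaves NULLs in the row for the last session sequence number in each session. That makes sense: on the last row, there is no "next leg of the call." But it's easy to exclude those rows if you need to.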
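
As a minimal sketch of excluding those rows, assuming the same table and column names as above, a filter on next_seq_num in the outer query keeps only the legs that were followed by another leg:

```sql
WITH next_seq_nums AS (
    SELECT *,
           lead(sessionseqnum) OVER (PARTITION BY sessionid, nodeid, profileid
                                     ORDER BY sessionseqnum) AS next_seq_num
    FROM   ContactCallDetail
)
SELECT t1.sessionid,
       t1.nodeid,
       t1.profileid,
       t1.sessionseqnum,
       t1.next_seq_num AS next_sessionseqnum
FROM   next_seq_nums t1
-- The last leg of each call has no successor, so lead() returns NULL for it.
WHERE  t1.next_seq_num IS NOT NULL
ORDER BY t1.sessionid, t1.nodeid, t1.profileid, t1.sessionseqnum;
```

On the sample data this returns 7 rows, one per transfer. If you also need other columns from the next leg, keep the join to ContactCallDetail and change LEFT JOIN to INNER JOIN, which drops the same rows.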

If your dbms doesn't support the lead() analytic function, you can replace the common table expression above with this one.

WITH    next_seq_nums
          AS ( SELECT   t1.* ,
                        ( SELECT    MIN(sessionseqnum)
                          FROM      contactcalldetail
                          WHERE     sessionid = t1.sessionid
                                    AND nodeid = t1.nodeid
                                    AND profileid = t1.profileid
                                    AND sessionseqnum > t1.sessionseqnum
                        ) next_seq_num
               FROM     contactcalldetail t1
             )
             ...

Answer 2

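-- Rank the legs of each call 1, 2, 3, ... so the numbering has no gaps.
WITH cte AS (
    SELECT *,
           rank() OVER (PARTITION BY sessionid, profileid, nodeid
                        ORDER BY sessionseqnum) AS seq_rank
    FROM   dbo.Table_1  -- presumably a stand-in for ContactCallDetail
)
-- Pair each leg with the leg ranked immediately after it.
SELECT cte.sessionid,
       cte.nodeid,
       cte.profileid,
       cte.sessionseqnum,
       cte_1.sessionseqnum AS next_sessionseqnum
FROM   cte
       LEFT JOIN cte AS cte_1
              ON  cte.sessionid = cte_1.sessionid
              AND cte.profileid = cte_1.profileid
              AND cte.nodeid    = cte_1.nodeid
              AND cte.seq_rank  = cte_1.seq_rank - 1;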
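
The query above relies on the rank being gap-free within each call even when sessionseqnum is not, so "rank + 1" always identifies the next leg. A rough sketch of just that numbering step, using row_number() instead of rank() (the two should agree here, since the primary key makes sessionseqnum unique within a call); the column name leg_num is made up for illustration, and the table name is taken from the question:

```sql
SELECT sessionid,
       nodeid,
       profileid,
       sessionseqnum,
       -- Restart the numbering at 1 for every call and count up by 1,
       -- regardless of where sessionseqnum starts or what gaps it has.
       row_number() OVER (PARTITION BY sessionid, nodeid, profileid
                          ORDER BY sessionseqnum) AS leg_num
FROM   ContactCallDetail;
```

For session 170000229600 in the sample data, whose sequence numbers are 0, 1, 3, 5, leg_num comes out as 1, 2, 3, 4.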
