Smart grouping in SQL Server

Asked: 2017-12-13 09:12:29

Tags: sql-server

I need to build a kind of clustering. I tried ROW_NUMBER, RANK, and DENSE_RANK, but none of them gave me the result I want.

USER  START_DATE        ID_TO_CLUSTER  END_DATE          NEW_FIELD_SESSION
262   01/10/2017 00:01  4              01/10/2017 00:03  S1
262   01/10/2017 00:02  4              01/10/2017 00:03  S1
262   01/10/2017 00:03  0              01/10/2017 00:03  NO SESSION 
262   01/10/2017 00:03  1              01/10/2017 00:03  NO SESSION 
262   01/10/2017 00:03  7              01/10/2017 00:03  NO SESSION 
262   01/10/2017 00:03  2              01/10/2017 00:07  NO SESSION 
262   01/10/2017 00:07  3              01/10/2017 00:07  NO SESSION 
262   01/10/2017 00:07  4              01/10/2017 00:11  S2
262   01/10/2017 00:07  4              01/10/2017 00:11  S2
262   01/10/2017 00:11  7              01/10/2017 00:11  NO SESSION 
262   01/10/2017 00:11  9              01/10/2017 00:11  NO SESSION 
262   01/10/2017 16:28  0              01/10/2017 16:30  NO SESSION 
262   01/10/2017 16:28  1              01/10/2017 16:28  NO SESSION 
262   01/10/2017 16:30  2              01/10/2017 16:30  NO SESSION 
262   01/10/2017 16:30  3              01/10/2017 16:30  NO SESSION 
262   01/10/2017 16:30  4              01/10/2017 16:36  S3
262   01/10/2017 16:30  4              01/10/2017 16:36  S3
262   01/10/2017 16:36  4              01/10/2017 16:36  S3

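Basically, I need to group rows into sessions (NEW_FIELD_SESSION) wherever the same ID_TO_CLUSTER repeats in consecutive rows over time, so that I can get the minimum START_DATE and the maximum END_DATE for each cluster.

Can you help me?

Update

In response to Leran 2002's answer: the proposed solution only works when a group contains 2 rows, not when it contains 3 or more. Any ideas?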

1 Answer:

Answer 0 (score: 0)

Try the following query:

SELECT
  GroupNum,
  ID_TO_CLUSTER,
  NEW_FIELD_SESSION,
  MIN(START_DATE) AS MIN_START_DATE,
  MAX(END_DATE) AS MAX_END_DATE
FROM
  (
    SELECT
      *,
      -- After the WHERE filter below, the last row of a run takes the RowNum of the run's first row
      CASE WHEN NEW_FIELD_SESSION = PrevSession THEN LAG(RowNum) OVER (ORDER BY RowNum) ELSE RowNum END AS GroupNum
    FROM
      (
        SELECT
          *,
          -- Session value of the neighbouring rows, plus a sequential row number
          LAG(NEW_FIELD_SESSION) OVER (ORDER BY START_DATE, ID_TO_CLUSTER) AS PrevSession,
          LEAD(NEW_FIELD_SESSION) OVER (ORDER BY START_DATE, ID_TO_CLUSTER) AS NextSession,
          ROW_NUMBER() OVER (ORDER BY START_DATE, ID_TO_CLUSTER) AS RowNum
        FROM [Your table]
      ) q
    -- Keep only the first and last row of each run; rows in the middle of a run are dropped
    WHERE (PrevSession IS NULL OR NextSession IS NULL OR NEW_FIELD_SESSION <> PrevSession OR NEW_FIELD_SESSION <> NextSession)
  ) q
GROUP BY GroupNum, ID_TO_CLUSTER, NEW_FIELD_SESSION

I tested it on SQL Server 2014.

SQL Fiddle: http://sqlfiddle.com/#!6/6e983/1
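
If NEW_FIELD_SESSION does not exist in the table yet (in the question it is the desired output), one possible alternative is the common "gaps and islands" pattern, which derives the sessions from ID_TO_CLUSTER directly and does not depend on how many rows a run contains. The following is only a sketch under several assumptions: the column names match the sample in the question, [Your table] is the same placeholder used above, START_DATE and END_DATE are datetime columns, rows are ordered by START_DATE and then ID_TO_CLUSTER, and a run of a single row counts as "NO SESSION". The names IsRunStart, RunNum and ROWS_IN_SESSION are made up for illustration. It needs SQL Server 2012 or later for LAG.

```sql
-- Sketch only: table and column names are assumptions taken from the question.
WITH flagged AS (
    SELECT
        [USER], START_DATE, END_DATE, ID_TO_CLUSTER,
        -- 1 when this row starts a new run: the previous row for the same user
        -- has a different ID_TO_CLUSTER, or there is no previous row
        CASE
            WHEN LAG(ID_TO_CLUSTER) OVER (PARTITION BY [USER]
                                          ORDER BY START_DATE, ID_TO_CLUSTER) = ID_TO_CLUSTER
            THEN 0 ELSE 1
        END AS IsRunStart
    FROM [Your table]
),
islands AS (
    SELECT
        *,
        -- Running count of run starts = run number. No ROWS clause on purpose:
        -- the default RANGE frame treats rows tied on (START_DATE, ID_TO_CLUSTER)
        -- as peers, so tied rows always get the same run number.
        SUM(IsRunStart) OVER (PARTITION BY [USER]
                              ORDER BY START_DATE, ID_TO_CLUSTER) AS RunNum
    FROM flagged
)
SELECT
    [USER],
    ID_TO_CLUSTER,
    -- Number only the runs that survive the HAVING filter: S1, S2, ...
    'S' + CAST(ROW_NUMBER() OVER (PARTITION BY [USER] ORDER BY RunNum) AS varchar(10)) AS NEW_FIELD_SESSION,
    MIN(START_DATE) AS MIN_START_DATE,
    MAX(END_DATE)   AS MAX_END_DATE,
    COUNT(*)        AS ROWS_IN_SESSION
FROM islands
GROUP BY [USER], ID_TO_CLUSTER, RunNum
HAVING COUNT(*) > 1   -- assumption: a run of one row is "NO SESSION"
ORDER BY [USER], MIN_START_DATE;
```

On the sample rows from the question this should return one row per session (S1, S2, S3) with its earliest START_DATE and latest END_DATE. If single-row runs should be reported as well, dropping the HAVING clause and labelling them in a CASE expression would be one way to do it.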