Finding the longest sequence of consecutively increasing numbers in SQL

Date: 2013-03-12 18:06:51

Tags: sql sql-server

For this example, say I have a table with two fields, AREA varchar(30) and OrderNumber INT.

The table contains the following data:

AREA      | OrderNumber
Fontana   |       32
Fontana   |       42
Fontana   |       76
Fontana   |       12
Fontana   |        3
Fontana   |       99
RC        |       32
RC        |        1
RC        |        8
RC        |        9
RC        |        4

What I want to return is the length of the longest run of consecutively increasing values for each area. For Fontana it is 3 (32, 42, 76); for RC it is 2 (8, 9):

AREA    | LongestLength
Fontana |          3
RC      |          2
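The expected output implies a particular definition of a run. The hedged Python sketch below (not part of the question; the function names are made up for illustration, and rows are assumed to be in the order listed) compares two readings on the sample values: one where a value that fails to increase starts a new run, and one where only values larger than their predecessor count, with the first value compared against 0.

```python
def longest_chain(values):
    """Longest run in which each value exceeds the one before it.
    A value that does not exceed its predecessor starts a new run of length 1."""
    best = current = 0
    previous = None
    for value in values:
        current = current + 1 if previous is not None and value > previous else 1
        best = max(best, current)
        previous = value
    return best


def longest_rises(values, start=0):
    """Longest streak of values that are larger than their predecessor.
    The first value is compared against `start` (0 here); a drop resets the streak to 0."""
    best = current = 0
    previous = start
    for value in values:
        current = current + 1 if value > previous else 0
        best = max(best, current)
        previous = value
    return best


# Sample values from the question, assumed to be in chronological order.
samples = {
    "Fontana": [32, 42, 76, 12, 3, 99],
    "RC": [32, 1, 8, 9, 4],
}

for area, values in samples.items():
    print(area, longest_chain(values), longest_rises(values))
# Fontana 3 3
# RC 3 2
```

Only the second reading reproduces RC = 2; under the first, 1, 8, 9 is an increasing run of length 3.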

How can I do this in MS SQL Server 2005?

3 answers:

Answer 0 (score: 8)

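One approach is a recursive CTE that walks over the rows one at a time. If a row qualifies (its order number is larger than the previous one in the same area), the chain length is incremented by 1. If not, a new chain starts: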

; with  numbered as
        (
        select  row_number() over (order by area, eventtime) rn
        ,       *
        from    Table1
        )
,       recurse as
        (
        select  rn
        ,       area
        ,       OrderNumber
        ,       1 as ChainLength
        from    numbered
        where   rn = 1
        union all
        select  cur.rn
        ,       cur.area
        ,       cur.OrderNumber
        ,       case
                when cur.area = prev.area 
                     and cur.OrderNumber > prev.OrderNumber 
                     then prev.ChainLength + 1
                else 1
                end
        from    recurse prev
        join    numbered cur
        on      prev.rn + 1 = cur.rn
        )
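select  area
,       max(ChainLength) as LongestLength
from    recurse
group by
        area
option  (maxrecursion 0)  -- each row is one recursion level; the default limit is 100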

Live example at SQL Fiddle.
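To try the recursive-chain idea without SQL Server, here is a sketch that runs the same query shape on SQLite through Python's standard sqlite3 module. Assumptions: SQLite 3.25 or later (for ROW_NUMBER), and a hypothetical eventtime column that simply numbers the question's rows in the order listed, since the question's table does not show one.

```python
import sqlite3

# The question's rows in the order listed; eventtime is a hypothetical 1, 2, 3, ...
ROWS = [
    ("Fontana", 32), ("Fontana", 42), ("Fontana", 76),
    ("Fontana", 12), ("Fontana", 3), ("Fontana", 99),
    ("RC", 32), ("RC", 1), ("RC", 8), ("RC", 9), ("RC", 4),
]

CHAIN_QUERY = """
WITH RECURSIVE numbered AS (
    SELECT ROW_NUMBER() OVER (ORDER BY area, eventtime) AS rn, area, OrderNumber
    FROM Table1
),
recurse AS (
    -- anchor: the very first row starts a chain of length 1
    SELECT rn, area, OrderNumber, 1 AS ChainLength
    FROM numbered
    WHERE rn = 1
    UNION ALL
    -- step: extend the chain if the next row is in the same area and larger
    SELECT cur.rn, cur.area, cur.OrderNumber,
           CASE WHEN cur.area = prev.area AND cur.OrderNumber > prev.OrderNumber
                THEN prev.ChainLength + 1
                ELSE 1
           END
    FROM recurse AS prev
    JOIN numbered AS cur ON cur.rn = prev.rn + 1
)
SELECT area, MAX(ChainLength) AS LongestLength
FROM recurse
GROUP BY area
ORDER BY area
"""


def longest_by_chain(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (area TEXT, OrderNumber INTEGER, eventtime INTEGER)")
    conn.executemany(
        "INSERT INTO Table1 VALUES (?, ?, ?)",
        [(area, number, t) for t, (area, number) in enumerate(rows, start=1)],
    )
    try:
        return conn.execute(CHAIN_QUERY).fetchall()
    finally:
        conn.close()


print(longest_by_chain(ROWS))  # [('Fontana', 3), ('RC', 3)]
```

On the sample data this reports 3 for RC rather than the 2 the question expects, because 1, 8, 9 also counts as an increasing chain here.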

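Another approach is to find the "breaks", i.e. the rows that interrupt a sequence of increasing order numbers in the same area (each break is the first row of a new sequence). The distance between two consecutive breaks is the length of the sequence between them.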

; with  numbered as
        (
        select  row_number() over (order by area, eventtime) rn
        ,       *
        from    Table1 t1
        )
        -- Select rows that break an increasing chain
,       breaks as
        (
        select  row_number() over (order by cur.rn) rn2
        ,       cur.rn
        ,       cur.Area
        from    numbered cur
        left join
                numbered prev
        on      cur.rn = prev.rn + 1
        where   cur.OrderNumber <= prev.OrderNumber
                or cur.Area <> prev.Area
                or prev.Area is null
        )
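        -- Add a final break just after the last row of numbered
        -- (max(rn) from breaks would cut the last series short
        -- whenever the last row does not itself start a new run)
,       breaks2 as
        (
        select  *
        from    breaks
        union all
        select  (select count(*) + 1 from breaks)
        ,       (select max(rn) + 1 from numbered)
        ,       null
        )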
select  series_start.area
,       max(series_end.rn - series_start.rn) as LongestLength
from    breaks2 series_start
join    breaks2 series_end
on      series_end.rn2 = series_start.rn2 + 1
group by
        series_start.area

Live example at SQL Fiddle.
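The break-based query can be checked the same way. This is a sketch on SQLite (3.25+) via Python's sqlite3, again assuming a hypothetical eventtime that follows the listing order. It places the closing break one past the last row of numbered; taking max(rn) from breaks instead would shorten the final series whenever the last row does not itself start a new run, which the second test case exercises.

```python
import sqlite3

# The question's rows in the order listed; eventtime is a hypothetical 1, 2, 3, ...
ROWS = [
    ("Fontana", 32), ("Fontana", 42), ("Fontana", 76),
    ("Fontana", 12), ("Fontana", 3), ("Fontana", 99),
    ("RC", 32), ("RC", 1), ("RC", 8), ("RC", 9), ("RC", 4),
]

BREAKS_QUERY = """
WITH numbered AS (
    SELECT ROW_NUMBER() OVER (ORDER BY area, eventtime) AS rn, area, OrderNumber
    FROM Table1
),
breaks AS (
    -- rows that start a new increasing run (first row, new area, or no increase)
    SELECT ROW_NUMBER() OVER (ORDER BY cur.rn) AS rn2, cur.rn, cur.area
    FROM numbered AS cur
    LEFT JOIN numbered AS prev ON cur.rn = prev.rn + 1
    WHERE cur.OrderNumber <= prev.OrderNumber
       OR cur.area <> prev.area
       OR prev.area IS NULL
),
breaks2 AS (
    SELECT rn2, rn, area FROM breaks
    UNION ALL
    -- sentinel break one past the last row of the whole table
    SELECT (SELECT COUNT(*) + 1 FROM breaks), (SELECT MAX(rn) + 1 FROM numbered), NULL
)
SELECT series_start.area, MAX(series_end.rn - series_start.rn) AS LongestLength
FROM breaks2 AS series_start
JOIN breaks2 AS series_end ON series_end.rn2 = series_start.rn2 + 1
GROUP BY series_start.area
ORDER BY series_start.area
"""


def longest_by_breaks(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (area TEXT, OrderNumber INTEGER, eventtime INTEGER)")
    conn.executemany(
        "INSERT INTO Table1 VALUES (?, ?, ?)",
        [(area, number, t) for t, (area, number) in enumerate(rows, start=1)],
    )
    try:
        return conn.execute(BREAKS_QUERY).fetchall()
    finally:
        conn.close()


print(longest_by_breaks(ROWS))  # [('Fontana', 3), ('RC', 3)]
```

Like the recursive version, this counts 1, 8, 9 for RC, so the two queries agree with each other on the sample data.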

Answer 1 (score: 0)

You can do some arithmetic with ROW_NUMBER() to find where the consecutive items are.

Here is a code sample:

;WITH rownums AS
(
    SELECT [area], 
        ROW_NUMBER() OVER(PARTITION BY [area] ORDER BY [ordernumber]) AS rid1, 
        ROW_NUMBER() OVER(PARTITION BY [area] ORDER BY [eventtime]) AS rid2
    FROM SomeTable
),
    differences AS 
(
    SELECT [area],
        [calc] = rid1 - rid2
    FROM rownums
),  
    summation AS
(
    SELECT [area], [calc], COUNT(*) AS lengths 
    FROM differences 
    GROUP BY [area], [calc]
)   
SELECT summation.[area], MAX(summation.lengths) AS LongestLength
FROM differences
JOIN summation
    ON differences.[calc] = summation.[calc]
    AND differences.[area] = summation.[area]
GROUP BY summation.[area]

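So if I number one set of rows ordered by order number and another set ordered by event time, the difference between the two row numbers stays the same for as long as the two orderings agree.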

You can then count the rows grouped by those differences and take the largest count to get the data you need.
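Here is a sketch of the row-number-difference idea on SQLite (3.25+) through Python's sqlite3, assuming a hypothetical eventtime that follows the listing order and distinct order numbers within an area. Note that the difference stays constant between two consecutive rows only when the later row is also the next value in OrderNumber order, so a run such as 1, 3 followed by 2 is not detected; the second test case shows this.

```python
import sqlite3

# The question's rows in the order listed; eventtime is a hypothetical 1, 2, 3, ...
ROWS = [
    ("Fontana", 32), ("Fontana", 42), ("Fontana", 76),
    ("Fontana", 12), ("Fontana", 3), ("Fontana", 99),
    ("RC", 32), ("RC", 1), ("RC", 8), ("RC", 9), ("RC", 4),
]

DIFF_QUERY = """
WITH rownums AS (
    SELECT area,
           ROW_NUMBER() OVER (PARTITION BY area ORDER BY OrderNumber) AS rid1,
           ROW_NUMBER() OVER (PARTITION BY area ORDER BY eventtime) AS rid2
    FROM SomeTable
),
summation AS (
    -- rows sharing the same rid1 - rid2 within an area form one group
    SELECT area, rid1 - rid2 AS calc, COUNT(*) AS lengths
    FROM rownums
    GROUP BY area, rid1 - rid2
)
SELECT area, MAX(lengths) AS LongestLength
FROM summation
GROUP BY area
ORDER BY area
"""


def longest_by_rownumber_diff(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SomeTable (area TEXT, OrderNumber INTEGER, eventtime INTEGER)")
    conn.executemany(
        "INSERT INTO SomeTable VALUES (?, ?, ?)",
        [(area, number, t) for t, (area, number) in enumerate(rows, start=1)],
    )
    try:
        return conn.execute(DIFF_QUERY).fetchall()
    finally:
        conn.close()


print(longest_by_rownumber_diff(ROWS))  # [('Fontana', 3), ('RC', 2)]
```

On the question's data this matches the expected 3 and 2; for order numbers arriving as 1, 3, 2 it returns 1, although 1, 3 is an increasing run of length 2.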

Edit: ... ignore the first edit, I was in too much of a hurry.

Answer 2 (score: 0)

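You didn't explain why the longest sequence for RC does not include 1 while the one for Fontana does include 32. I assume 1 is excluded because it is a decrease: it comes after 32. Fontana's 32, however, is the first item in its group, and I have two ideas for why it is considered an increase: either precisely because it is the first item of the group, or because it is positive (i.e. as if it came after 0 and is therefore an increase).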

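For the purpose of this answer I assume the latter, i.e. that the first item of a group counts as an increase if it is positive. The script below implements the following idea: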

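  1. Enumerate the rows in every AREA group in the order of the eventtime column, which you almost forgot to mention.

  2. Join the enumerated set to itself, linking every row with its predecessor.

  3. Get the sign of the difference between each row and its predecessor (defaulting the latter to 0). At this point the problem becomes

  4. Partition every AREA group by the sign determined in #3 and enumerate the rows of every subgroup.

  5. Find the difference between the row numbers from #1 and those from #4. Together with AREA, this is the criterion that identifies an individual streak.

  6. Finally, group the rows by AREA, the sign from #3 and the result of #5, count the rows, and take the maximum count per AREA.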


I implemented the above as follows:

WITH enumerated AS (
  SELECT
    *,
    row = ROW_NUMBER() OVER (PARTITION BY AREA ORDER BY eventtime)
  FROM atable
),
signed AS (
  SELECT
    this.eventtime,
    this.AREA,
    this.row,
    sgn = SIGN(this.OrderNumber - COALESCE(last.OrderNumber, 0))
  FROM      enumerated AS this
  LEFT JOIN enumerated AS last
    ON this.AREA = last.AREA
   AND this.row  = last.row + 1
),
partitioned AS (
  SELECT
    AREA,
    sgn,
    grp = row - ROW_NUMBER() OVER (PARTITION BY AREA, sgn ORDER BY eventtime)
  FROM signed
)
SELECT DISTINCT
  AREA,
  LongestIncSeq = MAX(COUNT(*)) OVER (PARTITION BY AREA)
FROM partitioned
WHERE sgn = 1
GROUP BY
  AREA,
  grp
;

A SQL Fiddle demo can be found here.
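The sign-and-streak query can also be sketched on SQLite (3.25+) via Python's sqlite3. Assumptions: a hypothetical eventtime following the listing order; SIGN(...) is rewritten as a CASE expression, and MAX(COUNT(*)) OVER (...) as a second aggregation step, so the query does not depend on SQLite's optional math functions or on nesting an aggregate inside a window function. The row alias is renamed rn.

```python
import sqlite3

# The question's rows in the order listed; eventtime is a hypothetical 1, 2, 3, ...
ROWS = [
    ("Fontana", 32), ("Fontana", 42), ("Fontana", 76),
    ("Fontana", 12), ("Fontana", 3), ("Fontana", 99),
    ("RC", 32), ("RC", 1), ("RC", 8), ("RC", 9), ("RC", 4),
]

SIGN_QUERY = """
WITH enumerated AS (
    SELECT area, OrderNumber, eventtime,
           ROW_NUMBER() OVER (PARTITION BY area ORDER BY eventtime) AS rn
    FROM atable
),
signed AS (
    -- sign of the change from the predecessor (first row compared with 0)
    SELECT this.area, this.eventtime, this.rn,
           CASE WHEN this.OrderNumber > COALESCE(prev.OrderNumber, 0) THEN 1
                WHEN this.OrderNumber < COALESCE(prev.OrderNumber, 0) THEN -1
                ELSE 0
           END AS sgn
    FROM enumerated AS this
    LEFT JOIN enumerated AS prev
      ON this.area = prev.area AND this.rn = prev.rn + 1
),
partitioned AS (
    -- consecutive rows with the same sign share the same grp value
    SELECT area, sgn,
           rn - ROW_NUMBER() OVER (PARTITION BY area, sgn ORDER BY eventtime) AS grp
    FROM signed
),
streaks AS (
    SELECT area, grp, COUNT(*) AS len
    FROM partitioned
    WHERE sgn = 1
    GROUP BY area, grp
)
SELECT area, MAX(len) AS LongestIncSeq
FROM streaks
GROUP BY area
ORDER BY area
"""


def longest_by_sign(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE atable (area TEXT, OrderNumber INTEGER, eventtime INTEGER)")
    conn.executemany(
        "INSERT INTO atable VALUES (?, ?, ?)",
        [(area, number, t) for t, (area, number) in enumerate(rows, start=1)],
    )
    try:
        return conn.execute(SIGN_QUERY).fetchall()
    finally:
        conn.close()


print(longest_by_sign(ROWS))  # [('Fontana', 3), ('RC', 2)]
```

Because only rows that rose from their predecessor are counted, RC's 1 is excluded and the result matches the 3 and 2 the question asks for.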