Group by with "holes" in between

Time: 2017-06-12 11:15:18

Tags: sql sql-server

Given the following table:

create table #Track (id int identity, vehicle int, station varchar(50), pieces int, distance int)
insert into #Track (vehicle, station, pieces, distance)
values 
(1, 'A', 10, 0), (1, 'B', 10, 50), (1, 'C', 11, 23), (1, 'D', 11, 40), (1, 'E', 10, 5)

This is the result I need (note that the pieces field changes for stations C and D):

vehicle station_from    station_to  pieces  distance_all
1            A             B          10      50
1            C             D          11      63
1            E             E          10       5

If I run this query:

select  A.vehicle,
        T1.station station_from,
        T2.station station_to,
        A.pieces,
        A.distance_all
from (
select  vehicle,
        min(id) min_id,
        max(id) max_id,
        pieces,
        sum(distance) distance_all
from    #Track
group
by      vehicle,
        pieces
) A join #Track T1 on A.min_id = T1.id
    join #Track T2 on A.max_id = T2.id

I get the wrong result (distance_all is correct, but station_from and station_to are not). It looks as if vehicle 1 went from A to E and then from C to D:

vehicle station_from    station_to  pieces  distance_all
1             A              E        10         55
1             C              D        11         63

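How can I get the desired result without using a cursor? (The table is quite large, with millions of records.)

3 Answers:

Answer 0: (score: 2)

This is a variant of the "gaps and islands" problem. In your case, you can solve it using a difference of row numbers: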

select vehicle,
       max(case when seqnum_grp = 1 then station end) as station_from,
       max(case when seqnum_grp_desc = 1 then station end) as station_to,
       pieces,
       sum(distance) as distance_all
from (select t.*,
             row_number() over (partition by vehicle, pieces, (seqnum - seqnum_p) order by id) as seqnum_grp,
             row_number() over (partition by vehicle, pieces, (seqnum - seqnum_p) order by id desc) as seqnum_grp_desc
      from (select t.*,
                   row_number() over (partition by vehicle order by id) as seqnum,
                   row_number() over (partition by vehicle, pieces order by id) as seqnum_p
           from #Track t
          ) t
     ) t
group by vehicle, pieces, (seqnum - seqnum_p);

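To understand how this works, you need to see why the difference of the row numbers identifies the groups. The best way to do that is to run the innermost subquery and study the results.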
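
As a rough sketch of that inner step, the query below keeps only the two row numbers and their difference, assuming the #Track table and sample rows from the question. The column name diff is made up here for illustration. Rows A and B get a difference of 0 while C, D and E get 2, so the combination (vehicle, pieces, diff) separates A-B, C-D and E into three groups.

```sql
-- Inspect the two row numbers and their difference for the sample data.
-- "diff" is an illustrative column name, not part of the original answer.
select t.id, t.station, t.pieces,
       t.seqnum, t.seqnum_p,
       t.seqnum - t.seqnum_p as diff
from (select t.*,
             row_number() over (partition by vehicle order by id) as seqnum,
             row_number() over (partition by vehicle, pieces order by id) as seqnum_p
      from #Track t
     ) t
order by t.id;

-- Result for the five sample rows:
-- id  station  pieces  seqnum  seqnum_p  diff
-- 1   A        10      1       1         0
-- 2   B        10      2       2         0
-- 3   C        11      3       1         2
-- 4   D        11      4       2         2
-- 5   E        10      5       3         2
```

Note that E shares diff = 2 with C and D but has a different pieces value, which is why pieces has to stay in the partition and group by lists.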

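This is a bit trickier than most problems of this kind, because you also want the first and last station along the way. Hence the additional subquery.

Answer 1: (score: 1)

I think this is what you are trying to do. For each vehicle, treat consecutive rows with the same pieces value as one group, and get the first and last station and the sum of distance for that group. Use lag to get the pieces value of the previous row, and start a new group (via a running sum) whenever it differs from the current row. After that, it is just a grouping operation.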

select distinct vehicle
,first_value(station) over(partition by vehicle,grp order by id) as station_from
,first_value(station) over(partition by vehicle,grp order by id desc) as station_to
,pieces
,sum(distance) over(partition by vehicle,grp) as distance_all
from (select t.* ,sum(case when prev_pieces=pieces then 0 else 1 end) over(partition by vehicle order by id) as grp
      from (select t.*,lag(pieces) over(partition by vehicle order by id) as prev_pieces
            from #Track t
           ) t
     ) t  
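
To see how the grp column in the query above is built, it may help to look at the intermediate values on their own. This is only a sketch, assuming the #Track table and sample rows from the question and a SQL Server version that supports lag (2012 or later).

```sql
-- Show the previous pieces value and the running group number per row.
select t.id, t.station, t.pieces, t.prev_pieces,
       sum(case when t.prev_pieces = t.pieces then 0 else 1 end)
           over (partition by t.vehicle order by t.id) as grp
from (select t.*,
             lag(pieces) over (partition by vehicle order by id) as prev_pieces
      from #Track t
     ) t
order by t.id;

-- Result for the five sample rows:
-- id  station  pieces  prev_pieces  grp
-- 1   A        10      NULL         1
-- 2   B        10      10           1
-- 3   C        11      10           2
-- 4   D        11      11           2
-- 5   E        10      11           3
```

The first row has no predecessor, so prev_pieces is NULL, the comparison is not true, and the case expression falls through to 1, which opens the first group.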


Answer 2: (score: 1)

You can query as follows:

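Select vehicle, min(station) as Station_From, max(station) as Station_To, pieces, sum(distance) as Distance_all
from (
    -- Both row numbers are partitioned by vehicle so that groups are built per vehicle.
    Select *, [Bucket] = Row_number() over(partition by vehicle order by id) - Row_number() over(partition by vehicle, pieces order by id)
    from #Track
) a
-- Note: min/max(station) assumes station names sort in travel order, as they do in the sample data.
group by vehicle, pieces, [Bucket]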