Question

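I have the following table (#CategoryWeight), which stores a weight and a factor value for each category by date range. Where possible, I need to consolidate this data so that contiguous date ranges with the same weight and factor values are merged into a single wider range.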

DROP TABLE IF EXISTS #CategoryWeight;
CREATE TABLE #CategoryWeight ( [CategoryId] bigint, [weight] float(8), [factor] float(8), [startYear] nvarchar(60), [endYear] nvarchar(60) )
INSERT INTO #CategoryWeight
VALUES
( 42, 1, 0, N'2009', N'2014' ), 
( 42, 1, 0, N'2009', N'2019' ), 
( 42, 1, 0, N'2015', N'2017' ), 
( 42, 1, 0, N'2018', N'2019' ), 
( 42, 1, 1, N'2020', N'9999' ),

( 40, 1, 0, N'2009', N'2014' ), 
( 40, 1, 0, N'2009', N'2017' ), 
( 40, 1, 0, N'2015', N'2017' ), 
( 40, 1, 0, N'2020', N'9999' ), 
( 40, 1, 1, N'2018', N'2019' ),

( 45, 1, 0, N'2009', N'2014' ), 
( 45, 0, 0, N'2015', N'2017' ), 
( 45, 1, 0, N'2020', N'9999' ), 
( 45, 0, 1, N'2018', N'2019' );

CategoryID  weight  factor  startYear   endYear
42          1       0       2009        2014
42          1       0       2009        2019
42          1       0       2015        2017
42          1       0       2018        2019
42          1       1       2020        9999
40          1       0       2009        2014
40          1       0       2009        2017
40          1       0       2015        2017
40          1       0       2020        9999
40          1       1       2018        2019
45          1       0       2009        2014
45          0       0       2015        2017
45          1       0       2020        9999
45          0       1       2018        2019

Expected result:

CategoryID  weight  factor  startYear   endYear
42          1       0       2009        2019
42          1       1       2020        9999
40          1       0       2009        2017
40          1       1       2018        2019
40          1       0       2020        9999
45          1       0       2009        2014
45          0       0       2015        2017
45          0       1       2018        2019
45          1       0       2020        9999

Answer 1

If you are using MySQL 8.0, SQL Server, or PostgreSQL, you can do this with window functions:

select
    distinct CategoryID, 
    weight, 
    factor,
    min(startYear) over (partition by CategoryID, weight, factor) as startYear,
    max(endYear) over (partition by CategoryID, weight, factor) as endYear
from categoryWeight
order by
    CategoryID
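
One caveat worth checking before relying on this: the partition is only (CategoryID, weight, factor), so two ranges with the same values that are not adjacent end up in the same row. A minimal sketch of that check, assuming SQL Server and the question's #CategoryWeight temp table (the GROUP BY form below returns the same rows as the DISTINCT plus window version, restricted to category 40):

```sql
-- Same aggregation as the window query, limited to CategoryId = 40.
select CategoryId, weight, factor,
       min(startYear) as startYear,
       max(endYear)   as endYear
from #CategoryWeight
where CategoryId = 40
group by CategoryId, weight, factor
order by startYear;
-- (weight = 1, factor = 0) comes back as one row, 2009..9999,
-- which covers 2018..2019 even though factor = 1 in those years.
-- The expected result lists 2009..2017 and 2020..9999 separately.
```

So this approach only matches the expected result when each (weight, factor) combination occupies a single contiguous stretch per category.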

Answer 2

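You have overlapping time periods. That makes any assumption about the data troublesome, because the same year could have different values on different rows (nothing in the question rules this out).

So the approach I suggest is to expand the data and then recombine it into blocks where the values are the same. The following uses a recursive CTE to expand the data and then a gaps-and-islands trick to recombine it: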

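with cte as (
      -- anchor: one row per source range, starting at its first year
      select categoryid, weight, factor,
             convert(int, startyear) as year, convert(int, endyear) as endyear
      from #CategoryWeight
      union all
      -- recursive step: emit the next year until the end of the range
      select categoryid, weight, factor,
             year + 1, endyear
      from cte
      where year < endyear
     )
select categoryid, weight, factor, min(year) as startyear, max(year) as endyear
from (select categoryid, weight, factor, year,
             row_number() over (partition by categoryid, weight, factor order by year) as seqnum
      -- distinct drops years that are covered by more than one overlapping range
      from (select distinct categoryid, weight, factor, year from cte) y
     ) s
-- consecutive years share the same (year - seqnum), so each island is one group
group by categoryid, weight, factor, (year - seqnum)
order by categoryid, min(year)
option (maxrecursion 0);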
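
To see why grouping by (year - seqnum) separates the islands, here is a small sketch on a made-up list of years (the sample values and the names t and grp are only for illustration, not taken from the question's data):

```sql
-- Hypothetical sample: two runs of consecutive years, 2009-2011 and 2020-2021.
select year,
       row_number() over (order by year) as seqnum,
       year - row_number() over (order by year) as grp
from (values (2009), (2010), (2011), (2020), (2021)) as t(year)
order by year;
-- year  seqnum  grp
-- 2009  1       2008
-- 2010  2       2008
-- 2011  3       2008
-- 2020  4       2016
-- 2021  5       2016
```

Within a run of consecutive years both year and seqnum go up by one per row, so their difference stays constant; a gap makes year jump while seqnum does not, which gives the next run a different grp value.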

Here is a db<>fiddle.

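I notice a few things about your data.

You are using float for some values. This is really, really dangerous, because two values can look the same but actually differ by a tiny amount. Use decimal/numeric types instead, so that what you see is what you get.
The year values are strings when they should be integers. Use proper data types!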
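
As a rough sketch of both points, assuming SQL Server: the table name #CategoryWeightTyped and the decimal(9, 4) precision below are placeholders I made up, so pick a precision that fits your actual weights and factors.

```sql
-- Hypothetical retyped version of the table: exact numerics and integer years.
create table #CategoryWeightTyped (
    CategoryId bigint        not null,
    weight     decimal(9, 4) not null,  -- assumed precision/scale
    factor     decimal(9, 4) not null,  -- assumed precision/scale
    startYear  int           not null,
    endYear    int           not null
);

-- Why float comparisons are risky: 0.1 + 0.2 is not exactly 0.3 in binary floating point.
declare @a float = 0.1, @b float = 0.2;
select case when @a + @b = 0.3 then 'equal' else 'not equal' end as float_compare;     -- not equal

-- The same arithmetic with decimal is exact.
declare @c decimal(9, 4) = 0.1, @d decimal(9, 4) = 0.2;
select case when @c + @d = 0.3 then 'equal' else 'not equal' end as decimal_compare;   -- equal
```

With integer years, the convert(int, ...) calls in the query above are no longer needed, and min/max compare numerically rather than as strings.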

Aggregating data across date ranges