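I have the following table (#CategoryWeight), which stores weight and factor values for each category by year range. Where possible, I need to summarize/simplify this data so that contiguous ranges with the same weight and factor values are merged into one wide range.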
DROP TABLE IF EXISTS #CategoryWeight;
CREATE TABLE #CategoryWeight ( [CategoryId] bigint, [weight] float(8), [factor] float(8), [startYear] nvarchar(60), [endYear] nvarchar(60) )
INSERT INTO #CategoryWeight
VALUES
( 42, 1, 0, N'2009', N'2014' ),
( 42, 1, 0, N'2009', N'2019' ),
( 42, 1, 0, N'2015', N'2017' ),
( 42, 1, 0, N'2018', N'2019' ),
( 42, 1, 1, N'2020', N'9999' ),
( 40, 1, 0, N'2009', N'2014' ),
( 40, 1, 0, N'2009', N'2017' ),
( 40, 1, 0, N'2015', N'2017' ),
( 40, 1, 0, N'2020', N'9999' ),
( 40, 1, 1, N'2018', N'2019' ),
( 45, 1, 0, N'2009', N'2014' ),
( 45, 0, 0, N'2015', N'2017' ),
( 45, 1, 0, N'2020', N'9999' ),
( 45, 0, 1, N'2018', N'2019' );
CategoryID weight factor startYear endYear
42 1 0 2009 2014
42 1 0 2009 2019
42 1 0 2015 2017
42 1 0 2018 2019
42 1 1 2020 9999
40 1 0 2009 2014
40 1 0 2009 2017
40 1 0 2015 2017
40 1 0 2020 9999
40 1 1 2018 2019
45 1 0 2009 2014
45 0 0 2015 2017
45 1 0 2020 9999
45 0 1 2018 2019
Expected result:
CategoryID weight factor startYear endYear
42 1 0 2009 2019
42 1 1 2020 9999
40 1 0 2009 2017
40 1 1 2018 2019
40 1 0 2020 9999
45 1 0 2009 2014
45 0 0 2015 2017
45 0 1 2018 2019
45 1 0 2020 9999
Answer 0 (score: 0)
If you are using MySQL 8.0, SQL Server, or PostgreSQL, you can use window functions as follows.
select
distinct CategoryID,
weight,
factor,
min(startYear) over (partition by CategoryID, weight, factor) as startYear,
max(endYear) over (partition by CategoryID, weight, factor) as endYear
from categoryWeight
order by
CategoryID
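Note that the query above partitions only by CategoryID, weight, and factor, so rows with equal values are merged even when their ranges are not contiguous (for category 40 it would return 2009 to 9999 for weight 1, factor 0, rather than the two ranges in the expected result). A minimal sketch of a gaps-and-islands variant that keeps such ranges apart is shown below, written for SQL Server. It is a different technique from the query above, and it assumes that ranges with the same values should merge when they overlap or are directly adjacent (the next start year is at most one more than the largest end year seen so far). The CTE names typed, flagged, and numbered and the columns isNewIsland and islandId are made up for illustration.

```sql
-- Sketch only: merge overlapping or adjacent ranges per (CategoryId, weight, factor).
with typed as (
    -- the years are stored as strings, so convert them once up front
    select CategoryId, weight, factor,
           convert(int, startYear) as startYear,
           convert(int, endYear)   as endYear
    from #CategoryWeight
),
flagged as (
    -- a row starts a new island when it begins more than one year after
    -- the largest end year of all earlier rows with the same values;
    -- for the first row the running max is NULL, so the ELSE branch applies
    select *,
           case when startYear <= max(endYear) over (
                         partition by CategoryId, weight, factor
                         order by startYear, endYear
                         rows between unbounded preceding and 1 preceding) + 1
                then 0 else 1 end as isNewIsland
    from typed
),
numbered as (
    -- a running sum of the flags gives every island its own number
    select *,
           sum(isNewIsland) over (
               partition by CategoryId, weight, factor
               order by startYear, endYear
               rows unbounded preceding) as islandId
    from flagged
)
select CategoryId, weight, factor,
       min(startYear) as startYear,
       max(endYear)   as endYear
from numbered
group by CategoryId, weight, factor, islandId
order by CategoryId, min(startYear);
```

Because this works on the range boundaries directly, it never has to expand the open-ended 9999 rows into one row per year. It does not, however, resolve the case where the same year carries different values on different rows.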
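Answer 1 (score: 0)
You have overlapping time periods. That makes any assumption about the data troublesome, because the same year can have different values on different rows (nothing in the question rules this out).
So the approach I recommend is to expand the data and then recombine it into blocks where the values are the same. The following uses a recursive CTE to expand the data into one row per year, and then uses the gaps-and-islands trick to recombine it: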
with cte as (
select categoryid, weight, factor,
convert(int, startyear) as year, convert(int, endyear) as endyear
from categoryweight
union all
select categoryid, weight, factor,
year + 1, endyear
from cte
where year < endyear
)
select categoryid, weight, factor, min(year) as startyear, max(year) as endyear
from (select categoryid, weight, factor, year,
row_number() over (partition by categoryid, weight, factor order by year) as seqnum
from (select distinct categoryid, weight, factor, year from cte) cte
) cte
group by categoryid, weight, factor, (year - seqnum)
order by categoryid, min(year)
option (maxrecursion 0);
Here is a db<>fiddle.
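To see whether the concern about overlapping periods with different values actually applies to a given data set, a check along these lines might help. It is only a sketch, assuming the #CategoryWeight table from the question and SQL Server syntax; the column aliases ending in A and B are made up for illustration.

```sql
-- Sketch only: list pairs of rows in the same category whose year ranges
-- overlap while their weight or factor differ.
select a.CategoryId,
       a.startYear as startYearA, a.endYear as endYearA,
       a.weight    as weightA,    a.factor  as factorA,
       b.startYear as startYearB, b.endYear as endYearB,
       b.weight    as weightB,    b.factor  as factorB
from #CategoryWeight a
join #CategoryWeight b
  on  a.CategoryId = b.CategoryId
  -- two closed ranges overlap when each one starts before the other ends
  and convert(int, a.startYear) <= convert(int, b.endYear)
  and convert(int, b.startYear) <= convert(int, a.endYear)
  -- values differ; the strict ordering also reports each pair only once
  and (a.weight < b.weight
       or (a.weight = b.weight and a.factor < b.factor));
```

If this returns any rows, some year has conflicting values, and you would need a rule for which row wins before merging ranges.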
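I notice a few things about your data.
You are using float for some of the values. This is really, really dangerous, because two values can look the same but actually differ by a tiny amount. Use decimal / numeric instead, so that what you see is what you get.
The year values are strings when they should be integers. Use the proper data types!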
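As a rough illustration of both points, the table could be declared along these lines. This is a sketch rather than a recommendation for exact types: the precision and scale decimal(9, 4), the NOT NULL constraints, and the CHECK constraint are assumptions, since the question does not say how precise the weights and factors need to be.

```sql
-- Sketch only: the question's table with exact numeric and integer types.
DROP TABLE IF EXISTS #CategoryWeight;
CREATE TABLE #CategoryWeight (
    CategoryId bigint        NOT NULL,
    weight     decimal(9, 4) NOT NULL,  -- exact numeric instead of float(8); precision is an assumption
    factor     decimal(9, 4) NOT NULL,
    startYear  int           NOT NULL,  -- years as integers instead of nvarchar(60)
    endYear    int           NOT NULL,
    CHECK (startYear <= endYear)        -- assumed sanity check, not in the original table
);

-- Why float is risky for equality tests and grouping:
-- with double-precision floats, 0.1 + 0.2 is not exactly 0.3.
select case when cast(0.1 as float) + cast(0.2 as float) = cast(0.3 as float)
            then 'equal' else 'different' end as floatComparison,
       case when cast(0.1 as decimal(9, 4)) + cast(0.2 as decimal(9, 4)) = cast(0.3 as decimal(9, 4))
            then 'equal' else 'different' end as decimalComparison;
```

With integer year columns, the convert(int, ...) calls in the query above become unnecessary.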