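I would like to add a column to my database that, for each ID, performs a row-wise cumulative concatenation of another column's values and then orders the resulting string according to a different hierarchy. The dataset is very large, and the approach I designed on a small test set does not work at larger scale, so I need help redesigning it.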
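So far I have written the query as a recursive CTE that performs the cumulative concatenation (step1 output below), followed by a clunky function (step2 output below) that orders the string according to a separate hierarchy and also removes the "1" values unless "1" is the only value. This works on a tiny subset of my data (n = 60), but when I run a larger subset (n = 500,000) the CTE runs seemingly forever (I stopped it after 2 hours without completion). The actual dataset will be on the order of hundreds of millions of rows, so this solution is not viable at that scale.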
ID  Start_Date  End_Date    Seg  step1   step2
1   01/04/1946  31/12/1990  1    1       1
1   01/01/1991  08/01/2007  4    4       4
1   09/01/2007  04/02/2007  1    1       1
1   05/02/2007  18/10/2017  4    14      4
1   01/04/2013  18/10/2017  8    148     48
1   11/11/2014  18/10/2017  7    1487    487
2   01/05/1931  31/12/1997  1    1       1
2   01/01/1998  20/01/2014  4    4       4
2   31/01/2011  20/01/2014  6    146     46
2   21/02/2013  20/01/2014  5    1465    456
2   01/04/2013  20/01/2014  8    14658   4586
2   29/04/2013  20/01/2014  7    146587  45876
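There are additional complicating logic elements, such as only starting to accumulate when the start date is earlier than the previous row's end date, so a solution that allows flexibility through added where or case when clauses would be key.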
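Below are examples of the recursive CTE and the ordering function I have used (they will not run against the simplified table shown, but they represent the structure I have been using).
Recursive CTE (outputs the step1 column)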
with t (ID, Segment, start_date, start_comb, updated_end_date, rn) as (
    -- start_comb = 0 marks the first row of a chain; 1 marks a row that continues one
    select ID, Segment, start_date
        , case when Segment_end_date <> resolved_date or Segment_end_date is null then 1 else 0 end as start_comb
        , updated_end_date
        , row_number() over (partition by ID order by start_date) as rn
    from #test_IDs
)
, r (ID, orig_seg, Segment, rn, start_comb, start_date, updated_end_date) as (
    -- anchor: every chain starts with its own Segment
    select ID, cast(Segment as varchar(max)), cast(Segment as varchar(max)), rn, start_comb, start_date, updated_end_date
    from t
    where start_comb = 0
    union all
    -- recursive step: append the next row's Segment to the running string
    select r.ID, cast(t.Segment as varchar(max)) as orig_seg
        , Segment = cast(concat(r.Segment, t.Segment) as varchar(max))
        , t.rn, t.start_comb, t.start_date, t.updated_end_date
    from r
    join t on t.ID = r.ID and t.rn = r.rn + 1 and t.start_comb <> 0
)
-- return the running string (step1) for every row the recursion reaches
select ID, start_date, updated_end_date, orig_seg, Segment as step1
from r
order by ID, rn
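Since the step2 string depends only on which segments have appeared so far within a chain, one possible set-based direction is to replace the recursion with running window aggregates. The following is an untested sketch, not a verified solution: it assumes single-character segment codes, the same #test_IDs columns that the recursive CTE reads, and SQL Server 2012 or later for the window frames. The names g, f, grp, n_rows and has4 through has6 are made up for the illustration, and the sketch produces step2 directly without building the step1 string.

```sql
with t as (
    select ID, Segment, start_date, updated_end_date
        , case when Segment_end_date <> resolved_date or Segment_end_date is null
               then 1 else 0 end as start_comb
    from #test_IDs
)
, g as (
    -- a running count of chain starts (start_comb = 0) numbers the chains within each ID
    select t.*
        , sum(case when start_comb = 0 then 1 else 0 end)
              over (partition by ID order by start_date rows unbounded preceding) as grp
    from t
)
, f as (
    -- running flags: has this segment appeared so far in the current chain?
    select g.*
        , count(*)
              over (partition by ID, grp order by start_date rows unbounded preceding) as n_rows
        , max(case when Segment = '4' then 1 else 0 end)
              over (partition by ID, grp order by start_date rows unbounded preceding) as has4
        , max(case when Segment = '5' then 1 else 0 end)
              over (partition by ID, grp order by start_date rows unbounded preceding) as has5
        , max(case when Segment = '8' then 1 else 0 end)
              over (partition by ID, grp order by start_date rows unbounded preceding) as has8
        , max(case when Segment = '7' then 1 else 0 end)
              over (partition by ID, grp order by start_date rows unbounded preceding) as has7
        , max(case when Segment = '6' then 1 else 0 end)
              over (partition by ID, grp order by start_date rows unbounded preceding) as has6
    from g
    where grp > 0   -- rows before the first chain start are not reached by the recursion either
)
select ID, start_date, updated_end_date
    , case when n_rows = 1 and Segment = '1' then '1'   -- a lone 1 is kept
           else concat(case when has4 = 1 then '4' else '' end   -- hierarchy order 4, 5, 8, 7, 6
                     , case when has5 = 1 then '5' else '' end
                     , case when has8 = 1 then '8' else '' end
                     , case when has7 = 1 then '7' else '' end
                     , case when has6 = 1 then '6' else '' end)
      end as step2
from f
```

Extra rules, such as the date-overlap condition, would go into the start_comb expression. As with row_number() in the recursive version, tied start_date values within an ID make the row order, and therefore the running result, non-deterministic.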
Ordering function (outputs the step2 column)
if object_id('dbo.reformat') is not null
    drop function dbo.reformat
go
create function dbo.reformat
(
    @unordered_segs varchar(max)
)
returns varchar(255)
as
begin
    -- one slot per segment in hierarchy order 4, 5, 8, 7, 6; '' when the segment is absent
    declare @first varchar(1), @second varchar(1), @third varchar(1), @fourth varchar(1), @fifth varchar(1), @outtext varchar(255)
    if charindex('4', @unordered_segs) > 0
        set @first = '4'
    else set @first = ''
    if charindex('5', @unordered_segs) > 0
        set @second = '5'
    else set @second = ''
    if charindex('8', @unordered_segs) > 0
        set @third = '8'
    else set @third = ''
    if charindex('7', @unordered_segs) > 0
        set @fourth = '7'
    else set @fourth = ''
    if charindex('6', @unordered_segs) > 0
        set @fifth = '6'
    else set @fifth = ''
    -- a lone '1' is kept; in every other case the 1s are dropped
    if charindex('1', @unordered_segs) > 0 and len(@unordered_segs) = 1
        set @outtext = '1'
    else
        set @outtext = concat(@first, @second, @third, @fourth, @fifth)
    return @outtext
end
go
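For comparison, the same ordering rule can be written as a single expression inside an inline table-valued function, which SQL Server can expand into the calling query rather than invoking once per row. This is a minimal sketch, assuming single-character segment codes; the function name dbo.reformat_inline and the output column ordered_segs are hypothetical.

```sql
create function dbo.reformat_inline
(
    @unordered_segs varchar(max)
)
returns table
as
return
    select ordered_segs =
        case
            -- a lone '1' is kept as is
            when charindex('1', @unordered_segs) > 0 and len(@unordered_segs) = 1 then '1'
            -- otherwise emit the segments present, in hierarchy order 4, 5, 8, 7, 6
            else concat(case when charindex('4', @unordered_segs) > 0 then '4' else '' end
                      , case when charindex('5', @unordered_segs) > 0 then '5' else '' end
                      , case when charindex('8', @unordered_segs) > 0 then '8' else '' end
                      , case when charindex('7', @unordered_segs) > 0 then '7' else '' end
                      , case when charindex('6', @unordered_segs) > 0 then '6' else '' end)
        end
go
-- usage against the recursive CTE's output:
-- select r.ID, r.Segment as step1, f.ordered_segs as step2
-- from r
-- cross apply dbo.reformat_inline(r.Segment) as f
```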
Thanks!