Question

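I want to add a column to my database that performs a row-wise cumulative concatenation of another column's values for each ID, and orders the resulting string by a different hierarchy. The dataset is very large, and the approach I designed on smaller test data does not work at the larger scale, so I need help redesigning it.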

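So far I have written the query using a recursive CTE to perform the cumulative concatenation (the step1 output below), followed by a clunky function (the step2 output below) that orders the string according to a separate hierarchy and also removes the "1" values. These work on a small subset of my data (n = 60), but when I try to run a larger subset (n = 500,000) the CTE runs seemingly forever (I stopped it after 2 hours without it completing). The actual dataset will be on the order of hundreds of millions of rows, so this solution is not suitable at that scale.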

ID  Start_Date  End_Date    Seg  step1   step2
1   01/04/1946  31/12/1990  1    1       1
1   01/01/1991  08/01/2007  4    4       4
1   09/01/2007  04/02/2007  1    1       1
1   05/02/2007  18/10/2017  4    14      4
1   01/04/2013  18/10/2017  8    148     48
1   11/11/2014  18/10/2017  7    1487    487
2   01/05/1931  31/12/1997  1    1       1
2   01/01/1998  20/01/2014  4    4       4
2   31/01/2011  20/01/2014  6    146     46
2   21/02/2013  20/01/2014  5    1465    456
2   01/04/2013  20/01/2014  8    14658   4586
2   29/04/2013  20/01/2014  7    146587  45876
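
To make the simplified table easier to reproduce, here is a minimal T-SQL sketch that loads it into a temp table. The table name `#sample` and the column types are assumptions made for illustration; the real data lives in a differently shaped table (`#test_IDs` in the query further down). Dates are written as `YYYYMMDD` literals so they do not depend on the session's date format.

```sql
-- Hypothetical temp table for the simplified sample; name and types are assumptions.
create table #sample (
    ID         int     not null,
    Start_Date date    not null,
    End_Date   date    not null,
    Seg        char(1) not null
);

insert into #sample (ID, Start_Date, End_Date, Seg)
values
    (1, '19460401', '19901231', '1'),
    (1, '19910101', '20070108', '4'),
    (1, '20070109', '20070204', '1'),
    (1, '20070205', '20171018', '4'),
    (1, '20130401', '20171018', '8'),
    (1, '20141111', '20171018', '7'),
    (2, '19310501', '19971231', '1'),
    (2, '19980101', '20140120', '4'),
    (2, '20110131', '20140120', '6'),
    (2, '20130221', '20140120', '5'),
    (2, '20130401', '20140120', '8'),
    (2, '20130429', '20140120', '7');

-- List the rows in the order the accumulation is meant to walk them.
select ID, Start_Date, End_Date, Seg
from #sample
order by ID, Start_Date;
```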

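There are other complicating logic elements, such as only starting the accumulation when the start date is earlier than the previous row's end date, so a solution that stays flexible through added where or case when clauses is key.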
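
As a rough illustration of that date condition only (not the full accumulation rule used in my real query), the comparison with the previous row's end date could be written with `lag()`, assuming SQL Server 2012 or later. The inline rows are taken from ID 1 of the sample, and the column name `overlaps_prev` is hypothetical.

```sql
-- Sketch only: flag rows whose start date falls before the previous row's end date.
-- The inline rows and the name overlaps_prev are illustrative assumptions.
select s.ID, s.Start_Date, s.End_Date, s.Seg
     , case when s.Start_Date < lag(s.End_Date) over (partition by s.ID order by s.Start_Date)
            then 1   -- overlaps the previous row
            else 0   -- no overlap, or first row of the ID (lag returns null)
       end as overlaps_prev
from (values
        (1, cast('20070109' as date), cast('20070204' as date), '1'),
        (1, cast('20070205' as date), cast('20171018' as date), '4'),
        (1, cast('20130401' as date), cast('20171018' as date), '8'),
        (1, cast('20141111' as date), cast('20171018' as date), '7')
     ) as s (ID, Start_Date, End_Date, Seg)
order by s.ID, s.Start_Date;
```

A flag like this can then be combined with further where or case when conditions, which is the kind of flexibility I need.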

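Below are examples of the recursive CTE and the ordering function I used (they do not run against the simplified table shown above, but they represent the structure I have been using).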

Recursive CTE (outputs the step1 column)

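with t (ID, Segment, start_date, start_comb, updated_end_date, rn) as (
    -- Number each ID's rows by start date and compute the start_comb flag.
    select ID, Segment, start_date
         , case when Segment_end_date <> resolved_date or Segment_end_date is null then 1 else 0 end as start_comb
         , updated_end_date
         , row_number() over (partition by ID order by start_date) as rn
    from #test_IDs
)
, r (ID, orig_seg, Segment, rn, start_comb, start_date, updated_end_date) as (
    -- Anchor member: rows with start_comb = 0 seed each chain.
    select ID, cast(Segment as varchar(max)), cast(Segment as varchar(max)), rn, start_comb, start_date, updated_end_date
    from t
    where start_comb = 0
    union all
    -- Recursive member: append the next row's segment to the running string.
    select r.ID, cast(t.Segment as varchar(max)) as orig_seg
         , Segment = cast(concat(r.Segment, t.Segment) as varchar(max))
         , t.rn, t.start_comb, t.start_date, t.updated_end_date
    from r
    join t on t.ID = r.ID and t.rn = r.rn + 1 and t.start_comb <> 0
)
-- Final select so the statement is complete; Segment holds the accumulated string (step1).
select ID, start_date, updated_end_date, orig_seg, Segment as step1
from r
order by ID, rn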

Ordering function (outputs the step2 column)


if object_id('dbo.reformat') is not null
    drop function dbo.reformat
go  -- batch separator: create function must be the first statement in its batch

create function dbo.reformat
(
    @unordered_segs varchar(max)
)
returns varchar(255)
as
begin
    -- One slot per segment, in hierarchy order: 4, 5, 8, 7, 6.
    declare @first varchar(1), @second varchar(1), @third varchar(1),
            @fourth varchar(1), @fifth varchar(1), @outtext varchar(255)

    -- Keep a digit only if it appears somewhere in the input string.
    set @first  = case when charindex('4', @unordered_segs) > 0 then '4' else '' end
    set @second = case when charindex('5', @unordered_segs) > 0 then '5' else '' end
    set @third  = case when charindex('8', @unordered_segs) > 0 then '8' else '' end
    set @fourth = case when charindex('7', @unordered_segs) > 0 then '7' else '' end
    set @fifth  = case when charindex('6', @unordered_segs) > 0 then '6' else '' end

    -- A lone '1' is returned as is; otherwise '1' is dropped and the rest is emitted in hierarchy order.
    if charindex('1', @unordered_segs) > 0 and len(@unordered_segs) = 1
        set @outtext = '1'
    else
        set @outtext = concat(@first, @second, @third, @fourth, @fifth)

    return @outtext
end
go
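
For reference, a small usage sketch of the function against a few step1 strings from the sample table. It assumes `dbo.reformat` has already been created as above; the inline values and the alias `v` are only for illustration.

```sql
-- Apply the ordering function to some step1 values from the sample table.
select v.step1, dbo.reformat(v.step1) as step2
from (values ('1'), ('4'), ('148'), ('1487'), ('146587')) as v (step1);

-- Expected step2 values, matching the sample table:
--   '1'      -> '1'
--   '4'      -> '4'
--   '148'    -> '48'
--   '1487'   -> '487'
--   '146587' -> '45876'
```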

Thanks!

How do I create a cumulative string concatenation across rows, with the string in a given order?

0 answers: