Building a more efficient MS SQL function that returns evenly split ranges

Time: 2017-02-09 16:26:12

Tags: sql-server sql-function sql-optimization

Suppose you have the following SQL table:

-- create temp table
CREATE TABLE [tempNums] 
(
    id INT NOT NULL,
    somedate datetime NULL
)
GO

With some data (see the tempSplitStringToInts definition below):

-- with date
INSERT INTO [tempNums]
    SELECT id, GETUTCDATE()
    FROM [tempSplitStringToInts] ('1,2,3,5,10,100,101,102,103,233,1001,5002,5003,5005,5007,5010',',') 
GO

-- without date
INSERT INTO [tempNums]
    SELECT id, NULL
    FROM [tempSplitStringToInts] ('6,7,8,150,151,152,153,433,2001,2002,2003,2005,3007,10010',',') 
GO
  

How can I build a better/faster function that takes a number of ranges and a flag bit as input and returns a table of range values?

Something like this works, but it is slow for very large tables:

-- create range function
CREATE FUNCTION [tempFnGetIdRanges]
(
    @apps INT,
    @has_date BIT
)
RETURNS @ret TABLE
(
    RangeNum INT, 
    MinNum INT, 
    MaxNum INT
)
AS
BEGIN

    DECLARE @i INT = 0;
    DECLARE @count INT;
    DECLARE @min INT;
    DECLARE @max INT = 0;

    IF @has_date = 1
    BEGIN
        SELECT @count = COUNT(id) 
            FROM [tempNums] 
            WHERE somedate IS NOT NULL
    END
    ELSE
    BEGIN
        SELECT @count = COUNT(id) 
            FROM [tempNums] 
            WHERE somedate IS NULL

    END

    DECLARE @top INT = @count/@apps;

    WHILE @i<@apps
    BEGIN

        IF @i+1=@apps
        BEGIN
            -- last range: widen TOP so the leftover (remainder) rows are included
            SET @top = @top + @apps 
        END

        IF @has_date = 1
        BEGIN       
            SELECT @min = MIN(id), @max = MAX(id)
            FROM
            (
                SELECT TOP (@top) id 
                FROM [tempNums] 
                WHERE somedate IS NOT NULL
                    AND id > @max
                ORDER BY id
            ) XX
        END
        ELSE
        BEGIN
            SELECT @min = MIN(id), @max = MAX(id)
            FROM
            (
                SELECT TOP (@top) id 
                FROM [tempNums] 
                WHERE somedate IS NULL
                    AND id > @max
                ORDER BY id
            ) XX
        END


        INSERT INTO @ret VALUES(@i, @min, @max)

        SET @i = @i + 1;
        CONTINUE
    END

    RETURN
END
GO

So when you run the following:

SELECT * FROM [tempFnGetIdRanges](4, 0)
SELECT * FROM [tempFnGetIdRanges](4, 1)

Result of the first statement:

RangeNum    MinNum  MaxNum
0           6       8
1           150     152
2           153     2001
3           2002    10010

Result of the second statement:

RangeNum    MinNum  MaxNum
0           1       5
1           10      102
2           103     5002
3           5003    5010
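
The range sizes in these results follow from the integer division in the loop: each range takes @count/@apps rows, and the last range also absorbs whatever is left over. A small sketch of that arithmetic for the 14 rows without a date (the values are copied from the sample data above rather than read from the table):

```sql
-- Arithmetic behind the ranges for the 14 rows where somedate IS NULL.
DECLARE @count INT = 14;   -- rows without a date in the sample data
DECLARE @apps  INT = 4;    -- requested number of ranges

SELECT @count / @apps                          AS RowsPerRange      -- integer division: 3
     , @count % @apps                          AS Remainder         -- 2 rows left over
     , @count - (@apps - 1) * (@count / @apps) AS RowsInLastRange;  -- 3 + 2 = 5
```

For the 16 rows with a date the division is exact, so every range holds 4 rows.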

The split function (for reference, but not the focus of this question):

-- create split string function
CREATE  FUNCTION [tempSplitStringToInts] ( @SourceString VARCHAR(MAX) , @delimeter VARCHAR(10))
RETURNS @IntList TABLE
   (
     id INT
   )
AS
BEGIN
IF RIGHT(@SourceString, LEN(@delimeter))<> @delimeter
    BEGIN
        SELECT @SourceString = @SourceString + @delimeter
    END

DECLARE @LocalStr VARCHAR(MAX)
DECLARE @start INT
DECLARE @end INT
SELECT @start = 1
SELECT @end =  CHARINDEX ( @delimeter , @SourceString , @start ) 

WHILE @end > 0
    BEGIN 
        SELECT @LocalStr = SUBSTRING ( @SourceString , @start , @end - @start ) 
        IF LTRIM(RTRIM(@LocalStr)) <> '' 
            BEGIN
                INSERT @IntList (id) VALUES (CAST(@LocalStr AS INT))
            END
        SELECT @start = @end + LEN(@delimeter)
        SELECT @end = CHARINDEX ( @delimeter , @SourceString , @start ) 
    END
   RETURN
END
GO
  

As I said, this works, but it is slow for very large tables. Is there a better way to write the tempFnGetIdRanges function? Something native to SQL? If it matters, I am using MS SQL 2012.

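1 answer:

Answer 0: (score: 1)

I'm not sure what your GetRanges function is trying to do, but you definitely don't need a loop. This function returns the same values as yours when 1 is passed for HasDate.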

create function GetRanges
(
    @NumGroups int
) returns table as return

    with MyGroups as
    (
        select NTILE(@NumGroups) over(order by t.id) as GroupNum
            , t.id
        from tempnums t
    )

    select GroupNum
        , MIN(id) as MinNum
        , MAX(id) as MaxNum
    from MyGroups
    group by GroupNum

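-- EDIT --

Now that I see you posted two sets of sample data, I understand the question.

Here is how you can tweak this to handle somedate being either NULL or NOT NULL.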

alter function GetRanges
(
    @NumGroups int
    , @HasDate bit
) returns table as return

    with MyGroups as
    (
        select NTILE(@NumGroups) over(order by t.id) as GroupNum
            , t.id
        from tempnums t
        where
        (
            @HasDate = 1
            AND
            t.somedate is not null
        )
        OR
        (
            @HasDate = 0
            AND
            t.somedate is null
        )
    )

    select GroupNum
        , MIN(id) as MinNum
        , MAX(id) as MaxNum
    from MyGroups
    group by GroupNum
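
As an illustration, assuming the tempNums sample data from the question has been loaded, the function can be queried like any table. The ORDER BY is an addition here, because the rows returned by an inline function have no guaranteed order. Note that NTILE numbers its groups from 1, whereas RangeNum in the question starts at 0.

```sql
-- Rows without a date (14 rows in the sample data)
SELECT GroupNum, MinNum, MaxNum
FROM GetRanges(4, 0)
ORDER BY GroupNum;

-- Rows with a date (16 rows in the sample data)
SELECT GroupNum, MinNum, MaxNum
FROM GetRanges(4, 1)
ORDER BY GroupNum;
```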

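The issue I see is that you only have 14 rows with NULL, so I'm not sure why your desired output is the way it is. Using NTILE produces slightly different results for the sample data because of how NTILE distributes an uneven number of rows into groups.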
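
To make the difference concrete: with 14 rows and 4 groups, NTILE makes the earlier groups one row larger (sizes 4, 4, 3, 3), while the loop in the question produces sizes 3, 3, 3, 5. If the exact ranges of the original loop matter, one possible set-based sketch is to number the rows and use integer division, pushing any overflow into the last range. The name GetRangesFloor and the column aliases rn and chunk are hypothetical. The sketch assumes ids are unique and positive, as in the sample data, and it has not been benchmarked. Unlike the loop, it puts all rows into the last range when there are fewer rows than groups, instead of emitting empty ranges.

```sql
-- Hypothetical alternative, not part of the original answer.
-- Mimics the question's loop: @count/@apps rows per range, remainder in the last range.
CREATE FUNCTION GetRangesFloor
(
    @NumGroups int
    , @HasDate bit
) RETURNS TABLE AS RETURN

    WITH Numbered AS
    (
        SELECT t.id
            , ROW_NUMBER() OVER (ORDER BY t.id) - 1 AS rn   -- 0-based position in id order
            , COUNT(*) OVER () / @NumGroups AS chunk        -- integer division, like @count/@apps
        FROM tempNums t
        WHERE (@HasDate = 1 AND t.somedate IS NOT NULL)
           OR (@HasDate = 0 AND t.somedate IS NULL)
    )
    , Grouped AS
    (
        SELECT id
            , CASE
                WHEN chunk = 0 THEN @NumGroups - 1                -- fewer rows than groups
                WHEN rn / chunk >= @NumGroups THEN @NumGroups - 1 -- remainder rows join the last range
                ELSE rn / chunk
              END AS RangeNum
        FROM Numbered
    )
    SELECT RangeNum
        , MIN(id) AS MinNum
        , MAX(id) AS MaxNum
    FROM Grouped
    GROUP BY RangeNum
```

Called as SELECT * FROM GetRangesFloor(4, 0) ORDER BY RangeNum, this should give zero-based range numbers like the original function, but it is worth checking against your own data before relying on it.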