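I'm trying to create a generic histogram function that returns histogram data for any table in my database.
My database has many tables, each roughly a few gigabytes in size. Only some of the columns are numeric.
My first attempt was to pass in a user-defined table-valued parameter. The signature of my user-defined function looks like this: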
CREATE TYPE dbo.numArray AS TABLE (
number real NOT NULL
);
CREATE FUNCTION dbo.fn_numericHistogram (
@values dbo.numArray READONLY,
@numOfBreaks int = 10,
@rangeMin float = NULL,
@rangeMax float = NULL
)
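This works, but it doesn't meet my performance requirements, because I first have to insert the existing numeric column into my user-defined table type, and that takes a long time. The long-running insert happens in the calling stored procedure, which looks like this: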
DECLARE @TVP AS dbo.numArray;
-- Takes far too long
INSERT INTO @TVP (number)
SELECT myNumericColumn
FROM dbo.SomeLargeTable;
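-- A table-valued function is queried with SELECT, not EXEC;
-- DEFAULT must be written out for each omitted parameter.
SELECT lowerBound, upperBound, [count]
FROM dbo.fn_numericHistogram(@TVP, DEFAULT, DEFAULT, DEFAULT);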
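To get around this, my next approach was to simply pass the table name as an nvarchar, but that involves a lot of string manipulation and is very ugly.
Does this seem like a reasonable workaround? I'd prefer the first approach, but I don't know whether it's possible to populate a UDT "by reference".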
Thanks
*Edit: Suppose I could quickly populate the @values UDT with ~2 gigs of numeric data. My function would look like this:
CREATE TYPE dbo.numArray AS TABLE (
    number real NOT NULL
);
GO
CREATE FUNCTION dbo.fn_numericHistogram (
    @values dbo.numArray READONLY,
    @numOfBreaks int = 10,
    @rangeMin float = NULL,
    @rangeMax float = NULL
)
RETURNS @output TABLE (
    lowerBound float NOT NULL,
    upperBound float NOT NULL,
    [count] int NOT NULL
)
AS
BEGIN
    DECLARE @intervalSize float;
    -- Default the range to the observed min/max when the caller supplies none.
    IF (@rangeMin IS NULL AND @rangeMax IS NULL)
    BEGIN
        SELECT
            @rangeMin = MIN(number),
            @rangeMax = MAX(number)
        FROM @values;
    END
    SET @intervalSize = (@rangeMax - @rangeMin)/@numOfBreaks;
    INSERT INTO @output (lowerBound, upperBound, [count])
    SELECT @rangeMin+@intervalSize*FLOOR(number/@intervalSize) AS lowerBound,
        @rangeMin+@intervalSize*FLOOR(number/@intervalSize)+@intervalSize AS upperBound,
        COUNT(*) AS [count]
    FROM (
        -- Special-case the max values so they land in the last bucket,
        -- then shift everything so that bucket 0 starts at @rangeMin.
        SELECT ISNULL(NULLIF(number, @rangeMax), @rangeMax - 0.5 * @intervalSize) - @rangeMin AS number
        FROM @values
    ) AS B
    GROUP BY FLOOR(number/@intervalSize);
    RETURN;
END;
GO
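To make the bucketing arithmetic concrete, here is a small stand-alone check that doesn't need the type or the function. The sample values and the 10 to 20 range are made up purely for illustration; the query itself is the same expression as in the function body above.

```sql
-- Hypothetical sample data standing in for @values.
DECLARE @rangeMin float = 10, @rangeMax float = 20, @numOfBreaks int = 5;
DECLARE @intervalSize float = (@rangeMax - @rangeMin) / @numOfBreaks;  -- bucket width = 2

SELECT
    @rangeMin + @intervalSize * FLOOR(number / @intervalSize)                 AS lowerBound,
    @rangeMin + @intervalSize * FLOOR(number / @intervalSize) + @intervalSize AS upperBound,
    COUNT(*)                                                                   AS [count]
FROM (
    -- A value equal to @rangeMax is nudged half a bucket down so it joins the last bucket;
    -- subtracting @rangeMin makes bucket 0 start at zero.
    SELECT ISNULL(NULLIF(v.number, @rangeMax), @rangeMax - 0.5 * @intervalSize) - @rangeMin AS number
    FROM (VALUES (CAST(10 AS float)), (11), (12.5), (13), (19.9), (20)) AS v(number)
) AS B
GROUP BY FLOOR(number / @intervalSize)
ORDER BY lowerBound;
```

With these six values I'd expect three rows, (10, 12, 2), (12, 14, 2) and (18, 20, 2). Buckets with no values, here 14 to 16 and 16 to 18, simply don't appear, since the query only groups rows that exist.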
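Otherwise I'd have to pass in a table name, and the function balloons into this. (By the way, I'm not even sure this can work as a function... maybe I need a stored procedure instead.)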
CREATE FUNCTION dbo.fn_numericHistogram (
    @tableName nvarchar(200),
    @numericColumnName nvarchar(200),
    @numOfBreaks int = 10,
    @rangeMin float = NULL,
    @rangeMax float = NULL
)
RETURNS @output TABLE (
    lowerBound float NOT NULL,
    upperBound float NOT NULL,
    [count] int NOT NULL
)
AS
BEGIN
    DECLARE @intervalSize float;
    DECLARE @SQLQuery nvarchar(MAX);
    IF (@rangeMin IS NULL AND @rangeMax IS NULL)
    BEGIN
        -- Look up the column's range with dynamic SQL.
        SET @SQLQuery = N'
            SELECT
                @rangeMinOUT = CONVERT(float, MIN('+@numericColumnName+')),
                @rangeMaxOUT = CONVERT(float, MAX('+@numericColumnName+'))
            FROM '+@tableName+';';
        EXEC sp_executesql @SQLQuery, N'@rangeMinOUT float OUTPUT, @rangeMaxOUT float OUTPUT',
            @rangeMinOUT=@rangeMin OUTPUT, @rangeMaxOUT=@rangeMax OUTPUT;
    END
    SET @intervalSize = (@rangeMax - @rangeMin)/@numOfBreaks;
    -- Build the histogram query; every numeric value has to be spliced in as text.
    SET @SQLQuery = N'
        SELECT '+CONVERT(nvarchar, @rangeMin)+'+'+CONVERT(nvarchar, @intervalSize)+'*FLOOR(number/'+CONVERT(nvarchar, @intervalSize)+') AS lowerBound,
            '+CONVERT(nvarchar, @rangeMin)+'+'+CONVERT(nvarchar, @intervalSize)+'*FLOOR(number/'+CONVERT(nvarchar, @intervalSize)+')+'+CONVERT(nvarchar, @intervalSize)+' AS upperBound,
            COUNT(*) AS [count]
        FROM (
            -- Special Case the max values.
            SELECT ISNULL(NULLIF('+@numericColumnName+', '+CONVERT(nvarchar, @rangeMax)+'), '+CONVERT(nvarchar, @rangeMax)+' - 0.5 * '+CONVERT(nvarchar, @intervalSize)+') - '+CONVERT(nvarchar, @rangeMin)+' AS number
            FROM '+@tableName+'
        ) AS B
        GROUP BY FLOOR(number/'+CONVERT(nvarchar, @intervalSize)+');';
    -- Return the results above
    INSERT INTO @output (lowerBound, upperBound, [count])
    EXEC sp_executesql @SQLQuery;
    RETURN;
END;
GO
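On the function-versus-procedure doubt: as far as I can tell, SQL Server refuses to run sp_executesql from inside a user-defined function, so the table-name variant probably has to be a stored procedure. Below is a rough sketch of what that might look like, not a tested implementation. The name dbo.usp_numericHistogram, the assumption that the table lives in the dbo schema, and the inner parameter names @lo, @hi and @step are all mine. It wraps the identifiers in QUOTENAME and passes the numeric values as sp_executesql parameters, which removes most of the CONVERT(nvarchar, ...) concatenation.

```sql
-- Sketch only: dbo.usp_numericHistogram is a hypothetical name.
CREATE PROCEDURE dbo.usp_numericHistogram
    @tableName         sysname,          -- assumed to be a table in the dbo schema
    @numericColumnName sysname,
    @numOfBreaks       int   = 10,
    @rangeMin          float = NULL,
    @rangeMax          float = NULL
AS
BEGIN
    SET NOCOUNT ON;

    -- QUOTENAME brackets the identifiers so they cannot break out of the statement.
    DECLARE @source nvarchar(300) = N'dbo.' + QUOTENAME(@tableName);
    DECLARE @col    nvarchar(300) = QUOTENAME(@numericColumnName);
    DECLARE @sql    nvarchar(MAX);

    IF (@rangeMin IS NULL AND @rangeMax IS NULL)
    BEGIN
        -- Read the observed range straight from the source table.
        SET @sql = N'SELECT @lo = CONVERT(float, MIN(' + @col + N')),
                            @hi = CONVERT(float, MAX(' + @col + N'))
                     FROM ' + @source + N';';
        EXEC sp_executesql @sql,
            N'@lo float OUTPUT, @hi float OUTPUT',
            @lo = @rangeMin OUTPUT, @hi = @rangeMax OUTPUT;
    END

    DECLARE @intervalSize float = (@rangeMax - @rangeMin) / @numOfBreaks;

    -- Numeric values travel as parameters, not as concatenated text.
    SET @sql = N'
        SELECT @lo + @step * FLOOR(number / @step)         AS lowerBound,
               @lo + @step * FLOOR(number / @step) + @step AS upperBound,
               COUNT(*)                                    AS [count]
        FROM (
            SELECT ISNULL(NULLIF(CONVERT(float, ' + @col + N'), @hi), @hi - 0.5 * @step) - @lo AS number
            FROM ' + @source + N'
            WHERE ' + @col + N' IS NOT NULL   -- keep NULLs out of the last bucket
        ) AS B
        GROUP BY FLOOR(number / @step);';

    -- The histogram comes back as the procedure's result set.
    EXEC sp_executesql @sql,
        N'@lo float, @hi float, @step float',
        @lo = @rangeMin, @hi = @rangeMax, @step = @intervalSize;
END;
GO
```

A call would then be something like EXEC dbo.usp_numericHistogram @tableName = N'SomeLargeTable', @numericColumnName = N'myNumericColumn'; and the data is read in place, with no copy into a table variable first. The trade-off is that the result is a procedure result set, so it can't be used directly in a FROM clause the way a table-valued function can.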