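Why was the DelimitedSplit8K UDF written with 2X (Cartesian product) in SQL Server?

Asked: 2015-05-05 10:36:21

Tags: sql-server common-table-expression

I asked this question about writing a fast inline table-valued function in SQL Server.

The code in the answer works, but I am asking about this part: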


It is clear to me that he wants to create many numbers (1, 1, 1, 1, 1, ...) and then convert them into sequential numbers (1, 2, 3, 4, 5, 6, ...).

In this part:

WITH E1(N) AS (
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)
,E2(N) AS (SELECT 1 FROM E1 a, E1 b)
,E4(N) AS (SELECT 1 FROM E2 a, E2 b)
SELECT * FROM E4 --10000 rows

he creates 10,000 rows.
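The excerpt above stops at the constant rows and does not show the numbering step. As a minimal sketch, assuming the same ROW_NUMBER() technique that appears in the fnTally code later in this thread, the conversion to sequential numbers could look like this (the row-count comments are added here for illustration):

```sql
-- Sketch only: turn the 10,000 constant rows into the sequence 1..10000.
WITH E1(N) AS (
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)                                      -- 10 rows
,E2(N) AS (SELECT 1 FROM E1 a, E1 b)   -- 10 * 10   = 100 rows
,E4(N) AS (SELECT 1 FROM E2 a, E2 b)   -- 100 * 100 = 10,000 rows
SELECT N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))  -- numbers the rows 1, 2, 3, ...
  FROM E4
;
```

ORDER BY (SELECT NULL) only satisfies the syntax requirement of the window function. Every row is identical, so the order in which they are numbered does not matter.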

This function is widely used, hence my question:

Question:

Why didn't he (Jeff Moden) use:

WITH E1(N) AS (
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)
,E2(N) AS (SELECT 1 FROM E1 a, E1 b , E1 c , E1 d)

SELECT * FROM E2 -- ALSO 10000 rows !!!

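and instead chose to split it into E2 and E4?

2 answers:

Answer 0: (score: 3)

Although I am not Jeff Moden and do not know his reasoning, I find it likely that he simply used a known number-generation pattern, which he himself calls Itzik Ben-Gan's cross-joined CTE method in {{3}}.

The pattern goes like this: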

WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
     E02(N) AS (SELECT 1 FROM E00 a, E00 b),
     E04(N) AS (SELECT 1 FROM E02 a, E02 b),
     E08(N) AS (SELECT 1 FROM E04 a, E04 b),
     ...

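To adapt the method to his string-splitting function, he apparently found it more convenient to change the initial CTE to ten rows instead of two and to reduce the number of cross-joining CTEs to two, which is enough to cover the 8000 rows his solution requires.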
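To make the trade-off concrete, here is a hedged sketch of the Base 2 pattern carried far enough to cover 8000 rows. The name E16 and the TOP (8000) clause are my own continuation of the pattern for illustration, not code taken from either author. Each level squares the row count of the previous one, so starting from two rows takes four cross-joining CTEs, whereas starting from ten rows takes two (E2 and E4).

```sql
-- Sketch only: Base 2 cascade extended until it covers at least 8000 rows.
WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),     -- 2 rows
     E02(N) AS (SELECT 1 FROM E00 a, E00 b),      -- 2 * 2     = 4 rows
     E04(N) AS (SELECT 1 FROM E02 a, E02 b),      -- 4 * 4     = 16 rows
     E08(N) AS (SELECT 1 FROM E04 a, E04 b),      -- 16 * 16   = 256 rows
     E16(N) AS (SELECT 1 FROM E08 a, E08 b)       -- 256 * 256 = 65,536 rows (hypothetical name)
SELECT TOP (8000)
       N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
  FROM E16
;
```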

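Answer 1: (score: 2)

Heh... just ran across this and thought I'd answer.

Andriy M answered it spot on. The code is very much patterned after Itzik Ben-Gan's original Base 2 code and, yes, I changed it (as have many others) to Base 10 code just to cut down on the number of cCTEs (Cascading CTEs). The latest code that I and many others use reduces the number of cCTEs even further. It also uses the VALUES operator to cut down on the bulk of the code, although there is no performance advantage in doing so.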

   WITH  E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))E0(N)) --10 rows
        ,E4(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d)
 SELECT * FROM E4 --10000 rows
;

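There are many other places where sequences need to be created on the fly. Some need the sequence to start at 0 and others at 1. There is also a need for a larger range of values and, to be honest, I got tired of meticulously writing code similar to the above, so I did what Mr. Ben-Gan and many others have done. I wrote an iTVF called "fnTally". I don't normally use Hungarian notation for functions of any kind, but I have two reasons for using the "fn" prefix. 1) I still maintain a physical Tally table, so the function needed to be named differently, and 2) I can tell people at work "If you'd used the 'eff-n' Tally function I told you about, you wouldn't have had this problem" without it actually being an HR violation. ;-)

Just in case anyone should need such a thing, here is the code I wrote for my version of an fnTally function. There is a small performance trade-off in allowing it to start at either 0 or 1, but it is worth it for the extra flexibility. And, yes... you could reduce the number of cCTEs in it by doing 12 CROSS JOINs in the second and final cCTE. I just didn't go that route. You could do so without harm.

Also note that I still use the SELECT/UNION ALL method to form the first 10 pseudo-rows because I still do a fair bit of work with people who are on 2005, and I was using 2005 myself until about 6 months ago. Full documentation is included in the code.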

 CREATE FUNCTION [dbo].[fnTally]
/**********************************************************************************************************************
 Purpose:
 Return a column of BIGINTs from @ZeroOrOne up to and including @MaxN with a max value of 1 Trillion.

 As a performance note, it takes about 00:02:10 (hh:mm:ss) to generate 1 Billion numbers to a throw-away variable.

 Usage:
--===== Syntax example (Returns BIGINT)
 SELECT t.N
   FROM dbo.fnTally(@ZeroOrOne,@MaxN) t
;

 Notes:
 1. Based on Itzik Ben-Gan's cascading CTE (cCTE) method for creating a "readless" Tally Table source of BIGINTs.
    Refer to the following URLs for how it works and introduction for how it replaces certain loops. 
    http://www.sqlservercentral.com/articles/T-SQL/62867/
    http://sqlmag.com/sql-server/virtual-auxiliary-table-numbers
 2. To start a sequence at 0, @ZeroOrOne must be 0 or NULL. Any other value that's convertible to the BIT data-type
    will cause the sequence to start at 1.
 3. If @ZeroOrOne = 1 and @MaxN = 0, no rows will be returned.
 5. If @MaxN is negative or NULL, a "TOP" error will be returned.
 6. @MaxN must be a positive number from >= the value of @ZeroOrOne up to and including 1 Trillion. If a larger
    number is used, the function will silently truncate after 1 Trillion. If you actually need a sequence with
    that many values, you should consider using a different tool. ;-)
 7. There will be a substantial reduction in performance if "N" is sorted in descending order.  If a descending 
    sort is required, use code similar to the following. Performance will decrease by about 27% but it's still
    very fast especially compared with just doing a simple descending sort on "N", which is about 20 times slower.
    If @ZeroOrOne is a 0, in this case, remove the "+1" from the code.

    DECLARE @MaxN BIGINT; 
     SELECT @MaxN = 1000;
     SELECT DescendingN = @MaxN-N+1 
       FROM dbo.fnTally(1,@MaxN);

 8. There is no performance penalty for sorting "N" in ascending order because the output is explicitly sorted by
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

 Revision History:
 Rev 00 - Unknown     - Jeff Moden 
        - Initial creation with error handling for @MaxN.
 Rev 01 - 09 Feb 2013 - Jeff Moden 
        - Modified to start at 0 or 1.
 Rev 02 - 16 May 2013 - Jeff Moden 
        - Removed error handling for @MaxN because of exceptional cases.
 Rev 03 - 22 Apr 2015 - Jeff Moden
        - Modify to handle 1 Trillion rows for experimental purposes.
**********************************************************************************************************************/
        (@ZeroOrOne BIT, @MaxN BIGINT)
RETURNS TABLE WITH SCHEMABINDING AS 
 RETURN WITH
  E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1)                                  --10^1 or 10 rows
, E4(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d)      --10^4 or 10 Thousand rows
,E12(N) AS (SELECT 1 FROM E4 a, E4 b, E4 c)            --10^12 or 1 Trillion rows
            SELECT N = 0 WHERE ISNULL(@ZeroOrOne,0)= 0 --Conditionally start at 0.
             UNION ALL 
            SELECT TOP(@MaxN) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E12 -- Values from 1 to @MaxN
;
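As a small usage sketch tied back to the string-splitting topic of the question, and assuming dbo.fnTally has been created exactly as defined above, the function can supply one number per character position so that delimiter positions can be found without a loop. The variable @s and its value are hypothetical, and this is an illustration rather than the DelimitedSplit8K code itself.

```sql
-- Sketch only: locate every comma in a string by using fnTally as a row source.
-- Assumes dbo.fnTally exists as defined above; @s is a made-up example value.
DECLARE @s VARCHAR(8000);
 SELECT @s = 'a,b,c';

 SELECT DelimiterPosition = t.N
   FROM dbo.fnTally(1, LEN(@s)) t        -- one row per character position, 1..LEN(@s)
  WHERE SUBSTRING(@s, t.N, 1) = ','      -- keep only the positions that hold a comma
  ORDER BY t.N
;
```

For the sample value this returns positions 2 and 4. The DECLARE and SELECT are kept as separate statements so that the sketch also runs on SQL Server 2005.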