T-SQL sum() of rand(checksum(newid())) is not random if it appears multiple times in the select list?

Asked: 2013-07-03 22:41:11

Tags: sql-server tsql random aggregate-functions

The expression rand(checksum(newid())) is commonly used to generate random numbers.

While generating some test data, I ran the following statement:

select rand(checksum(newid())) R1, rand(checksum(newid())) R2
from ftSequence(3)

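where ftSequence(N) is a table function that returns a single column N containing the values 1, 2, 3 ... N, one per row (as many rows as the argument N). Running it produced the expected data: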

R1                     R2
---------------------- ----------------------
0,817                  0,9515
0,3043                 0,3947
0,5336                 0,7963
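
The question does not show how ftSequence is defined. A minimal sketch of one possible definition follows. It assumes SQL Server 2008 or later and the dbo schema, neither of which comes from the question, and it returns at most 100,000,000 rows:

```sql
-- Hypothetical definition of ftSequence; the original is not shown in the question.
-- Returns a single column N holding 1, 2, ..., @N (up to 100,000,000 rows).
create function dbo.ftSequence (@N int)
returns table
as
return
    with
        E1(x) as (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) v(x)), -- 10 rows
        E2(x) as (select 1 from E1 a cross join E1 b),  -- 100 rows
        E4(x) as (select 1 from E2 a cross join E2 b),  -- 10,000 rows
        E8(x) as (select 1 from E4 a cross join E4 b)   -- 100,000,000 rows
    select top (@N)
        row_number() over (order by (select null)) as N
    from E8;
```

Any numbers-table function that exposes a column N with the values 1 to N should behave the same way in the queries here.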

Then I needed the sum of each column, so I ran:

select sum(rand(checksum(newid()))) S1, sum(rand(checksum(newid()))) S2
from ftSequence(3)

Surprisingly, I got the same number in both columns:

S1                     S2
---------------------- ----------------------
1,2276                 1,2276

Why does this happen? The avg, min, and max aggregate functions behave the same way. Is it the query optimizer, or am I missing some logic?


More observations after the comments.

Putting sum(rand(checksum(newid()))) into a CTE or subquery, as in

select
    (select sum(rand(checksum(newid()))) from ftSequence(3)) S1,
    (select sum(rand(checksum(newid()))) from ftSequence(3)) S2

select sum(R1) S1, sum(R2) S2
from (
    select rand(checksum(newid())) R1, rand(checksum(newid())) R2
    from ftSequence(3)
) R

as well as a trick like
select
    sum(rand(checksum(newid()))) S1
    , sum(rand(checksum(newid())) + 0) S2
from ftSequence(3)

work, producing different values:

S1                     S2                    
---------------------- ----------------------
0,7349                 1,478                 

Happy with that, and needing to generate multiple rows, each with a different value of avg(rand(checksum(newid()))) from ftSequence(3), I did the following:

select R.*
from ftSequence(3) S1
    cross join (
        select
            avg(rand(checksum(newid()))) R1,
            avg(rand(checksum(newid())) + 0) R2
        from ftSequence(3)
    ) R

and got the following result:

R1                     R2
---------------------- ----------------------
0,6464                 0,4501
0,6464                 0,4501
0,6464                 0,4501

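At this point I cannot tell whether this is the correct result or whether the values should be random. What is a way to make all the values random?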

1 Answer:

Answer 0: (score: 1)

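As I stated in the question, I needed a set of random test data, but not with the uniform distribution of rand(); I needed a set of values of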

select avg(rand(checksum(newid()))) from ftSequence(@n)

whose distribution converges to a Gaussian as @n grows.

I found that I can use cross apply instead of cross join, together with an additional, otherwise meaningless check against outer-scope data:

declare @rCnt int, @n int
set @rCnt = 5000000
set @n = 5

select R.*
from ftSequence(@rCnt) S
    cross apply (
        select
            avg(rand(checksum(newid())) + 1e-101) R1,
            avg(rand(checksum(newid())) + 1e-102) R2,
            avg(rand(checksum(newid())) + 1e-103) R3
        from ftSequence(@n)
        where S.N is not NULL  -- always true; present only to reference the outer row S
    ) R

However, I am not sure this can be considered a reliable approach.

The following is probably a more reliable option:

declare @rCnt int, @n int
set @rCnt = 5000000
set @n = 5

create table #Rand (ValNo int, R1 float, R2 float, R3 float)
create clustered index #IX_Rand on #Rand (ValNo)

insert into #Rand
select
    (S.N - 1) / @n,  -- N starts at 1, so subtract 1 to put exactly @n rows in each ValNo group
    rand(checksum(newid())) R1,
    rand(checksum(newid())) R2,
    rand(checksum(newid())) R3
from ftSequence(@n * @rCnt) S

select AVG(R.R1), AVG(R.R2), AVG(R.R3)
from #Rand R
group by ValNo
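
As a rough sanity check of the result, the per-group averages can be compared with what theory predicts: the mean of @n independent uniform(0, 1) values has expectation 0.5 and standard deviation sqrt(1 / (12 * @n)). The sketch below assumes #Rand was populated as above with @n = 5 and is run as a separate batch in the same session; the column aliases are made up for illustration:

```sql
-- Run as a separate batch in the same session, after #Rand has been filled.
-- @n must match the value used when #Rand was populated (assumed to be 5 here).
declare @n int
set @n = 5

select
    avg(A.M1)             as MeanOfAverages,   -- should be close to 0.5
    stdev(A.M1)           as StdevOfAverages,  -- should be close to TheoreticalStdev
    sqrt(1.0 / (12 * @n)) as TheoreticalStdev  -- standard deviation of a mean of @n uniforms
from (
    select avg(R.R1) as M1
    from #Rand R
    group by R.ValNo
) A
```

If the observed values are far from the theoretical ones, the random expressions were probably not evaluated independently for every row.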