Selecting random elements - SQL Server

Time: 2018-01-09 11:45:18

Tags: sql-server random sql-server-2012

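I need to select 10 random elements from a table. I know how to do that in principle; the question has been answered a million times on SO. My problem is that the randomization is not good enough.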

I have put together a test case that shows my problem:

DECLARE @Random TABLE
(
  Id int, 
  [Count] int  
)

DECLARE @TestData TABLE
(
  Id int
)

declare @runs int = 0;

 WHILE (@runs <=800)
 begin
   insert into @TestData values(@runs)
   set @runs = @runs +1 
 end;

 set @runs = 0 

 WHILE (@runs <=100)
 begin
     MERGE @Random AS target  
--      USING (SELECT ID FROM @TestData  where 0.01 >= CAST(CHECKSUM(NEWID(), id) & 0x7fffffff AS float) / CAST (0x7fffffff AS int) ) 
--      USING (SELECT top 10 ID FROM @TestData order by newid()) 
        USING (SELECT top 10 ID FROM @TestData order by abs(checksum(newid())) % 100)

AS SOURCE
        ON (target.id = source.id)  
        WHEN MATCHED THEN               
            UPDATE SET Target.[Count]  = Target.[Count] + 1  
        WHEN NOT MATCHED THEN  
            INSERT (ID, [Count])  VALUES (source.ID, 1);
    set @runs = @runs +1 
end

 select [count], count(*) "count(*)" from @Random group by [count] order by 1 desc 

As you can see, I have tried several ways of randomizing. But every time I end up with a result like this:

[image: screenshot of the query result]

In short, how do I select truly random elements from a table?

Scope: SQL Server 2017, so any language feature is acceptable.

1 answer:

Answer 0: (score: 3)

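I believe the problem is with your output query. This answer does not verify the randomness, but it should show that the selection is reasonably random.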

First, if you can avoid it, don't use keywords such as COUNT as column names. That is what makes your output confusing.

Run this sample with 10000 runs and you should get a random set of results, although I make no claim that it is perfectly random:

DECLARE @Random TABLE
    (
        Id INT ,
        Occurences INT
    );

DECLARE @TestData TABLE
    (
        Id INT
    );

DECLARE @runs INT = 0;

WHILE ( @runs <= 800 )
    BEGIN
        INSERT INTO @TestData
        VALUES ( @runs );
        SET @runs = @runs + 1;
    END;

SET @runs = 0;

WHILE ( @runs <= 10000 )
    BEGIN
        MERGE @Random AS target
        USING (   SELECT   TOP 10 Id
                  FROM     @TestData
                  ORDER BY ABS(CHECKSUM(NEWID())) % 100 ) AS SOURCE
        ON ( target.Id = SOURCE.Id )
        WHEN MATCHED THEN
            UPDATE SET target.Occurences = target.Occurences + 1
        WHEN NOT MATCHED THEN INSERT ( Id ,
                                       Occurences )
                              VALUES ( SOURCE.Id, 1 );
        SET @runs = @runs + 1;
    END;

SELECT   Id ,
         Occurences
FROM     @Random
ORDER BY Id;

Note: this may help you investigate further, but it does not prove randomness. Further testing should be done.
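
As one possible starting point for that further testing, here is a minimal, self-contained sketch that is not part of the original answer. It assumes a smaller run count of 1000 and uses ORDER BY NEWID() as the selection method; the names @Picks, @totalRuns, @idCount and @p are hypothetical. It compares the observed spread of per-Id counts with the standard deviation one would roughly expect if each Id were chosen with probability 10/801 in every run (a binomial approximation). It is a rough sanity check, not a formal statistical test.

```sql
-- Illustrative sketch: all names below are assumptions, not from the original post.
DECLARE @TestData TABLE ( Id INT PRIMARY KEY );
DECLARE @Picks    TABLE ( Id INT );            -- one row per selected Id per run

-- Same test data as above: Ids 0..800 (801 rows).
DECLARE @i INT = 0;
WHILE ( @i <= 800 )
    BEGIN
        INSERT INTO @TestData VALUES ( @i );
        SET @i = @i + 1;
    END;

-- Draw 10 Ids per run and record every pick.
DECLARE @totalRuns INT = 1000;                 -- assumed run count
DECLARE @run INT = 0;
WHILE ( @run < @totalRuns )
    BEGIN
        INSERT INTO @Picks ( Id )
        SELECT   TOP ( 10 ) Id
        FROM     @TestData
        ORDER BY NEWID();
        SET @run = @run + 1;
    END;

DECLARE @idCount INT = ( SELECT COUNT(*) FROM @TestData );
DECLARE @p FLOAT = 10.0 / @idCount;            -- chance that a given Id is picked in one run

-- Summarize per-Id counts; the LEFT JOIN keeps Ids that were never picked (count 0).
SELECT   MIN(c.Occurrences)                 AS MinOccurrences ,
         MAX(c.Occurrences)                 AS MaxOccurrences ,
         STDEV(c.Occurrences)               AS ObservedStdev ,
         SQRT(@totalRuns * @p * ( 1 - @p )) AS ApproxExpectedStdev
FROM     (   SELECT   t.Id ,
                      COUNT(p.Id) AS Occurrences
             FROM     @TestData AS t
                      LEFT JOIN @Picks AS p ON p.Id = t.Id
             GROUP BY t.Id ) AS c;
```

If ObservedStdev is far larger than ApproxExpectedStdev, some Ids are being favored. Swapping the ORDER BY expression for another candidate (for example the ABS(CHECKSUM(NEWID())) % 100 variant) lets you compare methods under the same check.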