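I need to select 10 random elements from a table. I know how to do that; besides, that question has been answered a million times on SO. My problem is that the randomization isn't good enough.
I've put together a test case that shows my problem: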
DECLARE @Random TABLE
(
    Id int,
    [Count] int
)
DECLARE @TestData TABLE
(
    Id int
)
declare @runs int = 0;
WHILE (@runs <=800)
begin
    insert into @TestData values(@runs)
    set @runs = @runs +1
end;
set @runs = 0
WHILE (@runs <=100)
begin
    MERGE @Random AS target
    -- USING (SELECT ID FROM @TestData where 0.01 >= CAST(CHECKSUM(NEWID(), id) & 0x7fffffff AS float) / CAST (0x7fffffff AS int) )
    -- USING (SELECT top 10 ID FROM @TestData order by newid())
    USING (SELECT top 10 ID FROM @TestData order by abs(checksum(newid())) % 100)
    AS SOURCE
    ON (target.id = source.id)
    WHEN MATCHED THEN
        UPDATE SET Target.[Count] = Target.[Count] + 1
    WHEN NOT MATCHED THEN
        INSERT (ID, [Count]) VALUES (source.ID, 1);
    set @runs = @runs +1
end
select [count], count(*) "count(*)" from @Random group by [count] order by 1 desc
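As you can see, I've tried several ways of randomizing, but every time I end up with a result like this:
In short, how do I select truly random elements from a table?
Scope: SQL Server 2017, so every language feature is acceptable.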
Answer 0 (score: 3)
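I believe the problem is in your output query. This answer doesn't verify the randomness, but it should show that the selection is reasonably random.
First, if you can help it, don't use keywords such as COUNT as column names. That's why your output is confusing.
Run this sample with 10000 runs and you should get a random set of results, though I make no claim that it is perfectly random: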
DECLARE @Random TABLE
    (
      Id INT ,
      Occurences INT
    );
DECLARE @TestData TABLE
    (
      Id INT
    );
DECLARE @runs INT = 0;
WHILE ( @runs <= 800 )
    BEGIN
        INSERT INTO @TestData
        VALUES ( @runs );
        SET @runs = @runs + 1;
    END;
SET @runs = 0;
WHILE ( @runs <= 10000 )
    BEGIN
        MERGE @Random AS target
        USING ( SELECT TOP 10 Id
                FROM @TestData
                ORDER BY ABS(CHECKSUM(NEWID())) % 100 ) AS SOURCE
        ON ( target.Id = SOURCE.Id )
        WHEN MATCHED THEN
            UPDATE SET target.Occurences = target.Occurences + 1
        WHEN NOT MATCHED THEN
            INSERT ( Id ,
                     Occurences )
            VALUES ( SOURCE.Id, 1 );
        SET @runs = @runs + 1;
    END;
SELECT Id ,
       Occurences
FROM @Random
ORDER BY Id;
Note: this can help you investigate further, but it does not prove randomness. Further testing should be done.
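As one possible starting point for that further testing, here is a rough sketch that summarises the spread of the counts instead of listing every Id. It is not part of the original answer: the @Picks table, the 1000-run figure, and the output column names are my own assumptions, and it uses the ORDER BY NEWID() variant that is commented out in the question. If every Id is equally likely, each count should behave roughly like a binomial draw with probability 10 / 801, so the average should sit near @runs * 10 / 801 and the observed standard deviation should be in the neighbourhood of SQRT(@runs * p * (1 - p)).

```sql
-- Sketch only: tally how often each Id is picked, then compare the spread
-- with what a uniform selection would roughly produce.
DECLARE @TestData TABLE ( Id INT PRIMARY KEY );
DECLARE @Picks TABLE ( Id INT NOT NULL );   -- hypothetical helper table, one row per pick

DECLARE @i INT = 0;
WHILE ( @i <= 800 )                         -- 801 rows, as in the question
    BEGIN
        INSERT INTO @TestData VALUES ( @i );
        SET @i = @i + 1;
    END;

DECLARE @runs INT = 1000;                   -- assumed number of draws
SET @i = 0;
WHILE ( @i < @runs )
    BEGIN
        INSERT INTO @Picks ( Id )
        SELECT TOP 10 Id
        FROM @TestData
        ORDER BY NEWID();                   -- swap in another ordering expression to compare
        SET @i = @i + 1;
    END;

DECLARE @n INT = ( SELECT COUNT(*) FROM @TestData );
DECLARE @p FLOAT = 10.0 / @n;               -- chance that a given Id is picked in one run

SELECT MIN(c.Occurrences) AS MinOccurrences ,
       MAX(c.Occurrences) AS MaxOccurrences ,
       AVG(CAST(c.Occurrences AS FLOAT)) AS AvgOccurrences ,
       STDEV(c.Occurrences) AS StdevOccurrences ,
       @runs * @p AS ExpectedAvg ,
       SQRT(@runs * @p * ( 1 - @p )) AS ExpectedStdevApprox
FROM ( SELECT t.Id ,
              COUNT(p.Id) AS Occurrences    -- LEFT JOIN keeps Ids that were never picked (count 0)
       FROM @TestData AS t
            LEFT JOIN @Picks AS p ON p.Id = t.Id
       GROUP BY t.Id ) AS c;
```

A standard deviation far above the approximate expected value, or a maximum many times the average, would suggest that some Ids are favoured. Note that an ordering expression such as ABS(CHECKSUM(NEWID())) % 100 can only produce 100 distinct sort keys, so with 801 rows the TOP 10 is partly decided by how ties are broken; running the sketch once with each expression is an easy way to see whether that matters in practice. This is still only a sanity check, not a statistical proof.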