MSSQL: selecting random rows from a large data set

Time: 2014-10-27 20:14:46

Tags: sql sql-server tsql

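I have a table with more than 1 million records, and I want to select random rows from it. The random rows should not come from all records, only from the rows that match specific conditions.

Performance is very important, so I can't order by NEWID() and then take the first item.

The table structure is like this: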

 ID    BIGINT
 Title NVARCHAR(100)
 Level INT
 Point INT

For now, I have written a query like this:

WITH
    tmp_one AS
    (
        SELECT  R.Id AS RID
        FROM    [User] AS U
                INNER JOIN [Item] AS R
                    ON R.UserId = U.Id
        WHERE   (R.[Level] BETWEEN @MinLevel AND @MaxLevel)
            AND ((ABS(BINARY_CHECKSUM(NEWID(), R.Id, NEWID())) % 10000) / 100) > @RangeOne
    ),
    tmp_two AS
    (
        SELECT  tmp_one.RID AS RID
        FROM    tmp_one
        WHERE   ((ABS(BINARY_CHECKSUM(NEWID(), RID, NEWID())) % 10000) / 100) > @RangeTwo
    ),
    tmp_three AS
    (
        SELECT  RID AS RID
        FROM    tmp_two
        WHERE   ((ABS(BINARY_CHECKSUM(NEWID(), NEWID())) % 10000) / 100) < @RangeThree
    )
SELECT  TOP 10 RID
FROM    tmp_three

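I am trying to select 10 items at random and then pick one of them, but I have run into a surprising problem.

Sometimes the output is sorted by the item's Level, which I don't want (it is not really random). I really don't know why the result comes back ordered by Level.

Please suggest a solution that lets me select random records with high performance and without repeated results over a large number of iterations.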

2 Answers:

Answer 0: (score: 1)

This is based on the MSDN article Selecting Rows Randomly from a Large Table. Instead of the query you are avoiding:

select top 10 * from TableName order by newid()

the article suggests this:

select top 10 * from TableName where (abs(cast((binary_checksum(*) * rand()) as int)) % 100) < 10

It needs far fewer logical reads, so it performs much better.
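As a rough sketch of how this idea might be applied to the filtered case in the question: first keep a small random percentage of the rows that match the Level condition, then shuffle only that small sample. The names [Item], Id and [Level] come from the question; @SamplePercent is a hypothetical parameter, and the per-row expression CHECKSUM(NEWID()) is swapped in here for the MSDN expression, so treat it as an assumption rather than the article's method. Inline DECLARE initializers assume SQL Server 2008 or later.

```sql
-- Sketch only: @SamplePercent is a hypothetical tuning parameter.
DECLARE @MinLevel int = 1, @MaxLevel int = 5;
DECLARE @SamplePercent int = 1;   -- keep roughly 1% of the matching rows

SELECT TOP (10) S.Id
FROM (
    SELECT R.Id
    FROM   [Item] AS R
    WHERE  R.[Level] BETWEEN @MinLevel AND @MaxLevel
      -- NEWID() is evaluated per row; the bit mask clears the sign bit
      -- so the value is non-negative without calling ABS().
      AND (CHECKSUM(NEWID()) & 2147483647) % 100 < @SamplePercent
) AS S
ORDER BY NEWID();                 -- sorts only the small sample, not the whole table
```

TOP without ORDER BY gives no guarantee about which rows come back or in what order, so the final ORDER BY NEWID() is what makes the 10 returned rows independent of the scan order. If the filter is very selective, the sample may contain fewer than 10 rows, in which case @SamplePercent would need to be raised.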

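Answer 1: (score: -1)

Try something like this. It will grab 10 random rows from your table.

This is pseudocode, so you may need to fix a few column names to match your real table.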

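DECLARE @Random int;
DECLARE @TotalRows int;
DECLARE @Result table
(
    ID      bigint PRIMARY KEY,
    Title   nvarchar(100),
    [Level] int,
    Point   int
);

-- Count the candidate rows once.
SET @TotalRows = (SELECT COUNT(*) FROM [User] U INNER JOIN [Item] R ON R.UserId = U.Id);

-- Stop at 10 rows, or earlier if the source has fewer than 10 rows.
WHILE (SELECT COUNT(*) FROM @Result) < 10
  AND (SELECT COUNT(*) FROM @Result) < @TotalRows
BEGIN
    -- Random position between 1 and @TotalRows.
    SET @Random = FLOOR(RAND() * @TotalRows) + 1;

    -- T1 is the first @Random rows and T2 the first @Random - 1 rows, both
    -- ordered by R.Id, so the only row of T1 missing from T2 is row @Random.
    INSERT INTO @Result (ID, Title, [Level], Point)
    SELECT T1.ID, T1.Title, T1.[Level], T1.Point
    FROM (SELECT TOP (@Random) R.Id AS ID, R.Title, R.[Level], R.Point
          FROM [User] U INNER JOIN [Item] R ON R.UserId = U.Id
          ORDER BY R.Id) T1
    LEFT OUTER JOIN
         (SELECT TOP (@Random - 1) R.Id AS ID
          FROM [User] U INNER JOIN [Item] R ON R.UserId = U.Id
          ORDER BY R.Id) T2 ON T2.ID = T1.ID
    WHERE T2.ID IS NULL
      -- Skip a row that an earlier iteration already picked.
      AND NOT EXISTS (SELECT 1 FROM @Result X WHERE X.ID = T1.ID);
END

SELECT * FROM @Result;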

Here is how it works.

Select a random number.   For example 47. 
We want to select the 47th row of the table. 
Select the top 47 rows, call it T1. 
Join it to the top 46 rows called T2. 
The row where T2 is null is the 47th row. 
Insert that into a temporary table. 
Do it until there are 10 rows. 
Done.
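
For comparison, the "select the Nth row" step could also be written with OFFSET/FETCH, which is a different technique from the TOP-and-join approach described above and is only available from SQL Server 2012 onward. This is a minimal sketch assuming the [Item] table and its Id, Title, [Level] and Point columns from the question; it fetches a single row at a random position.

```sql
-- Sketch only: fetch the row at a random position (SQL Server 2012+).
DECLARE @TotalRows int = (SELECT COUNT(*) FROM [Item]);
DECLARE @Random int = FLOOR(RAND() * @TotalRows);   -- 0-based offset: 0 .. @TotalRows - 1

SELECT R.Id, R.Title, R.[Level], R.Point
FROM   [Item] AS R
ORDER BY R.Id                                       -- a stable order makes "row N" well defined
OFFSET @Random ROWS FETCH NEXT 1 ROWS ONLY;
```

Like the loop above, this still has to walk past the skipped rows, so the cost of each pick grows with the chosen position.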