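Generating random test data with ORDER BY NEWID(), including duplicate rows

Time: 2019-05-09 19:36:30

Tags: sql sql-server tsql

I need to select random rows from a table to use as test data. Sometimes I need more rows of test data than the table contains. Duplicates are fine. How can I construct my SELECT so that it can return duplicate rows?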

CREATE TABLE [Northwind].[dbo].[Persons]
(PersonID int, LastName varchar(255))

INSERT INTO [Northwind].[dbo].[Persons] 
VALUES
(1, 'Smith'), 
(2, 'Jones'),
(3, 'Washington')

SELECT TOP 5 *
FROM [Northwind].[dbo].[Persons]  
ORDER BY NEWID()

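How can I get the SELECT statement to give me 5 records in random order, with duplicates? At the moment it returns only the three rows, in random order.

I would like to be able to scale this to 100 rows, 1000 rows, or however many rows I need.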

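4 answers:

Answer 0: (score: 3)

Use a recursive CTE to union together enough copies of the rows that their total is at least the number of rows you want. Then select from that result the same way as before.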

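declare
    @desired int = 5,
    @actual int = (select count(*) from Persons);

with

    -- The CTE has a different name from the Persons table so that the
    -- anchor member below unambiguously reads the base table.
    replicated as (

        -- anchor member: one full copy of the table (batch 0)
        select    personId,
                  lastName,
                  batch = 0
        from      Persons

        union all
        -- recursive member: append another copy until the total
        -- reaches at least @desired rows
        select    personId,
                  lastName,
                  batch = batch + 1
        from      replicated
        where     (batch + 1) * @actual < @desired

    )

    select
    top (@desired) personId, lastName
    from           replicated
    order by       newid()
    option         (maxrecursion 0)  -- only matters when more than 100 extra copies are needed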

Answer 1: (score: 1)

As mentioned, you could instead use a tally table and then pick random rows from it:

WITH N AS(
    SELECT N
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)))N(N)),
Tally AS(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
    FROM N N1, N N2, N N3, N N4) --Repeat for more
SELECT TOP 500 YT.*
FROM Tally T
     CROSS JOIN YourTable YT
ORDER BY NEWID();
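
As a minimal sketch, the tally approach can be adapted to the Persons table from the question with a variable row count. The variables @desired, @actual and @copies are hypothetical names introduced here, and the unqualified dbo.Persons assumes the current database is Northwind. The tally is filtered to just enough copies of the table to cover @desired rows, so less data has to be sorted by NEWID().

```sql
-- Sketch only: @desired, @actual and @copies are illustrative names.
DECLARE @desired int = 1000;
DECLARE @actual  int = (SELECT COUNT(*) FROM dbo.Persons);
-- Ceiling division: how many full copies of the table are needed.
DECLARE @copies  int = CASE WHEN @actual = 0 THEN 0
                            ELSE (@desired + @actual - 1) / @actual END;

WITH N AS (
    SELECT N
    FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),
                 (NULL),(NULL),(NULL),(NULL),(NULL)) N(N)
),
Tally AS (
    -- 10 * 10 * 10 * 10 = 10,000 candidate numbers
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
    FROM N N1 CROSS JOIN N N2 CROSS JOIN N N3 CROSS JOIN N N4
)
SELECT TOP (@desired) P.PersonID, P.LastName
FROM Tally T
     CROSS JOIN dbo.Persons P
WHERE T.I <= @copies          -- keep only as many copies as needed
ORDER BY NEWID();
```

Note that with this filter each source row can appear at most @copies times, so the result is a shuffled set of repeated copies rather than independent draws with replacement.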

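Answer 2: (score: 0)

I was thinking about how to solve this without sorting all the records, and especially without sorting them multiple times.

One approach is to generate random numbers and use them to look up values in the data: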

with n as (
      select rand(checksum(newid())) as r, 1 as n
      union all
      select rand(checksum(newid())) as r, n + 1
      from n
      where n < 10
     ),
     tt as (
      select t.*, lag(tile_end, 1, 0) over (order by tile_end) as tile_start
      from (select t.*, row_number() over (order by newid()) * 1.0 / count(*) over () as tile_end
            from t
           ) t
     )
select tt.*, n.r, (select count(*) from n)
from n left join
     tt
     on n.r >= tt.tile_start and n.r < tt.tile_end;

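Here is a db<>fiddle. The row_number() does not need to use order by newid(). It can order by a key that has an index, which makes that part of the query more efficient.

For more than 100 rows, you will need OPTION (MAXRECURSION 0).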
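
To illustrate where that hint goes, here is a small sketch of the random-number CTE alone, scaled by a hypothetical @desired variable. The hint is attached to the outer statement, not to the CTE definition; the column is called i here only to avoid reusing the CTE's name.

```sql
-- Sketch only: @desired is an illustrative name.
DECLARE @desired int = 1000;

WITH n AS (
    SELECT RAND(CHECKSUM(NEWID())) AS r, 1 AS i
    UNION ALL
    SELECT RAND(CHECKSUM(NEWID())), i + 1
    FROM n
    WHERE i < @desired
)
SELECT i, r
FROM n
OPTION (MAXRECURSION 0);  -- removes the default limit of 100 recursion levels
```

Without the hint, SQL Server stops the statement with an error once the recursion goes past 100 levels.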

Answer 3: (score: 0)

I added a temporary results table, ran the query in a loop, and pushed each batch of results into that table.

declare  @results table(
SSN varchar(10),
Cusip   varchar(10),
...
EndBillingDate  varchar(10))

DECLARE @cnt INT = 0;

WHILE @cnt < @trades
BEGIN
INSERT INTO @results 
   Select   ...

 set @cnt = @cnt + 10
END

select * from @results
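
For comparison, a self-contained sketch of the same loop idea against the Persons table from the question might look like the following. It is not the code elided above: it assumes one random row is inserted per pass, @trades is given a hypothetical value, and dbo.Persons assumes the current database is Northwind.

```sql
-- Sketch only: one random row per pass, so the same row can be drawn again.
DECLARE @trades  int = 5;   -- illustrative: number of rows wanted
DECLARE @results TABLE (PersonID int, LastName varchar(255));
DECLARE @cnt     int = 0;

WHILE @cnt < @trades
BEGIN
    INSERT INTO @results (PersonID, LastName)
    SELECT TOP (1) PersonID, LastName
    FROM dbo.Persons
    ORDER BY NEWID();

    -- An empty source table would otherwise loop forever.
    IF @@ROWCOUNT = 0 BREAK;

    SET @cnt = @cnt + 1;
END;

SELECT PersonID, LastName FROM @results;
```

Each pass draws independently, so this is sampling with replacement, at the cost of one sort of the source table per row requested.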