How do I fill a column with random numbers in SQL? I get the same value in every row

Date: 2011-02-15 11:26:38

Tags: sql-server

UPDATE CattleProds
SET SheepTherapy=(ROUND((RAND()* 10000),0))
WHERE SheepTherapy IS NULL

If I then run a SELECT, I see that my random number is identical in every row. Any ideas on how to generate unique random numbers?

5 answers:

Answer 0 (score: 140)

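Instead of rand(), use newid(), which is recalculated for each row in the result. The usual way is to take the modulo of the checksum. Note that checksum(newid()) can produce -2,147,483,648, which causes an integer overflow in abs(), so we need to apply the modulo to the checksum's return value before converting it to an absolute value.

UPDATE CattleProds
SET    SheepTherapy = abs(checksum(NewId()) % 10000)
WHERE  SheepTherapy IS NULL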

This generates random numbers between 0 and 9999.
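Why the order of abs() and % matters can be modelled in a few lines of Python. This sketch assumes T-SQL's integer `%` truncates toward zero (the remainder takes the sign of the dividend) and that CHECKSUM can return any 32-bit INT. `tsql_mod` and `tsql_abs_int` are hypothetical helpers that only imitate those semantics; nothing here talks to SQL Server.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1   # range of a T-SQL INT

def tsql_mod(a: int, b: int) -> int:
    """Mimic T-SQL integer %: truncates toward zero, so the sign follows the dividend."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r

def tsql_abs_int(x: int) -> int:
    """Mimic ABS() on an INT: raise when the result does not fit in 32 bits."""
    result = abs(x)
    if result > INT_MAX:
        raise OverflowError(f"ABS({x}) does not fit in a 32-bit INT")
    return result

# ABS first: the single value INT_MIN has no positive INT counterpart.
try:
    tsql_abs_int(INT_MIN)
except OverflowError as exc:
    print("abs first fails:", exc)

# Modulo first: every possible CHECKSUM value ends up in 0..9999.
for checksum in (INT_MIN, -1, 0, 1, INT_MAX):
    print(checksum, "->", tsql_abs_int(tsql_mod(checksum, 10000)))
```

Taking the remainder first shrinks INT_MIN to -3648 before abs() ever sees it, which is exactly the reason the answer applies `% 10000` inside the abs() call.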

Answer 1 (score: 22)

If you are on SQL Server 2008, you can also use

 CRYPT_GEN_RANDOM(2) % 10000

This seems a bit simpler (it is also evaluated once per row, as newid is, as shown below):

DECLARE @foo TABLE (col1 FLOAT)

INSERT INTO @foo SELECT 1 UNION SELECT 2

UPDATE @foo
SET col1 =  CRYPT_GEN_RANDOM(2) % 10000

SELECT *  FROM @foo

Returns (2 random, probably different, numbers):

col1
----------------------
9693
8573

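Considering the unexplained downvote, the only legitimate reason I can think of is that the generated random number lies between 0 and 65535, and that range is not evenly divisible by 10,000, so some numbers will be slightly over-represented. A way around this is to wrap it in a scalar UDF that throws away any number of 60,000 or above and calls itself recursively to get a replacement number.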

CREATE FUNCTION dbo.RandomNumber()
RETURNS INT
AS
  BEGIN
      DECLARE @Result INT

      SET @Result = CRYPT_GEN_RANDOM(2)

      RETURN CASE
               WHEN @Result < 60000
                     OR @@NESTLEVEL = 32 THEN @Result % 10000
               ELSE dbo.RandomNumber()
             END
  END  
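The over-representation can be counted exactly. The Python sketch below assumes CRYPT_GEN_RANDOM(2), once converted to INT, behaves like a uniform unsigned 16-bit value (0..65535). It tallies how often each result of `% 10000` occurs, then mimics the UDF's rejection step with `os.urandom` standing in for CRYPT_GEN_RANDOM (an assumption; `random_below_10000` is a hypothetical name).

```python
import os
from collections import Counter

# How many of the 65,536 possible 16-bit values map to each remainder.
counts = Counter(v % 10000 for v in range(65536))
print(counts[0], counts[5535], counts[5536], counts[9999])  # 7 7 6 6

# Keeping only 0..59999 (as the UDF does) leaves exactly 6 inputs per remainder.
kept = Counter(v % 10000 for v in range(60000))
print(set(kept.values()))  # {6}

def random_below_10000() -> int:
    """Rejection sampling: draw 2 random bytes, retry when the value is >= 60000."""
    while True:
        value = int.from_bytes(os.urandom(2), "big")  # 0..65535
        if value < 60000:
            return value % 10000
```

Remainders 0..5535 are hit 7 times out of 65,536 and the rest only 6 times, which is the slight over-representation described above. Unlike the UDF, this loop has no nesting limit, so it never falls back to the biased `% 10000` path; the UDF needs the `@@NESTLEVEL = 32` guard because SQL Server caps nesting at 32 levels.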

Answer 2 (score: 6)

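Although I like using CHECKSUM, I feel a better way to go is NEWID(), simply because you don't have to go through complicated math to generate simple numbers.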

ROUND( 1000 *RAND(convert(varbinary, newid())), 0)

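You can replace 1000 with whatever number you want as the upper limit, and you can always add an offset to create a range. Say you want a random number between 100 and 200; you can do something like:

100 + ROUND( 100 *RAND(convert(varbinary, newid())), 0)

Putting it together in your query:

UPDATE CattleProds
SET SheepTherapy= ROUND( 1000 *RAND(convert(varbinary, newid())), 0)
WHERE SheepTherapy IS NULL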
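One side effect of ROUND is worth knowing. RAND returns a float in [0, 1), so ROUND(1000 * RAND(...), 0) can produce 1,001 different values, and the two endpoints 0 and 1000 each get only half the probability of the interior values. The sketch below checks this deterministically over an evenly spaced grid of u values in [0, 1), used as a stand-in for RAND's output (assumed uniform), and rounds half up with integer arithmetic, as T-SQL's ROUND does for non-negative numbers. `round_scaled_counts` is a hypothetical name.

```python
from collections import Counter

def round_scaled_counts(scale: int, steps: int) -> Counter:
    """Count ROUND(scale * u, 0) for u = i / steps, i in 0..steps-1.

    Round-half-up of scale*i/steps done in integers:
    floor((2*scale*i + steps) / (2*steps)).
    """
    return Counter((2 * scale * i + steps) // (2 * steps) for i in range(steps))

counts = round_scaled_counts(1000, 1_000_000)
print(len(counts), counts[0], counts[1], counts[500], counts[1000])  # 1001 500 1000 1000 500
```

If equal weights matter, a floor-based mapping such as FLOOR(1001 * u) gives each of 0..1000 the same share (and 100 + FLOOR(101 * u) does the same for 100..200).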

Answer 3 (score: 1)

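I tested 2 set-based randomization methods against RAND() by generating 100,000,000 rows with each. To level the playing field, the output is a float between 0 and 1, mimicking RAND(). Most of the code is testing infrastructure, so I summarize the algorithms here: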

-- Try #1 used
(CAST(CRYPT_GEN_RANDOM(8) AS BIGINT)%500000000000000000+500000000000000000.0)/1000000000000000000 AS Val
-- Try #2 used
RAND(Checksum(NewId()))
-- and to have a baseline to compare output with I used
RAND() -- this required executing 100000000 separate insert statements
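Try #1 turns 8 random bytes into a BIGINT, folds it into a range around zero with `%`, shifts it up by 5 × 10^17 and scales it into (0, 1). A Python sketch of that mapping, assuming the bytes are read as a big-endian two's-complement 64-bit integer (how a varbinary converts to BIGINT) and that `%` truncates toward zero as in T-SQL. `try1_value` is a hypothetical name, `os.urandom` stands in for CRYPT_GEN_RANDOM(8), and the division is done exactly with `Fraction`, whereas SQL Server returns a DECIMAL of limited precision.

```python
import os
from fractions import Fraction

HALF = 500_000_000_000_000_000      # 5 * 10**17
SCALE = 1_000_000_000_000_000_000   # 10**18

def tsql_mod(a: int, b: int) -> int:
    """T-SQL style %: truncates toward zero, so the sign follows the dividend."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r

def try1_value(raw: bytes) -> Fraction:
    """(CAST(raw AS BIGINT) % 5e17 + 5e17) / 1e18, computed exactly."""
    as_bigint = int.from_bytes(raw, "big", signed=True)
    return Fraction(tsql_mod(as_bigint, HALF) + HALF, SCALE)

sample = try1_value(os.urandom(8))
print(float(sample))
```

Because the remainder lies in -(5 × 10^17 - 1) .. 5 × 10^17 - 1, the shifted numerator lies in 1 .. 10^18 - 1, so the result is strictly between 0 and 1 with roughly 10^18 possible values.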

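Using CRYPT_GEN_RANDOM was clearly the most random: when drawing 10^8 numbers from a set of 10^18, the chance of seeing even 1 duplicate is only about 0.5% (n²/(2N) = 10^16 / (2 × 10^18) = 0.005). We shouldn't expect any duplicates, and there were none! This set took 44 seconds to generate on my laptop.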

Cnt     Pct
-----   ----
 1      100.000000  --No duplicates

SQL Server Execution Times: CPU time = 134795 ms, elapsed time = 39274 ms.

IF OBJECT_ID('tempdb..#T0') IS NOT NULL DROP TABLE #T0;
GO
WITH L0   AS (SELECT c FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c))  -- 2^4  
    ,L1   AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B)    -- 2^8  
    ,L2   AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B)    -- 2^16  
    ,L3   AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B)    -- 2^32  
SELECT TOP 100000000 (CAST(CRYPT_GEN_RANDOM(8) AS BIGINT)%500000000000000000+500000000000000000.0)/1000000000000000000 AS Val
  INTO #T0
  FROM L3;

 WITH x AS (
     SELECT Val,COUNT(*) Cnt
      FROM #T0
     GROUP BY Val
)
SELECT x.Cnt,COUNT(*)*1.0/(SELECT COUNT(*)/100 FROM #T0) Pct
  FROM X
 GROUP BY x.Cnt;
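The duplicate odds for the CRYPT_GEN_RANDOM run can be estimated with the birthday approximation P(at least one duplicate) ≈ 1 - exp(-n(n-1)/(2N)), with n = 10^8 draws and N ≈ 10^18 possible values, taken from the expression above and treated as uniform (an assumption). A quick Python check; `birthday_collision_probability` is a hypothetical name:

```python
import math

def birthday_collision_probability(n: int, N: int) -> float:
    """Approximate P(at least one repeated value) for n uniform draws from N values."""
    # expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-n * (n - 1) / (2 * N))

p = birthday_collision_probability(10**8, 10**18)
print(f"{p:.4%}")  # about 0.5%
```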

This method is almost 15 orders of magnitude less random, and it is less than twice as fast, generating the 100M numbers in only 23 seconds.

Cnt  Pct
---- ----
1    95.450254    -- only 95% unique is absolutely horrible
2    02.222167    -- If this line were the only problem I'd say DON'T USE THIS!
3    00.034582
4    00.000409    -- 409 numbers appeared 4 times
5    00.000006    -- 6 numbers actually appeared 5 times 

SQL Server Execution Times: CPU time = 77156 ms, elapsed time = 24613 ms.

IF OBJECT_ID('tempdb..#T1') IS NOT NULL DROP TABLE #T1;
GO
WITH L0   AS (SELECT c FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c))  -- 2^4  
    ,L1   AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B)    -- 2^8  
    ,L2   AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B)    -- 2^16  
    ,L3   AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B)    -- 2^32  
SELECT TOP 100000000 RAND(Checksum(NewId())) AS Val
  INTO #T1
  FROM L3;

WITH x AS (
    SELECT Val,COUNT(*) Cnt
     FROM #T1
    GROUP BY Val
)
SELECT x.Cnt,COUNT(*)*1.0/(SELECT COUNT(*)/100 FROM #T1) Pct
  FROM X
 GROUP BY x.Cnt;
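The shape of the RAND(Checksum(NewId())) table is consistent with a simple model in which every generated value is one of N equally likely values. The number of times each value appears is then roughly Poisson with λ = n/N, and the Pct column (distinct values seen k times, per 1% of rows) should be about 100 · e^(-λ) · λ^(k-1) / k!. The Python sketch below evaluates that model with n = 10^8 and N = 2^31. N = 2^31 is an assumption picked because it fits the reported numbers closely; it is not documented behaviour of RAND or CHECKSUM. `expected_pct` is a hypothetical name.

```python
import math

def expected_pct(k: int, n: int, N: int) -> float:
    """Expected Pct for values seen exactly k times under a Poisson(n/N) model.

    Pct = (distinct values seen k times) / (n / 100)
        ~= N * P(X = k) / (n / 100) = 100 * exp(-lam) * lam**(k - 1) / k!
    """
    lam = n / N
    return 100 * math.exp(-lam) * lam ** (k - 1) / math.factorial(k)

n, N = 10**8, 2**31   # N is an assumed effective number of distinct outputs
for k in range(1, 6):
    print(k, round(expected_pct(k, n, N), 6))
```

Under this model, about 95.45% of values are unique and about 2.22% appear twice, close to the 95.450254 and 02.222167 in the table above.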

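Using RAND() on its own is useless for set-based generation, so generating the baseline for comparing randomness took more than 6 hours and had to be restarted several times before it finally produced the right number of output rows. Its randomness also seems to leave a lot to be desired, although it is better than reseeding every row with checksum(newid()).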

Cnt  Pct
---- ----
1    99.768020
2    00.115840
3    00.000100  -- at least there were comparatively few values returned 3 times

Execution time could not be captured because of the restarts.

IF OBJECT_ID('tempdb..#T2') IS NOT NULL DROP TABLE #T2;
GO
CREATE TABLE #T2 (Val FLOAT);
GO
SET NOCOUNT ON;
GO
INSERT INTO #T2(Val) VALUES(RAND());
GO 100000000

WITH x AS (
    SELECT Val,COUNT(*) Cnt
     FROM #T2
    GROUP BY Val
)
SELECT x.Cnt,COUNT(*)*1.0/(SELECT COUNT(*)/100 FROM #T2) Pct
  FROM X
 GROUP BY x.Cnt;

Answer 4 (score: -2)

<?php
// Assumes db/connect.php opens a mysqli connection and stores it in $conn.
require_once('db/connect.php');

// Collect every product id first. A while loop (rather than do/while)
// avoids pushing a null id when the table is empty.
$ids_array = [];
$products_result = mysqli_query($conn, "SELECT id FROM products");
while ($products_row = mysqli_fetch_assoc($products_result)) {
    $ids_array[] = $products_row['id'];
}

// Prepare the UPDATE once and bind a new random code for every row,
// instead of interpolating the values into the SQL string.
$stmt = mysqli_prepare($conn, "UPDATE products SET code = ? WHERE id = ?");
foreach ($ids_array as $current_id) {
    $rand = rand(1000000, 9999999);   // inclusive bounds
    mysqli_stmt_bind_param($stmt, 'is', $rand, $current_id);
    mysqli_stmt_execute($stmt);
}
mysqli_stmt_close($stmt);
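The same client-side approach (read the ids, then update each row with its own freshly drawn code) can be sketched in Python. Here the standard-library sqlite3 module with an in-memory database stands in for the MySQL connection (an assumption); the products/id/code names come from the answer, and the five sample rows are made up purely for illustration.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, code INTEGER)")
conn.executemany("INSERT INTO products (id) VALUES (?)", [(i,) for i in range(1, 6)])  # illustrative rows

ids = [row[0] for row in conn.execute("SELECT id FROM products")]

# One parameterised UPDATE per id, each with its own random code (inclusive bounds).
conn.executemany(
    "UPDATE products SET code = ? WHERE id = ?",
    [(random.randint(1000000, 9999999), product_id) for product_id in ids],
)
conn.commit()

for row in conn.execute("SELECT id, code FROM products ORDER BY id"):
    print(row)
```

Like the PHP version, this does not guarantee the codes are unique; if that matters, a UNIQUE constraint on `code` plus a retry on conflict would be needed.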