SQL: Generate N random non-negative integers that add up to a given total M

Date: 2017-01-13 18:20:25

Tags: mysql sql postgresql amazon-redshift

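The problem of generating random integers that add up to a given total keeps coming up in different programming languages, and there are SO solutions for R, Java, and Python.

This question asks for a "vanilla" SQL solution, restricted to a single SELECT statement with optional common table expressions (CTEs).

The input is a 1x2 table inputs containing a single row with the columns M and N, both of type int. The result should be an Nx1 table with a single column i holding integers that add up to M.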
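To pin down the output contract, here is a small Python checker. It is only an illustration; the function name is_valid_result is hypothetical and not part of the question or any answer. It accepts a candidate result only if it contains exactly N non-negative integers whose sum is M.

```python
from typing import Sequence


def is_valid_result(values: Sequence[int], m: int, n: int) -> bool:
    """Return True if values is exactly n non-negative integers summing to m."""
    return (
        len(values) == n
        and all(isinstance(v, int) and v >= 0 for v in values)
        and sum(values) == m
    )


print(is_valid_result([4, 7, 6, 3], 20, 4))  # True
print(is_valid_result([10, 10], 20, 3))      # False: only 2 values
```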

3 answers:

Answer 0 (score: 3)

I propose the following query (PostgreSQL):

WITH ZeroToOne (m, n, y) AS (
  SELECT m, n, random()
  FROM inputs
  CROSS JOIN generate_series(1, n)
), SumToM (m, n, y, x) AS (
  SELECT m, n, y, y * m / sum(y) OVER (PARTITION BY m, n)
  FROM zerotoone
), MissingToM (m, n, l) AS (
  SELECT m, n, m - sum(floor(x))
  FROM sumtom
  GROUP BY m, n
)
SELECT m, n, y, x, l,
  CASE
    WHEN row_number() OVER (PARTITION BY m, n ORDER BY x - floor(x) DESC) > l
    THEN floor(x)
    ELSE ceil(x)
  END AS v
FROM missingtom
NATURAL JOIN sumtom;

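The only interesting values are m, n and v; I kept the other values for the sake of explanation.

I will walk through the query step by step, using the following input cases as a running example: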
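For experimenting outside PostgreSQL, here is a minimal Python sketch that mirrors the query for a single (m, n) input row. It assumes Python's random.Random is an acceptable stand-in for random(); the function name random_parts is hypothetical. Like the SQL, it ignores the measure-zero cases where an x value is exactly an integer.

```python
import math
import random
from typing import List, Optional


def random_parts(m: int, n: int, rng: Optional[random.Random] = None) -> List[int]:
    """Python sketch of the query for one (m, n) row: n integers summing to m."""
    if rng is None:
        rng = random.Random()
    # ZeroToOne: n random values y in [0, 1)
    ys = [rng.random() for _ in range(n)]
    # SumToM: x = y * m / sum(y), so the x values sum to m (up to float rounding)
    total_y = sum(ys)
    xs = [y * m / total_y for y in ys]
    # MissingToM: l = m - sum(floor(x)) values have to be rounded up
    l = m - sum(math.floor(x) for x in xs)
    # Final SELECT: rank by fractional part, descending; ranks 1..l get ceil, the rest floor
    ranked = sorted(range(n), key=lambda k: xs[k] - math.floor(xs[k]), reverse=True)
    rank_of = {k: r for r, k in enumerate(ranked, start=1)}
    return [math.floor(x) if rank_of[k] > l else math.ceil(x) for k, x in enumerate(xs)]


print(random_parts(20, 4, random.Random(42)))  # four non-negative integers summing to 20
```

Rounding up the values with the largest fractional parts is the largest-remainder method: every result is the floor or ceiling of its x, so it stays within 1 of the scaled value while the total is exactly m.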

SELECT * FROM inputs;
 m  | n 
----+---
 20 | 4
 30 | 4
 42 | 3
(3 rows)

The first CTE (ZeroToOne) computes n random values in the range [0, 1) for each input case and calls these values y:

 m  | n |          y          
----+---+---------------------
 20 | 4 |   0.374425032641739 
 20 | 4 |   0.644279096741229 
 20 | 4 |   0.626386553514749 
 20 | 4 |   0.320786282420158 
 30 | 4 |   0.848764919675887 
 30 | 4 |   0.268079651053995 
 30 | 4 |   0.250213726423681 
 30 | 4 |   0.497460773680359 
 42 | 3 |   0.571454062592238 
 42 | 3 | 0.00338772451505065 
 42 | 3 |   0.139226260595024 

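The second CTE (SumToM) multiplies each y value by m and divides the result by the sum of the y values of the same input case. Consequently, the x values of an input pair (m, n) add up to m: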

 m  | n |          y          |         x         
----+---+---------------------+-------------------
 20 | 4 |   0.374425032641739 |  3.80924177094873 
 20 | 4 |   0.644279096741229 |  6.55462277759638 
 20 | 4 |   0.626386553514749 |  6.37259161753762 
 20 | 4 |   0.320786282420158 |  3.26354383391728 
 30 | 4 |   0.848764919675887 |  13.6565766414436 
 30 | 4 |   0.268079651053995 |   4.3133855037604 
 30 | 4 |   0.250213726423681 |  4.02592384820881 
 30 | 4 |   0.497460773680359 |  8.00411400658722 
 42 | 3 |   0.571454062592238 |  33.6117414945302 
 42 | 3 | 0.00338772451505065 | 0.199258922297338 
 42 | 3 |   0.139226260595024 |  8.18899958317244 
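As a quick arithmetic check, this Python snippet reapplies y * m / sum(y) to the y values of the (20, 4) case, copied from the ZeroToOne output, and prints their total, which is 20 up to floating-point rounding. The helper name scale_to_total is hypothetical.

```python
from typing import List


def scale_to_total(ys: List[float], m: int) -> List[float]:
    """Mirror SumToM for one (m, n) partition: x = y * m / sum(y)."""
    total = sum(ys)
    return [y * m / total for y in ys]


# y values of the (20, 4) input case, copied from the ZeroToOne output
ys_20 = [0.374425032641739, 0.644279096741229, 0.626386553514749, 0.320786282420158]
xs_20 = scale_to_total(ys_20, 20)
print(xs_20)
print(sum(xs_20))  # approximately 20, up to floating-point rounding
```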

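Clearly, m is at least the sum of the integer parts of the x values, since floor(x) ≤ x. It is also easy to see that the difference between the two sums (the sum of the x values and the sum of their integer parts) is less than n, because each fractional part is less than 1. So the idea now is to work out how many values must be rounded up and how many must be rounded down. The l value of the third CTE (MissingToM) is the number of values to round up: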

 m  | n | l 
----+---+---
 20 | 4 | 2 
 30 | 4 | 1 
 42 | 3 | 1 

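To keep the distribution of the numbers as even as possible, the final query rounds up the l values with the largest fractional parts (the ORDER BY x - floor(x) DESC in the row_number() window) and rounds down the rest: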

 m  | n |          y          |         x         | l | v  
----+---+---------------------+-------------------+---+----
 20 | 4 |   0.374425032641739 |  3.80924177094873 | 2 |  4
 20 | 4 |   0.644279096741229 |  6.55462277759638 | 2 |  7
 20 | 4 |   0.626386553514749 |  6.37259161753762 | 2 |  6
 20 | 4 |   0.320786282420158 |  3.26354383391728 | 2 |  3
 30 | 4 |   0.848764919675887 |  13.6565766414436 | 1 | 14
 30 | 4 |   0.268079651053995 |   4.3133855037604 | 1 |  4
 30 | 4 |   0.250213726423681 |  4.02592384820881 | 1 |  4
 30 | 4 |   0.497460773680359 |  8.00411400658722 | 1 |  8
 42 | 3 |   0.571454062592238 |  33.6117414945302 | 1 | 34
 42 | 3 | 0.00338772451505065 | 0.199258922297338 | 1 |  0
 42 | 3 |   0.139226260595024 |  8.18899958317244 | 1 |  8
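The rounding rule can be replayed on the x values copied from the SumToM output. In this Python sketch the helpers missing_to_m and round_preserving_sum are hypothetical names that mirror the MissingToM CTE and the final CASE expression; the printed lines reproduce the l and v columns of the table.

```python
import math
from typing import List


def missing_to_m(xs: List[float], m: int) -> int:
    """MissingToM: how many values must be rounded up so the total is m."""
    return m - sum(math.floor(x) for x in xs)


def round_preserving_sum(xs: List[float], m: int) -> List[int]:
    """Final SELECT: ceil for the l largest fractional parts, floor for the rest."""
    l = missing_to_m(xs, m)
    # row_number() OVER (ORDER BY x - floor(x) DESC)
    ranked = sorted(range(len(xs)), key=lambda k: xs[k] - math.floor(xs[k]), reverse=True)
    rank_of = {k: r for r, k in enumerate(ranked, start=1)}
    return [math.floor(x) if rank_of[k] > l else math.ceil(x) for k, x in enumerate(xs)]


# x values copied from the SumToM output, keyed by m
X = {
    20: [3.80924177094873, 6.55462277759638, 6.37259161753762, 3.26354383391728],
    30: [13.6565766414436, 4.3133855037604, 4.02592384820881, 8.00411400658722],
    42: [33.6117414945302, 0.199258922297338, 8.18899958317244],
}
for m, xs in X.items():
    print(m, missing_to_m(xs, m), round_preserving_sum(xs, m))
# 20 2 [4, 7, 6, 3]
# 30 1 [14, 4, 4, 8]
# 42 1 [34, 0, 8]
```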

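If the same (m, n) configuration appears more than once in the input table, the query breaks: the duplicates land in the same PARTITION BY m, n window and their values get mixed together. I would therefore add a primary key constraint on it: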

ALTER TABLE inputs ADD PRIMARY KEY (m, n);
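To see what goes wrong with duplicates, this Python sketch mimics ZeroToOne and the windowed sum in SumToM for two identical (20, 4) input rows. The function name sum_to_m_rows is hypothetical, and Python's random.Random stands in for random(). Both copies fall into one partition, so the 8 x values together sum to 20 instead of each group of 4 summing to 20; MissingToM and the final SELECT then round all 8 values as a single case.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def sum_to_m_rows(inputs: List[Tuple[int, int]], rng: random.Random) -> List[Tuple[int, int, float]]:
    """Mimic ZeroToOne + SumToM, including the PARTITION BY m, n window sum."""
    # ZeroToOne: each input row yields n rows (m, n, y)
    rows = [(m, n, rng.random()) for (m, n) in inputs for _ in range(n)]
    # sum(y) OVER (PARTITION BY m, n): duplicate (m, n) rows share one partition
    totals: Dict[Tuple[int, int], float] = defaultdict(float)
    for m, n, y in rows:
        totals[(m, n)] += y
    # x = y * m / sum(y)
    return [(m, n, y * m / totals[(m, n)]) for (m, n, y) in rows]


rows = sum_to_m_rows([(20, 4), (20, 4)], random.Random(1))
print(len(rows))                             # 8 rows in a single (20, 4) partition
print(round(sum(x for _, _, x in rows), 9))  # 20.0: all 8 together sum to m
```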

Answer 1 (score: 2)

This seems to do it (Postgres):

with recursive inputs (n,m) as (
  values (10,100)
), worker (i, total, rn) as (

   select val, val as total, 1 as rn
   from ( 
     select floor(random() * (m/n - 1) + 1)
     from inputs
   ) as x (val)

   union all

   select c.val, p.total + c.val, p.rn + 1
   from worker p
     join lateral (
       select floor(random() * (i.m - p.total - 1) + 1)
       from inputs i
     ) c (val) on p.rn < (select n from inputs)
)
select *
from worker
order by rn;
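The recursion can be simulated in Python to see how the running total behaves. This sketch assumes a callable returning floats in [0, 1) stands in for random(); worker_rows is a hypothetical name, and m // n mirrors PostgreSQL's integer division of the int columns. While at least 2 remain, each recursive step draws between 1 and m - total - 1, so the total does not overshoot m, but it is also not forced to reach m: if every draw were 0.0, each row would contribute 1 and the total after 10 rows would be only 10.

```python
import math
import random
from typing import Callable, List, Tuple


def worker_rows(m: int, n: int, rand: Callable[[], float]) -> List[Tuple[int, int, int]]:
    """Python sketch of the recursive CTE above; returns (i, total, rn) rows."""
    # Anchor member: floor(random() * (m/n - 1) + 1), m/n being integer division
    val = math.floor(rand() * (m // n - 1) + 1)
    rows = [(val, val, 1)]
    # Recursive member: add a row while rn < n, drawing from what is left of m
    while rows[-1][2] < n:
        _, total, rn = rows[-1]
        val = math.floor(rand() * (m - total - 1) + 1)
        rows.append((val, total + val, rn + 1))
    return rows


print(worker_rows(100, 10, random.Random(0).random))
# Worst case for reaching m: a random source that always returns 0.0
print(worker_rows(100, 10, lambda: 0.0)[-1])  # (1, 10, 10)
```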

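However, this might be considered cheating, because most of the time the sum of the values (100 in the example above) is already reached after 6 or 7 rows (sometimes earlier, sometimes later). This means the "random" numbers at the end are no longer random.

One of the good results is: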

  i | total | rn
----+-------+---
  3 |     3 |  1
  1 |     4 |  2
 40 |    44 |  3
 33 |    77 |  4
 11 |    88 |  5
  2 |    90 |  6
  4 |    94 |  7
  3 |    97 |  8
  2 |    99 |  9
  1 |   100 | 10

But sometimes it is as bad as this:

  i | total | rn
----+-------+---
  7 |     7 |  1
 59 |    66 |  2
 23 |    89 |  3
 10 |    99 |  4
  1 |   100 |  5
  0 |   100 |  6
  0 |   100 |  7
  0 |   100 |  8
  0 |   100 |  9
  0 |   100 | 10

But I didn't see any requirement that the random values must be unique.

Online example: http://rextester.com/VRBV22166

Answer 2 (score: -1)

create table inputs (m int, n int);
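insert into inputs (m, n) values (100, 10);

-- n-1 random, distinct cut points from 1..m, plus m itself;
-- each output value is the distance to the previous point (LAG defaults to 0)
select      i - lag(i, 1, 0) over (order by i) as i
from       (select      *
            from       (select      i
                        from        generate_series(1, (select m from inputs)) as gs(i)
                        order by    random()
                        limit       (select n from inputs) - 1
                        ) t

            union all

            select      m
            from        inputs
            ) t
order by    i;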

Sample result

+----+
| i  |
+----+
| 1  |
| 1  |
| 2  |
| 3  |
| 4  |
| 12 |
| 13 |
| 16 |
| 18 |
| 30 |
+----+

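I have very limited access, so here is just the idea:

  • Generate the numbers 1..m

  • Sort them randomly

  • Select the first n-1 numbers, UNION ALL m

  • Order the numbers

  • Compute the distance between each pair of consecutive numbers (LAG; for the first number, use 0 as the previous number)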
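The steps above can be sketched in Python. This is an illustration, not the answer's code: random.sample stands in for ORDER BY random() LIMIT n-1, the name random_cut_parts is hypothetical, and it assumes 1 <= n <= m + 1 so that n-1 distinct cut points exist.

```python
import random
from typing import List, Optional


def random_cut_parts(m: int, n: int, rng: Optional[random.Random] = None) -> List[int]:
    """Split m into n non-negative parts using n-1 random cut points from 1..m."""
    if rng is None:
        rng = random.Random()
    # Generate 1..m, sort randomly, keep the first n-1 numbers
    cuts = rng.sample(range(1, m + 1), n - 1)
    # UNION ALL m, then order the numbers
    points = sorted(cuts + [m])
    # LAG(i, 1, 0): distance to the previous point, using 0 before the first one
    return [p - prev for prev, p in zip([0] + points[:-1], points)]


print(random_cut_parts(100, 10, random.Random(3)))  # ten parts summing to 100
```

Because the cut points are distinct, at most one part can be 0 (when m itself is drawn as a cut point), so not every non-negative split of m is reachable. The SQL version additionally sorts its output by the differences.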