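The problem of generating random integers that add up to a given total keeps coming up in different programming languages, and there are SO solutions for R, Java and Python.
This question asks for a "vanilla" SQL solution restricted to a single SELECT statement, optionally with common table expressions (CTEs).
The input is a 1x2 table inputs: a single row with the columns M and N, both of type int. The result should be an Nx1 table with one column i holding integers that add up to M.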
Answer 0 (score: 3)
I propose the following query (PostgreSQL):
WITH ZeroToOne (m, n, y) AS (
SELECT m, n, random()
FROM inputs
CROSS JOIN generate_series(1, n)
), SumToM (m, n, y, x) AS (
SELECT m, n, y, y * m / sum(y) OVER (PARTITION BY m, n)
FROM zerotoone
), MissingToM (m, n, l) AS (
SELECT m, n, m - sum(floor(x))
FROM sumtom
GROUP BY m, n
)
SELECT m, n, y, x, l,
CASE
WHEN row_number() OVER (PARTITION BY m, n ORDER BY x - floor(x) DESC) > l
THEN floor(x)
ELSE ceil(x)
END AS v
FROM missingtom
NATURAL JOIN sumtom;
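The only interesting values are m, n and v; I left the other columns in for the sake of explanation.
I will step through the query using the following input cases as a running example: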
SELECT * FROM inputs;
m | n
----+---
20 | 4
30 | 4
42 | 3
(3 rows)
The first CTE (ZeroToOne) computes n random values in the range [0, 1) for each input case and calls these values y:
m | n | y
----+---+---------------------
20 | 4 | 0.374425032641739
20 | 4 | 0.644279096741229
20 | 4 | 0.626386553514749
20 | 4 | 0.320786282420158
30 | 4 | 0.848764919675887
30 | 4 | 0.268079651053995
30 | 4 | 0.250213726423681
30 | 4 | 0.497460773680359
42 | 3 | 0.571454062592238
42 | 3 | 0.00338772451505065
42 | 3 | 0.139226260595024
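The second CTE (SumToM) multiplies each y value by m and divides the result by the sum of the y values of that input case. Summing all x values of an input pair (m, n) therefore yields m: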
m | n | y | x
----+---+---------------------+-------------------
20 | 4 | 0.374425032641739 | 3.80924177094873
20 | 4 | 0.644279096741229 | 6.55462277759638
20 | 4 | 0.626386553514749 | 6.37259161753762
20 | 4 | 0.320786282420158 | 3.26354383391728
30 | 4 | 0.848764919675887 | 13.6565766414436
30 | 4 | 0.268079651053995 | 4.3133855037604
30 | 4 | 0.250213726423681 | 4.02592384820881
30 | 4 | 0.497460773680359 | 8.00411400658722
42 | 3 | 0.571454062592238 | 33.6117414945302
42 | 3 | 0.00338772451505065 | 0.199258922297338
42 | 3 | 0.139226260595024 | 8.18899958317244
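As a quick sanity check of this step outside the database, here is a small Python sketch (not part of the query) that rescales the three y values of the (42, 3) case and confirms that the resulting x values sum to m. The variable names are illustrative only.

```python
# y values of the (m, n) = (42, 3) case, copied from the ZeroToOne output above
m = 42
y = [0.571454062592238, 0.00338772451505065, 0.139226260595024]

# Same arithmetic as the SumToM CTE: x = y * m / sum(y) within one input case
total_y = sum(y)
x = [value * m / total_y for value in y]

print(x)       # the rescaled values
print(sum(x))  # equals m up to floating-point rounding
```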
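Clearly, m is at least as large as the sum of the integer parts of the x values. It is also easy to see that the difference between the two sums (the sum of the x values and the sum of their integer parts) is less than n, because each fractional part is less than 1. So the idea now is to work out how many values must be rounded up and how many must be rounded down. The l value of the third CTE (MissingToM) is the number of values to round up: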
m | n | l
----+---+---
20 | 4 | 2
30 | 4 | 1
42 | 3 | 1
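The l values can be re-derived by hand from the x column of the running example. The following Python sketch does that arithmetic; the dictionary layout is just an assumption for illustration, the numbers are copied from the SumToM output.

```python
import math

# x values per input case (m, n), copied from the SumToM output above
x_by_case = {
    (20, 4): [3.80924177094873, 6.55462277759638, 6.37259161753762, 3.26354383391728],
    (30, 4): [13.6565766414436, 4.3133855037604, 4.02592384820881, 8.00411400658722],
    (42, 3): [33.6117414945302, 0.199258922297338, 8.18899958317244],
}

# Same arithmetic as the MissingToM CTE: l = m - sum(floor(x))
missing = {
    (m, n): m - sum(math.floor(x) for x in xs)
    for (m, n), xs in x_by_case.items()
}

for case, l in missing.items():
    print(case, l)
```

For the (20, 4) case the integer parts are 3, 6, 6 and 3, which add up to 18, so two values have to be rounded up to reach 20.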
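To keep the distribution of the numbers uniform, the final query rounds up the l values with the largest fractional parts and rounds the rest down: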
m | n | y | x | l | v
----+---+---------------------+-------------------+---+----
20 | 4 | 0.374425032641739 | 3.80924177094873 | 2 | 4
20 | 4 | 0.644279096741229 | 6.55462277759638 | 2 | 7
20 | 4 | 0.626386553514749 | 6.37259161753762 | 2 | 6
20 | 4 | 0.320786282420158 | 3.26354383391728 | 2 | 3
30 | 4 | 0.848764919675887 | 13.6565766414436 | 1 | 14
30 | 4 | 0.268079651053995 | 4.3133855037604 | 1 | 4
30 | 4 | 0.250213726423681 | 4.02592384820881 | 1 | 4
30 | 4 | 0.497460773680359 | 8.00411400658722 | 1 | 8
42 | 3 | 0.571454062592238 | 33.6117414945302 | 1 | 34
42 | 3 | 0.00338772451505065 | 0.199258922297338 | 1 | 0
42 | 3 | 0.139226260595024 | 8.18899958317244 | 1 | 8
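The query fails if the same configuration (m, n) occurs more than once in the input table, because every step partitions or groups by (m, n). I would therefore add a primary key constraint on it: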
ALTER TABLE inputs ADD PRIMARY KEY (m, n);
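For readers who want to experiment with the idea outside PostgreSQL, here is a rough Python sketch of the same three steps (random weights, rescale to m, round up the largest fractional parts). It is an illustration under my own naming, not a translation of the query: the function name and the rng parameter are assumptions, and it adds 1 to the floor instead of calling ceil.

```python
import math
import random

def random_ints_summing_to(m, n, rng=random):
    """Return n non-negative integers that add up to m (largest-remainder rounding)."""
    # Step 1 (ZeroToOne): n random weights in [0, 1)
    y = [rng.random() for _ in range(n)]
    # Step 2 (SumToM): rescale so the real-valued x sum to m
    total_y = sum(y)
    x = [value * m / total_y for value in y]
    # Step 3 (MissingToM): how many values must be rounded up
    floors = [math.floor(value) for value in x]
    missing = m - sum(floors)
    # Final step: round up the `missing` values with the largest fractional parts
    by_fraction = sorted(range(n), key=lambda k: x[k] - floors[k], reverse=True)
    result = floors[:]
    for k in by_fraction[:missing]:
        result[k] += 1
    return result

print(random_ints_summing_to(20, 4))
```

Passing a seeded random.Random instance as rng makes the output reproducible, which is handy when comparing against the SQL version.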
Answer 1 (score: 2)
This seems to do it (Postgres):
with recursive inputs (n,m) as (
values (10,100)
), worker (i, total, rn) as (
select val, val as total, 1 as rn
from (
select floor(random() * (m/n - 1) + 1)
from inputs
) as x (val)
union all
select c.val, p.total + c.val, p.rn + 1
from worker p
join lateral (
select floor(random() * (i.m - p.total - 1) + 1)
from inputs i
) c (val) on p.rn < (select n from inputs)
)
select *
from worker
order by rn;
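However, this might be considered cheating, because most of the time the total (100 in the example above) is already reached after 6 or 7 rows (sometimes earlier, sometimes later). This means the "random" numbers at the end are no longer random.
One of the better results is: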
i | total | rn
----+-------+---
3 | 3 | 1
1 | 4 | 2
40 | 44 | 3
33 | 77 | 4
11 | 88 | 5
2 | 90 | 6
4 | 94 | 7
3 | 97 | 8
2 | 99 | 9
1 | 100 | 10
But sometimes it is as bad as the following:
i | total | rn
----+-------+---
7 | 7 | 1
59 | 66 | 2
23 | 89 | 3
10 | 99 | 4
1 | 100 | 5
0 | 100 | 6
0 | 100 | 7
0 | 100 | 8
0 | 100 | 9
0 | 100 | 10
But I didn't see any requirement that the random values have to be unique.
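To get a feel for how often the budget is used up early, the draws of the recursive query can be mimicked in a few lines of Python. This is only a sketch of how I read the query: the function name and the rng parameter are made up for the example, and m // n stands in for the integer division m/n.

```python
import math
import random

def sequential_draws(m, n, rng):
    """Mimic the recursive CTE: each draw is taken from what is left of m."""
    # First row: floor(random() * (m/n - 1) + 1)
    values = [math.floor(rng.random() * (m // n - 1) + 1)]
    total = values[0]
    # Following rows: floor(random() * (m - total - 1) + 1), until n rows exist
    while len(values) < n:
        value = math.floor(rng.random() * (m - total - 1) + 1)
        values.append(value)
        total += value
    return values

rng = random.Random()
trials = [sequential_draws(100, 10, rng) for _ in range(1000)]
with_zero = sum(1 for t in trials if 0 in t)
short = sum(1 for t in trials if sum(t) < 100)
print("runs with trailing zeros:", with_zero)
print("runs that never reach the total:", short)
```

Note that nothing in this scheme forces the total to be reached within n rows, so the second counter can be non-zero as well.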
Answer 2 (score: -1)
create table inputs (m int,n int);
insert into inputs (m,n) values (100,10);
select i-lag(i,1,0) over (order by i) as i
from (select *
from (select i
from generate_series (1,(select m from inputs)) as gs(i)
order by random()
limit (select n from inputs)-1
) t
union all
select m from inputs
) t
order by i
Sample result
+----+
| i |
+----+
| 1 |
+----+
| 1 |
+----+
| 2 |
+----+
| 3 |
+----+
| 4 |
+----+
| 12 |
+----+
| 13 |
+----+
| 16 |
+----+
| 18 |
+----+
| 30 |
+----+
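I have very limited access right now, so here is just the idea (m is the total and n the number of values, as in the inputs table):
Generate the numbers 1..m-1 (the query above uses 1..m, so it can also draw m itself, which then yields a 0)
Order them randomly
Take the first n-1 numbers, UNION ALL m
Sort the numbers
Compute the distance between each pair of consecutive numbers (LAG; for the first number, use 0 as the previous number)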
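The steps above can be sketched in Python as follows. This is a minimal illustration, not the answer's query: the function name and the rng parameter are invented for the example, m is taken to be the total and n the number of values (as in the inputs table), and the cut points are drawn from 1..m-1, so it needs n <= m.

```python
import random

def cut_point_split(m, n, rng):
    """Split m into n positive integers via n-1 distinct random cut points."""
    # Pick n-1 distinct cut points from 1..m-1 and sort them
    cuts = sorted(rng.sample(range(1, m), n - 1))
    # Append the total itself as the last cut point (the UNION ALL step)
    cuts.append(m)
    # Differences between consecutive cut points, with 0 before the first one (the LAG step)
    parts = []
    previous = 0
    for cut in cuts:
        parts.append(cut - previous)
        previous = cut
    return parts

print(cut_point_split(100, 10, random.Random()))
```

Because the cut points are distinct and strictly below m, every part is at least 1.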