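Goal: randomly assign x unassigned IDs to a client. This is for something very specific, and although I know there are other ways to do it, I'd like to know whether there is a solution within this particular implementation.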
I have something that partially works, but I'd like to know where the flaw in the function is.
Here is the table:
CREATE SEQUENCE accounts_seq MINVALUE 700000000001 NO MAXVALUE;
CREATE TABLE accounts (
id BIGINT PRIMARY KEY default nextval('accounts_seq'),
client VARCHAR(25), UNIQUE(id, client)
);
This function, gen_account_ids, is just a one-time setup that prepopulates the table with a fixed number of rows, all marked as unassigned.
/*
This function will insert new rows into the accounts table with ids being
generated by a sequence, and client being NULL. A NULL client indicates
the account has not yet been assigned.
*/
CREATE OR REPLACE FUNCTION gen_account_ids(bigint)
RETURNS INT AS $gen_account_ids$
DECLARE
-- count is the number of new accounts you want generated
count alias for $1;
-- rowcount is returned as the number of rows inserted
rowcount int;
BEGIN
INSERT INTO accounts(client) SELECT NULL FROM generate_series(1, count);
GET DIAGNOSTICS rowcount = ROW_COUNT;
RETURN rowcount;
END;
$gen_account_ids$ LANGUAGE plpgsql;
So I use it to prepopulate the table with, say, 1000 records:
SELECT gen_account_ids(1000);
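To make the setup concrete outside PostgreSQL, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite is an acceptable stand-in: SQLite has no sequences or plpgsql, so the sketch emulates accounts_seq by inserting explicit ids that start at 700000000001 (the sequence's MINVALUE). The names make_accounts and gen_account_ids_sqlite are hypothetical.

```python
import sqlite3

START_ID = 700000000001  # assumption: mirrors MINVALUE of accounts_seq


def make_accounts(conn: sqlite3.Connection) -> None:
    """Create a table shaped like the PostgreSQL accounts table."""
    conn.execute(
        "CREATE TABLE accounts ("
        " id INTEGER PRIMARY KEY,"
        " client VARCHAR(25),"
        " UNIQUE(id, client))"
    )


def gen_account_ids_sqlite(conn: sqlite3.Connection, count: int) -> int:
    """Insert `count` unassigned rows (client NULL); return rows inserted."""
    # Continue from the highest existing id, roughly as nextval() would.
    (max_id,) = conn.execute("SELECT max(id) FROM accounts").fetchone()
    next_id = START_ID if max_id is None else max_id + 1
    cur = conn.executemany(
        "INSERT INTO accounts(id, client) VALUES (?, NULL)",
        [(next_id + i,) for i in range(count)],
    )
    # For executemany, sqlite3 sums the modified rows into rowcount.
    return cur.rowcount


conn = sqlite3.connect(":memory:")
make_accounts(conn)
print(gen_account_ids_sqlite(conn, 1000))
```

Right after this setup the ids are contiguous (700000000001 through 700000001000), which is the assumption the assign function below relies on when it draws ids between min and max.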
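The next function, assign, randomly selects unassigned ids (unassigned meaning the client column is null) and updates them with a client value so that they become assigned. It returns the number of rows affected.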
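It works sometimes, but I believe there are collisions, which is why I tried DISTINCT; however, it often returns fewer rows than requested. For example, if I run select assign(100, 'foo');
it may return 95 rows instead of the requested 100.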
How can I modify it so that it always returns exactly the requested number of rows?
/*
This will assign ids to a client randomly
@param int is the number of account numbers to assign
@param varchar(10) is a string descriptor for the client
@returns the number of rows affected -- should be the same as the input int
Call it like this: `SELECT * FROM assign(100, 'FOO')`
*/
CREATE OR REPLACE FUNCTION assign(INT, VARCHAR(10))
RETURNS INT AS $$
DECLARE
total ALIAS FOR $1;
clientname ALIAS FOR $2;
rowcount int;
BEGIN
UPDATE accounts SET client = clientname WHERE id IN (
SELECT DISTINCT trunc(random() * (
(SELECT max(id) FROM accounts WHERE client IS NULL) -
(SELECT min(id) FROM accounts WHERE client IS NULL)) +
(SELECT min(id) FROM accounts WHERE client IS NULL)) FROM generate_series(1, total));
GET DIAGNOSTICS rowcount = ROW_COUNT;
RETURN rowcount;
END;
$$ LANGUAGE plpgsql;
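This is loosely based on this approach, where you can do something like
SELECT trunc(random() * (100 - 1) + 1) FROM generate_series(1,5);
which selects 5 random integers between 1 and 99 (the upper bound of 100 is never reached).
My goal is to do something similar: pick a random ID between the minimum and maximum unassigned ids and mark it for update.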
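Why 95 instead of 100? The subquery draws total values independently (with replacement) from m = max - min possible values, because trunc(random() * (max - min) + min) ranges over min .. max - 1, and DISTINCT then drops the repeats. The expected number of distinct values among n such draws is m(1 - (1 - 1/m)^n). The sketch below is a hypothetical Python model of that draw, not PostgreSQL code. With 1000 freshly generated rows, m = 999 and n = 100 give about 95.2, in line with the roughly 95 rows reported. Separately, the outer UPDATE does not filter on client IS NULL, so once some ids inside the min..max range are assigned, a draw can land on one of them and silently reassign it.

```python
import random


def expected_distinct(n, m):
    """Expected number of distinct values in n uniform draws from m values."""
    return m * (1 - (1 - 1 / m) ** n)


def simulate_assign_draw(n, min_id, max_id):
    """Model of the question's subquery: n independent draws, then DISTINCT.

    Mirrors trunc(random() * (max_id - min_id) + min_id); for integer min_id
    and a non-negative offset this equals min_id + trunc(offset).
    """
    span = max_id - min_id
    return {min_id + int(random.random() * span) for _ in range(n)}


rows = simulate_assign_draw(100, 700000000001, 700000001000)
print(len(rows))  # usually below 100
print(round(expected_distinct(100, 999), 1))  # 95.2
```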
Answer 0 (score: 2)
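This isn't the best answer, because it involves a full table scan, but in my case I don't care about performance, and it works. It is based on @CraigRinger's reference to the blog post "getting random tuples".
I'm interested in hearing other (perhaps better) solutions in general, and I'm particularly curious why the original solution falls short and what @klin came up with.
So here is my brute-force random-order solution: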
-- generate a million unassigned rows with null client column
insert into accounts(client) select null from generate_series(1, 1000000);
-- assign 1000 random rows to client 'foo'
update accounts set client = 'foo' where id in
(select id from accounts where client is null order by random() limit 1000);
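The same idea can be wrapped behind the question's assign(total, clientname) signature. Below is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for PostgreSQL (an assumption; both accept ORDER BY random() with LIMIT inside an IN subquery). Because the subquery returns whole rows filtered to client IS NULL, every id it yields is unique and unassigned, so the UPDATE touches exactly min(total, remaining unassigned rows). The function name assign_random is hypothetical.

```python
import sqlite3


def assign_random(conn: sqlite3.Connection, total: int, clientname: str) -> int:
    """Assign `total` randomly chosen unassigned accounts to `clientname`.

    Returns the number of rows updated: exactly `total`, unless fewer
    unassigned rows remain.
    """
    cur = conn.execute(
        """
        UPDATE accounts SET client = ?
        WHERE id IN (
            SELECT id FROM accounts
            WHERE client IS NULL     -- only unassigned rows are candidates
            ORDER BY random()        -- shuffle them (full scan plus sort)
            LIMIT ?                  -- keep the first `total` of the shuffle
        )
        """,
        (clientname, total),
    )
    return cur.rowcount


# Setup: 1000 unassigned rows with ids starting at 700000000001.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, client TEXT)")
conn.executemany(
    "INSERT INTO accounts(id, client) VALUES (?, NULL)",
    [(700000000001 + i,) for i in range(1000)],
)
print(assign_random(conn, 100, "foo"))  # 100
```

The trade-off is the one the answer states: ORDER BY random() reads and sorts every unassigned row, which is acceptable when performance doesn't matter.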
Answer 1 (score: 1)
Since the ids of the unassigned rows are not contiguous, pick a random row_number() rather than a random id.
with nulls as ( -- base query
select id
from accounts
where client is null
),
randoms as ( -- calculate random int in range 1..count(nulls.*)
select (trunc(random() * count(*)) + 1)::int random_value
from nulls
),
row_numbers as ( -- add row numbers to nulls
select id, row_number() over (order by id) rn
from nulls
)
select id
from row_numbers, randoms
where rn = random_value; -- random row number
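A minimal Python sketch of the same idea, assuming the unassigned ids are already available as a list (the helper name pick_random_unassigned is hypothetical). Numbering the sorted ids 1..count and drawing a uniform row number maps each unassigned id to exactly one number, so the gaps left by assigned ids no longer matter. The sketch draws from 1..count inclusive, the range the comment in the query describes.

```python
import random


def pick_random_unassigned(unassigned_ids):
    """Return one unassigned id chosen uniformly at random.

    Mirrors the query: number the sorted ids 1..count (like
    row_number() over (order by id)), draw a random row number,
    and return the id in that row.
    """
    ordered = sorted(unassigned_ids)
    count = len(ordered)
    if count == 0:
        raise ValueError("no unassigned ids left")
    rn = int(random.random() * count) + 1  # uniform over 1..count
    return ordered[rn - 1]                 # row numbers are 1-based


# ids ending in 2, 3 and 5 are assumed already assigned, hence the gaps
unassigned = [700000000001, 700000000004, 700000000006, 700000000007]
print(pick_random_unassigned(unassigned))
```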
No function is needed here, but you can easily put the query in a function body if you want to.
This query updates 5 random rows that have a null client.
update accounts
set client = 'new value' -- <-- clientname
where id in (
with nulls as ( -- base query
select id
from accounts
where client is null
),
randoms as ( -- calculate random int in range 1..count(nulls.*)
select i, (trunc(random() * count(*)) + 1)::int random_value
from nulls
cross join generate_series(1, 5) i -- <-- total
group by 1
),
row_numbers as ( -- add row numbers to nulls in order by id
select id, row_number() over (order by id) rn
from nulls
)
select id
from row_numbers, randoms
where rn = random_value -- random row number
);
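The randoms CTE draws each of the 5 row numbers independently, so two draws can coincide and the IN list then names fewer than 5 ids. A minimal Python sketch contrasting those independent draws with sampling without replacement (random.sample), which always yields min(total, count) distinct row numbers; both function names are hypothetical.

```python
import random


def independent_row_numbers(count, total):
    """Like the randoms CTE: `total` independent draws from 1..count."""
    return [random.randrange(1, count + 1) for _ in range(total)]


def distinct_row_numbers(count, total):
    """Sampling without replacement: distinct draws from 1..count."""
    return random.sample(range(1, count + 1), min(total, count))


count = 10  # number of unassigned rows in this toy example
print(len(set(independent_row_numbers(count, 5))))  # may be less than 5
print(len(set(distinct_row_numbers(count, 5))))     # always 5
```

In SQL terms, sampling without replacement corresponds to shuffling the rows themselves, as the order by random() limit answer does, rather than drawing numbers and matching them afterwards.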
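However, there is no guarantee that the query updates exactly 5 rows, because
select trunc(random()* (max_value - 1) + 1)::int
from generate_series(1, n)
is not a correct way to generate n distinct random values. The probability of repeats increases with the ratio n / max_value.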
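To quantify that last point: for n independent draws from m equally likely values, the probability that all of them are distinct is (1 - 0/m)(1 - 1/m)...(1 - (n-1)/m), so the chance of at least one repeat is one minus that product, and it grows as n/m grows. A small Python check (the function name prob_duplicate is hypothetical):

```python
def prob_duplicate(n, m):
    """Probability that n independent uniform draws from m values repeat."""
    if n > m:
        return 1.0  # pigeonhole: a repeat is certain
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= 1 - k / m  # draw k+1 avoids the k earlier values
    return 1 - p_all_distinct


for n in (5, 50, 500):
    print(n, round(prob_duplicate(n, 1000), 4))
```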