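I want to obfuscate the data in specific columns in Postgres 9.1.
For example, I want to give everyone a 'random' first name and last name.
I can generate a pool of names to draw from: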
select name_first into first_names from people order by random() limit 500;
select name_last into last_names from people order by random() limit 500;
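Both of these queries run in about 400 ms (which is fine for me, given that they only need to run once!)
A regular UPDATE statement doesn't work, though: each subselect is evaluated only once, so everyone ends up with the same name: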
update people
SET name_last=(SELECT * from last_names order by random() limit 1),
name_first=(SELECT * from first_names order by random() limit 1)
where business_id=1;
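How can I give each person a random name in Postgres? I really don't want to do this in Ruby on Rails; I assume a pure SQL approach would be faster. Speed isn't much of a concern, though, since I have all night to get this done for this business case.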
Answer 0 (score: 5)
-- Invent some data
CREATE TABLE persons
( id SERIAL NOT NULL PRIMARY KEY
, last_name varchar
);
INSERT INTO persons(last_name)
SELECT 'Name_' || gs::text
FROM generate_series(1,10) gs
;
-- The update
WITH swp AS (
SELECT last_name AS new_last_name
, row_number() OVER (ORDER BY random() ) AS new_id -- shuffled position 1..N; unlike rank(), never yields ties
FROM persons
)
UPDATE persons dst
SET last_name = swp.new_last_name
FROM swp
WHERE swp.new_id = dst.id
-- redundant condition: avoid updating with same value
AND swp.new_last_name <> dst.last_name
;
SELECT * FROM persons
;
Result:
 id | last_name
----+-----------
  1 | Name_6
  2 | Name_4
  3 | Name_8
  4 | Name_2
  5 | Name_1
  6 | Name_10
  7 | Name_5
  8 | Name_7
  9 | Name_3
 10 | Name_9
(10 rows)
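The update above shuffles the existing last names among the rows and relies on the ids running 1..N without gaps. The same numbering idea can probably be adapted to the name pools from the question. The following is only a sketch, not tested against the real schema: it assumes people has a unique key column called id (a hypothetical name, not shown in the question) and that first_names and last_names were created by the SELECT ... INTO queries above. Each person gets a shuffled row number, and the modulo wraps around the pool when there are more people than pooled names.

```sql
-- Sketch only. Assumption: people.id is a unique key (hypothetical column name).
WITH p AS (
    -- number the target rows in random order
    SELECT id
         , row_number() OVER (ORDER BY random()) AS rn
    FROM people
    WHERE business_id = 1
), f AS (
    -- number the first-name pool in random order, and carry its size
    SELECT name_first
         , row_number() OVER (ORDER BY random()) AS rn
         , count(*) OVER () AS cnt
    FROM first_names
), l AS (
    -- same for the last-name pool, shuffled independently
    SELECT name_last
         , row_number() OVER (ORDER BY random()) AS rn
         , count(*) OVER () AS cnt
    FROM last_names
)
UPDATE people dst
SET name_first = f.name_first
  , name_last  = l.name_last
FROM p
JOIN f ON f.rn = (p.rn - 1) % f.cnt + 1  -- wrap around the pool
JOIN l ON l.rn = (p.rn - 1) % l.cnt + 1
WHERE dst.id = p.id
;
```

Because the join goes through p.id rather than through the row number itself, gaps in the ids should not matter here. One caveat: when both pools have the same size (500 in the question), the same first/last pairings repeat every 500 people, which may or may not be acceptable for obfuscation.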