Obfuscating data in Postgres

Asked: 2013-11-16 00:28:44

Tags: sql postgresql postgresql-9.1

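I want to obfuscate the data in specific columns in Postgres 9.1.

For example, I want to give everyone a 'random' first name and last name.

I can generate a pool of names to draw from: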

select name_first into first_names from people order by random() limit 500;
select name_last into last_names from people order by random() limit 500;

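Both queries run in about 400 ms (which is fine for me, given that they only need to run once!).

A regular UPDATE statement doesn't work: each subselect is evaluated only once, so every person ends up with the same name: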

update people
    SET name_last=(SELECT * from last_names order by random() limit 1),
    name_first=(SELECT * from first_names order by random() limit 1)
    where business_id=1;

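How can I give each person a random name in Postgres? I'd really rather not do this in Ruby on Rails; I expect a pure SQL approach to be faster. That said, speed isn't much of a concern, since I literally have all night to get this ready for this business case.

1 answer:

Answer 0: (score: 5)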

        -- Invent some data
CREATE TABLE persons
        ( id SERIAL NOT NULL PRIMARY KEY
        , last_name varchar
        );

INSERT INTO persons(last_name)
SELECT 'Name_' || gs::text
FROM generate_series(1,10) gs
        ;

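        -- The update: shuffle the existing last names among the rows.
        -- row_number() yields the distinct values 1..N, which match id only
        -- because the ids in this demo table are gapless.
WITH swp AS (
        SELECT last_name AS new_last_name
        , row_number() OVER (ORDER BY random() ) AS new_id
        FROM persons
        )
UPDATE persons dst
SET last_name = swp.new_last_name
FROM swp
WHERE swp.new_id = dst.id
        -- optional condition: skip rows that would keep the same value
AND swp.new_last_name <> dst.last_name
        ;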

SELECT * FROM persons
        ;

Result (one possible outcome, since the ordering is random):

 id | last_name 
----+-----------
  1 | Name_6
  2 | Name_4
  3 | Name_8
  4 | Name_2
  5 | Name_1
  6 | Name_10
  7 | Name_5
  8 | Name_7
  9 | Name_3
 10 | Name_9
(10 rows)
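
The same numbering idea can probably be adapted to the tables from the question. The following is only a sketch under assumptions: it supposes that people has a unique key column named id (the question does not show one), and that first_names and last_names are the non-empty pool tables created above. Numbering the target rows separately from their ids means gaps in id do not matter, and the modulo lets a pool that is smaller than the set of people be reused.

```sql
-- Sketch only. Assumptions not confirmed by the question:
--   * people has a unique key column "id"
--   * first_names(name_first) and last_names(name_last) exist and are non-empty
WITH numbered_people AS (
        -- number the rows to obfuscate 1..N in random order
        SELECT id
        , row_number() OVER (ORDER BY random() ) AS rn
        FROM people
        WHERE business_id = 1
        )
, numbered_first AS (
        -- number the first-name pool 1..M in random order
        SELECT name_first
        , row_number() OVER (ORDER BY random() ) AS rn
        FROM first_names
        )
, numbered_last AS (
        -- number the last-name pool 1..K in random order
        SELECT name_last
        , row_number() OVER (ORDER BY random() ) AS rn
        FROM last_names
        )
UPDATE people dst
SET name_first = f.name_first
, name_last = l.name_last
FROM numbered_people p
        -- wrap around the pool when there are more people than names
JOIN numbered_first f
  ON f.rn = (p.rn - 1) % (SELECT count(*) FROM first_names) + 1
JOIN numbered_last l
  ON l.rn = (p.rn - 1) % (SELECT count(*) FROM last_names) + 1
WHERE dst.id = p.id
        ;
```

Because both names are picked from the same row number, two pools of equal size yield at most that many distinct first/last combinations; if that matters, pools of different sizes or a second, independently shuffled numbering of people would be needed. Unlike the shuffle above, this draws from a pool, so people who land on the same slot receive the same name.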