How can I efficiently change the values of a specific column in a table with more than 100k rows?
Table definition:
CREATE TABLE person
(
id integer NOT NULL,
first_name character varying,
last_name character varying,
CONSTRAINT person_pkey PRIMARY KEY (id)
)
To anonymize the data, I have to shuffle the values of the first_name column in place (I am not allowed to create a new table).
My attempt:
with
first_names as (
select row_number() over (order by random()),
first_name as new_first_name
from person
),
ids as (
select row_number() over (order by random()),
id as ref_id
from person
)
update person
set first_name = new_first_name
from first_names, ids
where id = ref_id;
It takes hours to complete.
Is there an efficient way to do this?
Answer 0 (score: 5)
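The problem with Postgres is that every UPDATE amounts to a DELETE plus an INSERT. To see how the CTE itself performs, run it with a SELECT in place of the UPDATE and look at the EXPLAIN ANALYZE output.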
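A minimal sketch of that check, using the CTEs from the question unchanged. Plain EXPLAIN is used here rather than EXPLAIN ANALYZE, so the statement is only planned and never executed. The choice of output columns (ref_id, new_first_name) is an assumption made for illustration.

```sql
-- Plain EXPLAIN only plans the statement; it does not run it.
EXPLAIN
WITH first_names AS (
    SELECT row_number() OVER (ORDER BY random()),
           first_name AS new_first_name
    FROM person
),
ids AS (
    SELECT row_number() OVER (ORDER BY random()),
           id AS ref_id
    FROM person
)
-- SELECT in place of UPDATE: same CTEs, same FROM list, no writes.
SELECT ref_id, new_first_name
FROM first_names, ids;
```

The estimated row count on the join node is worth a look: with no join condition between first_names and ids, every name is paired with every id, so the estimate should come out near the square of the table's row count.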
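-- Alternative when every row changes: rebuild the table instead of updating it in place.
CREATE TABLE new_table AS
SELECT * ....
DROP TABLE old_table;
ALTER TABLE new_table RENAME TO old_table;
-- Then recreate the indexes and constraints.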
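For the person table from the question, the rebuild could look roughly like the sketch below. The table name person_new is hypothetical, and the sketch assumes nothing else (foreign keys, views) depends on person, since DROP TABLE would otherwise fail.

```sql
BEGIN;

-- person_new is a hypothetical name for the replacement table.
CREATE TABLE person_new AS
WITH first_names AS (
    SELECT row_number() OVER (ORDER BY random()) AS rn,
           first_name AS new_first_name
    FROM person
),
ids AS (
    SELECT row_number() OVER (ORDER BY random()) AS rn,
           id,
           last_name
    FROM person
)
-- Pair each id with exactly one randomly chosen first name.
SELECT ids.id,
       first_names.new_first_name AS first_name,
       ids.last_name
FROM ids
JOIN first_names ON first_names.rn = ids.rn;

DROP TABLE person;
ALTER TABLE person_new RENAME TO person;

-- CREATE TABLE AS copies no constraints, so the primary key is added back.
ALTER TABLE person ADD CONSTRAINT person_pkey PRIMARY KEY (id);

COMMIT;
```

Because DDL is transactional in Postgres, wrapping the steps in one transaction means other sessions never see the table missing.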
Sorry, that is not an option for you :(
Edit: after reading a_horse_with_no_name, it looks like you need:
with
first_names as (
select row_number() over (order by random()) rn,
first_name as new_first_name
from person
),
ids as (
select row_number() over (order by random()) rn,
id as ref_id
from person
)
update person
set first_name = new_first_name
from first_names
join ids
on first_names.rn = ids.rn
where id = ref_id;
The performance question would be easier to answer if you provided the EXPLAIN / ANALYZE output.
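One way to collect that output for the UPDATE itself, sketched under the assumption that a rolled-back trial run is acceptable on the system in question. EXPLAIN ANALYZE really executes the statement it explains, so the transaction is rolled back afterwards.

```sql
BEGIN;

-- EXPLAIN ANALYZE executes the UPDATE for real; the ROLLBACK below
-- discards the changes so the data stays as it was.
EXPLAIN (ANALYZE, BUFFERS)
WITH first_names AS (
    SELECT row_number() OVER (ORDER BY random()) AS rn,
           first_name AS new_first_name
    FROM person
),
ids AS (
    SELECT row_number() OVER (ORDER BY random()) AS rn,
           id AS ref_id
    FROM person
)
UPDATE person
SET first_name = new_first_name
FROM first_names
JOIN ids ON first_names.rn = ids.rn
WHERE id = ref_id;

ROLLBACK;
```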
Answer 1 (score: 4)
This takes 5 seconds on my laptop to shuffle 500,000 rows:
with names as (
select id, first_name, last_name,
lead(first_name) over w as first_1,
lag(first_name) over w as first_2
from person
window w as (order by random())
)
update person
set first_name = coalesce(first_1, first_2)
from names
where person.id = names.id;
The idea is to pick the "next" name after sorting the data randomly. That is as good as picking a random name.
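To make the lead/lag mechanics visible, here is a small sketch on three made-up rows. It orders by id instead of random() so the result is reproducible; the names and the new_first_name alias are illustrative only.

```sql
-- Three sample rows, ordered by id (not random()) so the output is stable.
SELECT id,
       first_name,
       lead(first_name) OVER w AS first_1,   -- name of the following row
       lag(first_name)  OVER w AS first_2,   -- name of the preceding row
       coalesce(lead(first_name) OVER w,
                lag(first_name)  OVER w) AS new_first_name
FROM (VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid')) AS t(id, first_name)
WINDOW w AS (ORDER BY id);

--  id | first_name | first_1 | first_2 | new_first_name
-- ----+------------+---------+---------+----------------
--   1 | Ann        | Bob     |         | Bob
--   2 | Bob        | Cid     | Ann     | Cid
--   3 | Cid        |         | Bob     | Bob
```

The last row has no successor, so coalesce falls back to the predecessor's name. As a side effect one name ('Bob' here) is handed out twice and the first name in the ordering ('Ann') is not handed out at all, so the result is not a strict permutation of the original values.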
It is possible that not every name ends up changed, but running it two or three times should be enough.
Here is a test setup on SQLFiddle: http://sqlfiddle.com/#!15/15713/1
The query on the right-hand side checks whether any first name stayed the same after the "randomizing".
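A similar check can be folded into the update itself. This is only a sketch and not the query from the fiddle: it wraps the UPDATE in a data-modifying CTE and uses RETURNING to compare the old name (read from names) with the new one (read from person). The aliases old_name, new_name and unchanged are made up for the example.

```sql
WITH names AS (
    SELECT id, first_name,
           lead(first_name) OVER w AS first_1,
           lag(first_name)  OVER w AS first_2
    FROM person
    WINDOW w AS (ORDER BY random())
),
shuffled AS (
    UPDATE person
    SET first_name = coalesce(first_1, first_2)
    FROM names
    WHERE person.id = names.id
    -- names.first_name is the value before the update,
    -- person.first_name the value after it.
    RETURNING names.first_name  AS old_name,
              person.first_name AS new_name
)
-- Count the rows whose first name did not change.
SELECT count(*) AS unchanged
FROM shuffled
WHERE old_name IS NOT DISTINCT FROM new_name;
```

A row stays unchanged when its random neighbour happens to carry the same first name, which is why a non-zero count is expected on data with common names.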