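I have some data that I want to normalize. Specifically, I am normalizing it so that I can work with the normalized part without having to worry about duplicates. What I am doing is: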
INSERT INTO new_table (a, b, c)
SELECT DISTINCT a,b,c
FROM old_table;
UPDATE old_table
SET abc_id = new_table.id
FROM new_table
WHERE new_table.a = old_table.a
AND new_table.b = old_table.b
AND new_table.c = old_table.c;
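First, it seems like there should be a better way to do this. The process of finding the distinct data should inherently be able to produce the list of rows that belong to each distinct value. Second, and more importantly, the INSERT takes only a short time, while the UPDATE takes FOREVER (I don't actually know how long yet, because it is still running). I am using PostgreSQL. Is there a better way to do this (possibly in a single query)?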
Answer 0 (score: 2)
This is another answer of mine, extended to three columns:
-- Some test data
CREATE TABLE the_table
( id SERIAL NOT NULL PRIMARY KEY
, name varchar
, a INTEGER
, b varchar
, c varchar
);
INSERT INTO the_table(name, a,b,c) VALUES
( 'Chimpanzee' , 1, 'mammals', 'apes' )
,( 'Urang Utang' , 1, 'mammals', 'apes' )
,( 'Homo Sapiens' , 1, 'mammals', 'apes' )
,( 'Mouse' , 2, 'mammals', 'rodents' )
,( 'Rat' , 2, 'mammals', 'rodents' )
,( 'Cat' , 3, 'mammals', 'felix' )
,( 'Dog' , 3, 'mammals', 'canae' )
;
-- [empty] table to contain the "squeezed out" domain {a,b,c}
CREATE TABLE abc_table
( id SERIAL NOT NULL PRIMARY KEY
, a INTEGER
, b varchar
, c varchar
, UNIQUE (a,b,c)
);
-- The original table needs a "link" to the new table
ALTER TABLE the_table
ADD column abc_id INTEGER -- NOT NULL
REFERENCES abc_table(id)
;
-- FK constraints are helped a lot by a supportive index.
CREATE INDEX abc_table_fk ON the_table (abc_id);
-- Chained query to:
-- * populate the domain table
-- * initialize the FK column in the original table
WITH ins AS (
INSERT INTO abc_table(a,b,c)
SELECT DISTINCT a,b,c
FROM the_table a
RETURNING *
)
UPDATE the_table ani
SET abc_id = ins.id
FROM ins
WHERE ins.a = ani.a
AND ins.b = ani.b
AND ins.c = ani.c
;
-- Now that we have the FK pointing to the new table,
-- we can drop the redundant columns.
ALTER TABLE the_table DROP COLUMN a, DROP COLUMN b, DROP COLUMN c;
SELECT * FROM the_table;
SELECT * FROM abc_table;
-- show it to the world
SELECT a.*
, c.a, c.b, c.c
FROM the_table a
JOIN abc_table c ON c.id = a.abc_id
;
Result:
CREATE TABLE
INSERT 0 7
CREATE TABLE
ALTER TABLE
CREATE INDEX
UPDATE 7
ALTER TABLE
id | name | abc_id
----+--------------+--------
1 | Chimpanzee | 4
2 | Urang Utang | 4
3 | Homo Sapiens | 4
4 | Mouse | 3
5 | Rat | 3
6 | Cat | 1
7 | Dog | 2
(7 rows)
id | a | b | c
----+---+---------+---------
1 | 3 | mammals | felix
2 | 3 | mammals | canae
3 | 2 | mammals | rodents
4 | 1 | mammals | apes
(4 rows)
id | name | abc_id | a | b | c
----+--------------+--------+---+---------+---------
1 | Chimpanzee | 4 | 1 | mammals | apes
2 | Urang Utang | 4 | 1 | mammals | apes
3 | Homo Sapiens | 4 | 1 | mammals | apes
4 | Mouse | 3 | 2 | mammals | rodents
5 | Rat | 3 | 2 | mammals | rodents
6 | Cat | 1 | 3 | mammals | felix
7 | Dog | 2 | 3 | mammals | canae
(7 rows)
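Edit: this seems to work quite well, and I hate seeing the vote I put there, hence this otherwise useless edit (CrazyCasta).
Answer 1 (score: 0)
Figured out a way to do it myself: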
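One caveat worth sketching: the chained query above joins on plain `=`, which never matches when a column is NULL, whereas `SELECT DISTINCT` does treat NULLs as equal. If a, b, or c can contain NULLs (an assumption, since the test data has none), those rows would get a domain row but keep a NULL abc_id. A possible NULL-safe variant, to be run in place of the chained query and before the columns are dropped, might look like this:

```sql
-- Sketch only: NULL-safe version of the chained INSERT ... UPDATE.
-- Assumes the_table still has its a, b, c columns and abc_table is empty.
WITH ins AS (
    INSERT INTO abc_table (a, b, c)
    SELECT DISTINCT a, b, c
    FROM the_table
    RETURNING id, a, b, c
)
UPDATE the_table ani
SET abc_id = ins.id
FROM ins
-- IS NOT DISTINCT FROM treats two NULLs as equal, unlike "="
WHERE ins.a IS NOT DISTINCT FROM ani.a
  AND ins.b IS NOT DISTINCT FROM ani.b
  AND ins.c IS NOT DISTINCT FROM ani.c;
```

The trade-off is that the planner may have fewer join strategies available for `IS NOT DISTINCT FROM` than for `=`, so this form can be slower on large tables; if the columns are declared NOT NULL, the original `=` version is the better choice.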
BEGIN;
CREATE TEMPORARY TABLE new_table_temp (
LIKE new_table INCLUDING DEFAULTS, -- copy the id default (assumes a serial id) so ids are generated here
old_ids integer[]
)
ON COMMIT DROP;
INSERT INTO new_table_temp (a, b, c, old_ids)
SELECT a, b, c, array_agg(id) AS old_ids
FROM old_table
GROUP BY a, b, c;
INSERT INTO new_table (id, a, b, c)
SELECT id, a, b, c
FROM new_table_temp;
UPDATE old_table
SET abc_id = new_table_temp.id
FROM new_table_temp
WHERE old_table.id = ANY(new_table_temp.old_ids);
COMMIT;
At least this does what I want. I'll update on whether it runs quickly. EXPLAIN shows what seems to be a sensible plan, so I'm hopeful.
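Whichever approach is used, it may be worth checking the result before dropping the redundant columns. The queries below are a minimal sketch using the table and column names from the question (old_table, new_table, abc_id); the aliases `unlinked` and `mismatched` are made up for illustration. Both counts should come back as 0 if every row was linked correctly.

```sql
-- Rows in old_table that did not receive a link to new_table
SELECT count(*) AS unlinked
FROM old_table
WHERE abc_id IS NULL;

-- Rows whose linked (a, b, c) disagree with the original columns;
-- IS DISTINCT FROM is used so that NULLs compare as equal
SELECT count(*) AS mismatched
FROM old_table o
JOIN new_table n ON n.id = o.abc_id
WHERE o.a IS DISTINCT FROM n.a
   OR o.b IS DISTINCT FROM n.b
   OR o.c IS DISTINCT FROM n.c;
```

A nonzero `unlinked` count with a zero `mismatched` count would point at NULLs in a, b, or c, since an equality join does not match them.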