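I have a table called services with 3 key columns (service1, service2, service3) plus other value columns. I want to delete all duplicate records from the table based on the combination of the 3 key fields, in any order. For example, records whose key fields are 'car, truck, bike' and 'bike, car, truck' are duplicates even though the values sit in different positions. Note: I edited my question in response to the comments to give a more detailed description.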
Answer 0 (score: 0)
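It sounds like the table is poorly designed, so I would consider restructuring it entirely.
But to handle it as it is (and without using a cursor), I think the quickest approach is to list every possible permutation in order to find the duplicates, and then assign row numbers.
Example:
The numbers 1, 2, 3 have 6 permutations: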
123, 132, 213, 231, 312, 321
The same goes for your 'bike', 'car', 'truck':
'bike' 'car' 'truck', 'car' 'bike' 'truck', ... etc.
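To check the counting, Python's standard itertools.permutations enumerates every ordering of a sequence; three distinct values give 3! = 6 orderings, matching the list above. A small sketch (the helper name all_orderings is just for illustration):

```python
from itertools import permutations


def all_orderings(values):
    """Every ordering of the given values; n distinct values give n! orderings."""
    return list(permutations(values))


digits = all_orderings((1, 2, 3))
print(["".join(map(str, p)) for p in digits])
# ['123', '132', '213', '231', '312', '321']

services = all_orderings(("bike", "car", "truck"))
print(len(services))  # 6
```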
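So we want to partition the table's data into duplicate groups (every permutation of the same three values belongs to one group) and assign a row number to each row within its partition.
Sample table and data: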
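The grouping idea can be sketched in plain Python before writing any SQL. Rather than testing each row against all six permutations, this sketch swaps in a canonical key: two triples are permutations of each other exactly when their sorted tuples are equal, so the sorted tuple identifies the group. The function name number_within_groups is hypothetical, and rows are numbered in input order:

```python
from collections import defaultdict


def number_within_groups(rows):
    """Pair each row with its 1-based position inside its permutation group.

    Two rows belong to the same group when their three key values are a
    permutation of each other, i.e. when their sorted tuples are equal.
    """
    counts = defaultdict(int)  # group key -> rows seen so far in that group
    numbered = []
    for row in rows:
        key = tuple(sorted(row))  # canonical form shared by all orderings
        counts[key] += 1
        numbered.append((row, counts[key]))
    return numbered


rows = [
    ("bike", "car", "truck"),
    ("truck", "bike", "car"),
    ("car", "truck", "bike"),
    ("moped", "car", "truck"),
]
for row, n in number_within_groups(rows):
    print(row, n)
```

Rows numbered 1 are the ones to keep; everything numbered 2 or higher is a duplicate.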
CREATE TABLE services
( service1 VARCHAR(10),
service2 VARCHAR(10),
service3 VARCHAR(10)
);
--these first three values duplicate each other. They should end up
--partitioned together in our query
INSERT INTO services VALUES ('bike', 'car', 'truck');
INSERT INTO services VALUES ('truck', 'bike', 'car');
INSERT INTO services VALUES ('car', 'truck', 'bike');
--this fourth value should be in a partition of its own
INSERT INTO services VALUES ('moped', 'car', 'truck');
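Run this query to see the partitions. For each row, the correlated subquery looks at every row whose three columns hold some permutation of this row's values (all six orderings, including the row itself) and returns the smallest ROWID among them. Every row in a duplicate group gets the same value, so it serves as the partition key. This assumes the key columns are never NULL, since a NULL makes every equality test fail:
SELECT t.*,
       Row_number() over(PARTITION BY t.grp ORDER BY t.rid) AS rownumber
FROM  (SELECT s.*,
              s.ROWID AS rid,
              -- smallest ROWID among all rows holding a permutation of this row's values
              (SELECT Min(s1.ROWID)
                 FROM services s1
                WHERE (s1.service1 = s.service1 AND s1.service2 = s.service2 AND s1.service3 = s.service3)
                   OR (s1.service1 = s.service1 AND s1.service2 = s.service3 AND s1.service3 = s.service2)
                   OR (s1.service1 = s.service2 AND s1.service2 = s.service1 AND s1.service3 = s.service3)
                   OR (s1.service1 = s.service2 AND s1.service2 = s.service3 AND s1.service3 = s.service1)
                   OR (s1.service1 = s.service3 AND s1.service2 = s.service1 AND s1.service3 = s.service2)
                   OR (s1.service1 = s.service3 AND s1.service2 = s.service2 AND s1.service3 = s.service1)) AS grp
         FROM services s) t;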
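If you don't have an Oracle instance handy, the permutation-based grouping can be tried in SQLite through Python's built-in sqlite3 module. This is a sketch under assumptions: it uses SQLite's rowid in place of Oracle's ROWID, needs SQLite 3.25 or later for ROW_NUMBER(), and keys each row by the smallest rowid among the rows holding a permutation of its values (all six orderings, identity included). The names PERMUTATION_MATCH, NUMBERED_QUERY and make_sample_db are hypothetical:

```python
import sqlite3

# True when row s1 holds row s's three values in one of the 3! = 6 orders.
# The identity ordering is included so a row always matches itself.
PERMUTATION_MATCH = """
    (s1.service1 = s.service1 AND s1.service2 = s.service2 AND s1.service3 = s.service3)
 OR (s1.service1 = s.service1 AND s1.service2 = s.service3 AND s1.service3 = s.service2)
 OR (s1.service1 = s.service2 AND s1.service2 = s.service1 AND s1.service3 = s.service3)
 OR (s1.service1 = s.service2 AND s1.service2 = s.service3 AND s1.service3 = s.service1)
 OR (s1.service1 = s.service3 AND s1.service2 = s.service1 AND s1.service3 = s.service2)
 OR (s1.service1 = s.service3 AND s1.service2 = s.service2 AND s1.service3 = s.service1)
"""

# grp = smallest rowid in the permutation group; number rows inside each group.
NUMBERED_QUERY = f"""
SELECT t.service1, t.service2, t.service3,
       ROW_NUMBER() OVER (PARTITION BY t.grp ORDER BY t.rid) AS rownumber
FROM (SELECT s.service1, s.service2, s.service3, s.rowid AS rid,
             (SELECT MIN(s1.rowid) FROM services s1
               WHERE {PERMUTATION_MATCH}) AS grp
      FROM services s) t
ORDER BY t.rid
"""


def make_sample_db() -> sqlite3.Connection:
    """In-memory copy of the sample table and rows from the answer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE services (service1 TEXT, service2 TEXT, service3 TEXT)")
    conn.executemany(
        "INSERT INTO services VALUES (?, ?, ?)",
        [("bike", "car", "truck"), ("truck", "bike", "car"),
         ("car", "truck", "bike"), ("moped", "car", "truck")],
    )
    return conn


conn = make_sample_db()
rows = conn.execute(NUMBERED_QUERY).fetchall()
for row in rows:
    print(row)
```

Numbering by rid within each group means the row with the smallest rowid gets 1 and is the one that survives a later delete; separate duplicate groups get separate partitions because each has its own smallest rowid.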
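Now that you have the result, you can see that you need to delete any row whose rownumber is greater than 1. Oracle does not allow deleting through an inline view that contains an analytic function, so select the ROWIDs of those rows and delete them from the table itself:
DELETE FROM services
WHERE  ROWID IN (SELECT rid
                 FROM   (SELECT t.rid,
                                Row_number() over(PARTITION BY t.grp ORDER BY t.rid) AS rownumber
                         FROM  (SELECT s.ROWID AS rid,
                                       (SELECT Min(s1.ROWID)
                                          FROM services s1
                                         WHERE (s1.service1 = s.service1 AND s1.service2 = s.service2 AND s1.service3 = s.service3)
                                            OR (s1.service1 = s.service1 AND s1.service2 = s.service3 AND s1.service3 = s.service2)
                                            OR (s1.service1 = s.service2 AND s1.service2 = s.service1 AND s1.service3 = s.service3)
                                            OR (s1.service1 = s.service2 AND s1.service2 = s.service3 AND s1.service3 = s.service1)
                                            OR (s1.service1 = s.service3 AND s1.service2 = s.service1 AND s1.service3 = s.service2)
                                            OR (s1.service1 = s.service3 AND s1.service2 = s.service2 AND s1.service3 = s.service1)) AS grp
                                  FROM services s) t)
                 WHERE  rownumber > 1);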
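The delete step can be tried outside Oracle with SQLite through Python's sqlite3 module. SQLite's DELETE accepts only a table name, not a derived table, so this sketch collects the rowids of rows numbered above 1 and deletes those from services. It assumes SQLite 3.25+ for window functions and non-NULL key columns; PERMUTATION_MATCH and DELETE_DUPLICATES are hypothetical names:

```python
import sqlite3

# True when row s1 holds row s's three values in one of the 3! = 6 orders
# (identity included, so every row matches at least itself).
PERMUTATION_MATCH = """
    (s1.service1 = s.service1 AND s1.service2 = s.service2 AND s1.service3 = s.service3)
 OR (s1.service1 = s.service1 AND s1.service2 = s.service3 AND s1.service3 = s.service2)
 OR (s1.service1 = s.service2 AND s1.service2 = s.service1 AND s1.service3 = s.service3)
 OR (s1.service1 = s.service2 AND s1.service2 = s.service3 AND s1.service3 = s.service1)
 OR (s1.service1 = s.service3 AND s1.service2 = s.service1 AND s1.service3 = s.service2)
 OR (s1.service1 = s.service3 AND s1.service2 = s.service2 AND s1.service3 = s.service1)
"""

# Delete every row that is not first (by rowid) within its permutation group.
DELETE_DUPLICATES = f"""
DELETE FROM services
WHERE rowid IN (
    SELECT rid
    FROM (SELECT t.rid,
                 ROW_NUMBER() OVER (PARTITION BY t.grp ORDER BY t.rid) AS rownumber
          FROM (SELECT s.rowid AS rid,
                       (SELECT MIN(s1.rowid) FROM services s1
                         WHERE {PERMUTATION_MATCH}) AS grp
                FROM services s) t)
    WHERE rownumber > 1)
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services (service1 TEXT, service2 TEXT, service3 TEXT)")
conn.executemany(
    "INSERT INTO services VALUES (?, ?, ?)",
    [("bike", "car", "truck"), ("truck", "bike", "car"),
     ("car", "truck", "bike"), ("moped", "car", "truck")],
)

deleted = conn.execute(DELETE_DUPLICATES).rowcount
remaining = conn.execute(
    "SELECT service1, service2, service3 FROM services ORDER BY rowid"
).fetchall()
print(deleted, remaining)
```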
Side note: I wrote this for Oracle. I have never used Teradata, so it may have a different way of doing the partitioning. See http://www.bikinfo.com/HTML/TD/TD_vs_Oracle.html#_Toc_Qualify