How do I delete duplicate records, or merge them, while keeping foreign key constraints intact?

Posted: 2014-10-02 18:15:57

Tags: sql xml tsql

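I have a database generated from XML documents that contain duplicate records. I know how to delete a single record from the main table, but not records that are tied to it through foreign key constraints.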

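I have a large number of XML documents, and I inserted them without worrying about duplicates. One way to remove the duplicates is to delete the rows with the lower Primary_Key values (along with all related foreign key records) and keep the row with the highest value. However, I don't know how to do that.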

The database looks like this:

Table 1: [type]

+-------------+---------+-----------+
| Primary_Key | Food_ID | Food_Type |
+-------------+---------+-----------+
|   70001     |  12345  |  fruit    |
|   70002     |  12345  |  fruit    |
|   70003     |  12345  |  meat     |
+----^--------+---------+-----------+
     |
     |-----------------|
                       |    
                       | Linked to primary key in the first table
+-------------+--------v--------+-------------+-------------+------------+
| Primary_Key |  Information_ID |   Food_Name | Information |  Comments  | 
+-------------+-----------------+-------------+-------------+------------+
|   0001      |     70001       |   banana    |  buy @ toms | delicious! |
|   0002      |     70002       |   banana    |  buy @ mats | so-so      |
|   0003      |     70003       | decade meat |  buy @ sals | disgusting |
+-------------+-----------------+-------------+-------------+------------+

^ Table 2: [food_information]

There are several other linked tables, all with foreign key values that match primary key values in the main table ([type]).
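For reference, a minimal DDL sketch of the two tables drawn above, with the link written as a FOREIGN KEY. The column types and the constraint name FK_food_information_type are assumptions for illustration (the real schema is not shown), and the syntax is kept portable so it runs on SQL Server and on SQLite (SQLite only enforces foreign keys after PRAGMA foreign_keys = ON). The last query lists each duplicated Food_ID together with the key the question wants to keep.

```sql
-- Assumed column types: the original schema is not shown.
CREATE TABLE [type] (
    Primary_Key INT PRIMARY KEY,
    Food_ID     INT NOT NULL,      -- repeats when the same item was imported twice
    Food_Type   VARCHAR(20)
);

-- Child table: Information_ID points back at [type].Primary_Key.
-- FK_food_information_type is a hypothetical constraint name.
CREATE TABLE food_information (
    Primary_Key    INT PRIMARY KEY,
    Information_ID INT NOT NULL,
    Food_Name      VARCHAR(50),
    Information    VARCHAR(50),
    Comments       VARCHAR(50),
    CONSTRAINT FK_food_information_type
        FOREIGN KEY (Information_ID) REFERENCES [type] (Primary_Key)
);

INSERT INTO [type] VALUES (70001, 12345, 'fruit');
INSERT INTO [type] VALUES (70002, 12345, 'fruit');
INSERT INTO [type] VALUES (70003, 12345, 'meat');

INSERT INTO food_information VALUES (1, 70001, 'banana', 'buy @ toms', 'delicious!');
INSERT INTO food_information VALUES (2, 70002, 'banana', 'buy @ mats', 'so-so');
INSERT INTO food_information VALUES (3, 70003, 'decade meat', 'buy @ sals', 'disgusting');

-- A Food_ID that occurs more than once marks a duplicate group;
-- keep_key is the highest Primary_Key in that group.
SELECT Food_ID, COUNT(*) AS copies, MAX(Primary_Key) AS keep_key
FROM [type]
GROUP BY Food_ID
HAVING COUNT(*) > 1;
```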

My question is which of these solutions would be best:

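  1. How do I delete every record except 70003 (the highest one)? We can't tell that a record is a duplicate unless its [Food_ID] appears more than once. If it does, we need to delete the matching records from all tables (there are 10), following the Primary_Key/foreign key relationships.
  2. How can I update/merge these SQL records at insert time, so I never have to delete duplicates again?
  3. I prefer #1, because it saves me from rebuilding the database and makes inserting easier.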

Thanks!

2 answers:

Answer 0 (score: 0)

Something like...

Assuming:

create table food (
  primary_key int,
  food_id int,
  food_type varchar(20)
  );

insert into food values (70001,12345,'fruit'); 
insert into food values (70002,12345,'fruit'); 
insert into food values (70003,12345,'meat'); 
insert into food values (70004,11111,'taco'); 

create table info (
  primary_key int,
  info_id int,
  food_name varchar(20)
  );

insert into info values (1,70001,'banana'); 
insert into info values (2,70002,'banana'); 
insert into info values (3,70003,'decade meat'); 
insert into info values (4,70004,'taco taco'); 

Then...

-- yields:   12345   70003 

select  food_id, max(info_id) as max_info_id
from    food 
        join info on food.primary_key=info.info_id
where   food_id in ( 
            select food_id
            from   food 
                   join info on food.primary_key=info.info_id
            group  by food_id
            having count(*)>1)
group   by food_id;

Then something like this finds the rows to delete. There may be a better way to write it... I'm still thinking about it.

select  *
from    food 
        join info on food.primary_key=info.info_id 
        join ( select  food_id, max(info_id) as max_info_id
               from    food 
                       join info on food.primary_key=info.info_id
               where   food_id in ( 
                           select food_id
                           from   food 
                                  join info on food.primary_key=info.info_id
                           group  by food_id
                           having count(*)>1) 
               group   by food_id
               ) as dont_delete
            on food.food_id=dont_delete.food_id and 
               info.info_id<max_info_id  

gives you:

PRIMARY_KEY FOOD_ID FOOD_TYPE   INFO_ID FOOD_NAME   MAX_INFO_ID
70001       12345   fruit       70001   banana      70003
70002       12345   fruit       70002   banana      70003

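So you could save the primary keys returned by that big query (for example into a temp table, called saved_keys here), and then run:

delete from info where info_id in (select primary_key from saved_keys)
delete from food where primary_key in (select primary_key from saved_keys)

Save the keys first: the big query joins food to info, so once either table has been trimmed it no longer returns those rows for the other delete. Delete from info before food, so that no foreign key is left pointing at a deleted row.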
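A self-contained sketch of that save-then-delete sequence on the sample tables above. Two things here are assumptions rather than the answer's own code: info.info_id declares an explicit foreign key, and the rows to remove are picked with an EXISTS test instead of the join-based query (a row is a duplicate when another row with the same food_id has a higher primary_key). The EXISTS form also catches food rows that have no info row at all. to_delete is a hypothetical scratch table; it is a plain table rather than a #temp table only so the script runs on both SQL Server and SQLite.

```sql
-- Sample tables from the answer; the REFERENCES clause is an added assumption.
CREATE TABLE food (
    primary_key INT PRIMARY KEY,
    food_id     INT,
    food_type   VARCHAR(20)
);
CREATE TABLE info (
    primary_key INT PRIMARY KEY,
    info_id     INT REFERENCES food (primary_key),
    food_name   VARCHAR(20)
);

INSERT INTO food VALUES (70001, 12345, 'fruit');
INSERT INTO food VALUES (70002, 12345, 'fruit');
INSERT INTO food VALUES (70003, 12345, 'meat');
INSERT INTO food VALUES (70004, 11111, 'taco');

INSERT INTO info VALUES (1, 70001, 'banana');
INSERT INTO info VALUES (2, 70002, 'banana');
INSERT INTO info VALUES (3, 70003, 'decade meat');
INSERT INTO info VALUES (4, 70004, 'taco taco');

-- 1. Save the keys to remove (to_delete is a hypothetical scratch table).
--    Only the highest primary_key per food_id has no "newer" row, so it survives.
CREATE TABLE to_delete (primary_key INT PRIMARY KEY);
INSERT INTO to_delete (primary_key)
SELECT f.primary_key
FROM food AS f
WHERE EXISTS (SELECT 1
              FROM food AS newer
              WHERE newer.food_id = f.food_id
                AND newer.primary_key > f.primary_key);

-- 2. Child rows first, so no foreign key points at a deleted parent.
DELETE FROM info WHERE info_id IN (SELECT primary_key FROM to_delete);

-- 3. Then the parent rows.
DELETE FROM food WHERE primary_key IN (SELECT primary_key FROM to_delete);

DROP TABLE to_delete;
```

Afterwards food holds 70003 and 70004, and info holds only the rows that point at them.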

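To avoid this problem in the future, maybe consider a unique constraint on food... unique(primary_key,food_id) or something. But if the relationship is one-to-one, why not store them in a single table...?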
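A note on the constraint: a unique (primary_key, food_id) pair does not stop this kind of duplicate, because each imported row already carries its own primary_key, so every pair is distinct even when food_id repeats. The constraint that blocks a second row for the same food item goes on food_id alone. The sketch below shows both side by side; the table and index names (food_composite, food_single, uq_food_composite, uq_food_single) are hypothetical, and CREATE UNIQUE INDEX is used because it works on both SQL Server and SQLite.

```sql
-- Composite unique index: two rows with the same food_id are still accepted.
CREATE TABLE food_composite (primary_key INT, food_id INT, food_type VARCHAR(20));
CREATE UNIQUE INDEX uq_food_composite ON food_composite (primary_key, food_id);

INSERT INTO food_composite VALUES (70001, 12345, 'fruit');
INSERT INTO food_composite VALUES (70002, 12345, 'fruit');  -- accepted: (70002, 12345) is a new pair

-- Unique index on food_id alone: one row per food item.
CREATE TABLE food_single (primary_key INT, food_id INT, food_type VARCHAR(20));
CREATE UNIQUE INDEX uq_food_single ON food_single (food_id);

INSERT INTO food_single VALUES (70001, 12345, 'fruit');
-- This one would be rejected with a unique-constraint violation,
-- so it is left commented out to keep the script runnable:
-- INSERT INTO food_single VALUES (70002, 12345, 'fruit');
```

On the existing table, create the index only after the cleanup: a unique index cannot be built while duplicate food_id values are still present.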

Answer 1 (score: 0)

Even when a [foodID] has no duplicates, the subquery still returns its max(Primary_Key), so that row is not deleted.

Use a NOT IN condition in the WHERE clause, on each child table first:
delete tableX 
 where tableX.informationID not in ( select max(Primary_Key) 
                                       from [type] 
                                   group by [foodID] )

Then do [type] last:

delete [type] 
 where [type].[Primary_Key] not in ( select max(Primary_Key) 
                                       from [type] 
                                      group by [foodID] )
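Answer 1's two statements can be run end to end against the question's [type] and food_information tables. This sketch assumes the column types and sample rows shown (70004 is an added Food_ID without duplicates, to show that it survives), uses the question's column names Food_ID and Information_ID, and writes DELETE FROM instead of T-SQL's shorter DELETE tableX so the same script also runs on SQLite.

```sql
-- Assumed column types; names follow the question's tables.
CREATE TABLE [type] (
    Primary_Key INT PRIMARY KEY,
    Food_ID     INT NOT NULL,
    Food_Type   VARCHAR(20)
);
CREATE TABLE food_information (
    Primary_Key    INT PRIMARY KEY,
    Information_ID INT NOT NULL REFERENCES [type] (Primary_Key),
    Food_Name      VARCHAR(50)
);

INSERT INTO [type] VALUES (70001, 12345, 'fruit');
INSERT INTO [type] VALUES (70002, 12345, 'fruit');
INSERT INTO [type] VALUES (70003, 12345, 'meat');
INSERT INTO [type] VALUES (70004, 11111, 'taco');  -- added: a Food_ID with no duplicate

INSERT INTO food_information VALUES (1, 70001, 'banana');
INSERT INTO food_information VALUES (2, 70002, 'banana');
INSERT INTO food_information VALUES (3, 70003, 'decade meat');
INSERT INTO food_information VALUES (4, 70004, 'taco taco');

-- Child table first; repeat this statement for every linked table.
DELETE FROM food_information
WHERE Information_ID NOT IN (SELECT MAX(Primary_Key)
                             FROM [type]
                             GROUP BY Food_ID);

-- Parent table last, using the same "keep the highest key" list.
DELETE FROM [type]
WHERE Primary_Key NOT IN (SELECT MAX(Primary_Key)
                          FROM [type]
                          GROUP BY Food_ID);
```

With ten linked tables, run the first DELETE once per table before touching [type], and wrap the whole sequence in one transaction so a failure part-way does not leave half-cleaned data. MAX(Primary_Key) is never NULL here, so the usual NOT IN pitfall (a NULL in the subquery makes the condition match nothing) does not apply.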

Then create a unique constraint on [foodID].
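For question 2 (avoiding duplicates at insert time), one option, sketched here as an assumption rather than something either answer proposed, is to keep the unique index on Food_ID and make each import insert skip Food_ID values that already exist. The index name UQ_type_Food_ID and the sample keys 70005 and 70006 are hypothetical; SELECT ... WHERE without a FROM clause is accepted by both SQL Server and SQLite.

```sql
CREATE TABLE [type] (
    Primary_Key INT PRIMARY KEY,
    Food_ID     INT NOT NULL,
    Food_Type   VARCHAR(20)
);
-- Hypothetical index name; this is the unique constraint from the step above.
CREATE UNIQUE INDEX UQ_type_Food_ID ON [type] (Food_ID);

INSERT INTO [type] VALUES (70003, 12345, 'meat');

-- Record from the next XML file: Food_ID 12345 already exists, so nothing is inserted.
INSERT INTO [type] (Primary_Key, Food_ID, Food_Type)
SELECT 70005, 12345, 'fruit'
WHERE NOT EXISTS (SELECT 1 FROM [type] WHERE Food_ID = 12345);

-- A genuinely new Food_ID is inserted normally.
INSERT INTO [type] (Primary_Key, Food_ID, Food_Type)
SELECT 70006, 22222, 'bread'
WHERE NOT EXISTS (SELECT 1 FROM [type] WHERE Food_ID = 22222);
```

This keeps the row that arrived first. If the newer XML record should win instead, update the existing row when its Food_ID is already present (SQL Server's MERGE statement can express the insert-or-update in one statement). Under concurrent imports two sessions can both pass the NOT EXISTS check; the unique index then rejects the second insert instead of letting a duplicate in.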