Checking for differences across multiple records without a primary key

Time: 2012-09-27 21:02:48

Tags: tsql duplicates primary-key normalization

My table structure is roughly as follows (I'm leaving out many more columns):

WEAPON    MUNITION    RANGE

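I am writing a query that checks a single table for WEAPON-MUNITION pairings that appear with different ranges. I need to find every WEAPON-MUNITION pairing that has more than one range. Duplicates are allowed, because this table holds multiple data sets. Yes, it violates normalization, but I didn't design it; I just have to query it.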

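So if one weapon-munition pairing has four different ranges, I need to be able to display those rows so they can be corrected. I've tried some complicated CTEs and really complicated self-joins, but whenever I think I have the result, I can't tie it back to the original table, because the column I thought was the primary key has duplicates across data sets! I need to display the entire record once I've found the offending rows. My result has almost 10 times as many rows as I started with, and I can't figure out why.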

Short of asking the DBA to let me generate a unique key for each record, I don't know how to accomplish this.

Edit: Using gregmac's example I came up with this query (generalized, with some columns and any proprietary information omitted):

WITH range_cte AS 
( 
    SELECT 
        d1.WEAPON
       ,d1.MUNITION
       ,d1.RANGE
       ,d1.ID    --This is NOT a primary key! There are duplicates
    FROM data d1 INNER JOIN data d2
        ON  d1.WEAPON = d2.WEAPON
        AND d1.MUNITION = d2.MUNITION
        AND d1.RANGE <> d2.RANGE
    GROUP BY 
        d1.WEAPON
       ,d1.MUNITION
       ,d1.RANGE
       ,d1.ID
)
--Join the CTE back to the original table using the ID (that's not a primary key),
--so every duplicated ID multiplies the rows returned
SELECT * FROM range_cte r INNER JOIN data d
    ON r.ID = d.ID
ORDER BY 
    r.WEAPON
   ,r.MUNITION

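My thought is to have an auto-generated key added to the whole table. Or should I include more columns in the CTE (such as the data set) to form some sort of natural key?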

3 Answers:

Answer 0: (score: 1)

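Unless I'm misunderstanding, you simply need to self-join and find other rows where the weapon and munition are the same but the range is different.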

I came up with this:

SELECT d1.* 
FROM data d1
INNER JOIN data d2 
  ON d1.weapon = d2.weapon 
    AND d1.munition = d2.munition 
    AND d1.range <> d2.range
GROUP BY d1.weapon, d1.munition, d1.range -- eliminate duplicates which are caused by joining both ways 
         ,d1.other1 ,d1.other2
ORDER BY d1.weapon, d1.munition

Test data:

CREATE TABLE data
(
  WEAPON    varchar(20), 
  MUNITION  varchar(20), 
  RANGE     varchar(20),
  other1    varchar(20),
  other2    varchar(20)
);

INSERT INTO data VALUES ('a', 'x', '1', 'aaa','aaa');
INSERT INTO data VALUES ('a', 'x', '2', 'aaa','bbb');
INSERT INTO data VALUES ('a', 'y', '3', 'aaa','bbb');
INSERT INTO data VALUES ('a', 'z', '4', 'ccc','ddd');
INSERT INTO data VALUES ('b', 'x', '5', 'def','ghh');
INSERT INTO data VALUES ('b', 'z', '6', 'ccc','ddd');
INSERT INTO data VALUES ('b', 'z', '7', 'aaa','aaa');
INSERT INTO data VALUES ('b', 'z', '8', 'aaa','bbb');
INSERT INTO data VALUES ('b', 'z', '9', 'aaa','ccc');

And the output:

WEAPON  MUNITION  RANGE  other1  other2
a       x         1      aaa     aaa 
a       x         2      aaa     bbb 
b       z         6      ccc     ddd 
b       z         7      aaa     aaa 
b       z         8      aaa     bbb 
b       z         9      aaa     ccc 

SQL Fiddle: http://sqlfiddle.com/#!6/65590/3/0
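
As a side note that is not part of the original answer: in the self-join, each row is returned once per conflicting partner row, which is one way the row count can balloon, and the GROUP BY that removes those copies also collapses rows that are genuinely identical in every listed column. A minimal sketch of an alternative, assuming the same `data` test table as above, filters with EXISTS so that each stored row appears at most once and no GROUP BY is needed:

```sql
-- Sketch: keep each row of data whose (weapon, munition) pair
-- also appears somewhere in the table with a different range.
SELECT d1.*
FROM data d1
WHERE EXISTS (
    SELECT 1
    FROM data d2
    WHERE d2.weapon   = d1.weapon
      AND d2.munition = d1.munition
      AND d2.range   <> d1.range   -- rows with a NULL range never match here
)
ORDER BY d1.weapon, d1.munition, d1.range;
```

Against the test data above this should return the same six rows as the self-join version. Because nothing is grouped, exact duplicate rows in the table would each be listed, which may or may not be what you want.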

Answer 1: (score: 0)

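Idea: create a new table that contains every weapon-munition pair together with its correct range, then update the original table from the values in the new table.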

UPDATE o
SET o.range = n.range
FROM original AS o
JOIN new AS n ON o.weapon = n.weapon
    AND o.munition = n.munition
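
Before running an update like this, it may help to preview which rows it would change. The following is only a sketch using the same placeholder table names (`original`, `new`) as the statement above; the column aliases `current_range` and `corrected_range` are made up for illustration:

```sql
-- Sketch: list the rows whose range differs from the corrected value,
-- i.e. the rows the UPDATE above would actually modify.
SELECT o.weapon,
       o.munition,
       o.range AS current_range,
       n.range AS corrected_range
FROM original AS o
JOIN new AS n ON o.weapon = n.weapon
    AND o.munition = n.munition
WHERE o.range <> n.range
ORDER BY o.weapon, o.munition;
```

This assumes the new table holds exactly one row per weapon-munition pair; if it held several, both the preview and the update would become ambiguous.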

Answer 2: (score: 0)

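I can't tell for sure from the description, but maybe you want something like this (the "mytable" CTE is just my attempt at generating test data for your table; it includes an "othercolumn", but there are probably more):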

WITH mytable AS (
    SELECT * 
    FROM (
    VALUES('w1','m1',10, 'o1'), ('w1','m1',20, 'o2'), 
           ('w2','m2',10, 'o3'),
           ('w3','m3',10, 'o4'), ('w3','m3',20,'o5'), ('w3','m3',30,'o6')
    )x(weapon,munition,[range], othercolumn)
), MultiRange AS (
SELECT weapon, munition FROM mytable
GROUP BY weapon,munition
HAVING COUNT(DISTINCT [range])>1
)
SELECT t.*
FROM mytable t
JOIN MultiRange m ON m.weapon = t.weapon AND m.munition = t.munition
ORDER BY weapon, munition, [range]
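
A variation on the same idea, offered as a sketch rather than as part of the original answer: window aggregates can flag the pairs directly on each row, so there is no join back to the table and no unique key is required. The CTE name `flagged` and the columns `min_range` and `max_range` are introduced here for illustration only. If ranges should only be compared within one data set, a data-set column (hypothetical, since the question does not name it) could be added to the PARTITION BY.

```sql
-- Sketch: a pair has more than one distinct range exactly when
-- its minimum and maximum range differ.
WITH mytable AS (
    SELECT *
    FROM (
        VALUES ('w1','m1',10,'o1'), ('w1','m1',20,'o2'),
               ('w2','m2',10,'o3'),
               ('w3','m3',10,'o4'), ('w3','m3',20,'o5'), ('w3','m3',30,'o6')
    ) x(weapon, munition, [range], othercolumn)
), flagged AS (
    SELECT t.*,
           MIN([range]) OVER (PARTITION BY weapon, munition) AS min_range,
           MAX([range]) OVER (PARTITION BY weapon, munition) AS max_range
    FROM mytable t
)
SELECT weapon, munition, [range], othercolumn
FROM flagged
WHERE min_range <> max_range
ORDER BY weapon, munition, [range];
```

With the test rows above, this should list the two w1/m1 rows and the three w3/m3 rows and leave out w2/m2, matching the GROUP BY / HAVING version.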