Identifying duplicate Xml nodes

Time: 2014-08-29 14:17:54

Tags: sql sql-server tsql hashcode

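I have a set of tables (with several one-to-many relationships) that together form a single "unit". I need to make sure duplicates are removed, but deciding whether something is a duplicate means taking all of the data into account.

To make matters worse, the database in question is still in Sql 2000 compatibility mode, so it can't use any of the newer features.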

Create Table UnitType
(
  Id int IDENTITY Primary Key,
  Action int not null,
  TriggerType varchar(25) not null
)

Create Table Unit
(
  Id int IDENTITY Primary Key,
  TypeId int Not Null,
  Message varchar(100),
  Constraint FK_Unit_Type Foreign Key (TypeId) References UnitType(Id)
 )

Create Table Item
(
  Id int IDENTITY Primary Key,
  QuestionId int not null,
  Sequence int not null
)

Create Table UnitCondition
(
  Id int IDENTITY Primary Key,
  UnitId int not null,
  Value varchar(10),
  ItemId int not null,
  Constraint FK_UnitCondition_Unit Foreign Key (UnitId) References Unit(Id),
  Constraint FK_UnitCondition_Item Foreign Key (ItemId) References Item(Id)
)

Insert into Item (QuestionId, Sequence)
Values (1, 1),
(1, 2)

Insert into UnitType(Action, TriggerType)
Values (1, 'Changed')

Insert into Unit (TypeId, Message)
Values (1, 'Hello World'),
(1, 'Hello World')

Insert into UnitCondition(UnitId, Value, ItemId)
Values (1, 'Test', 1),
(1, 'Hello', 2),
(2, 'Test', 1),
(2, 'Hello', 2)

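I've created a SqlFiddle that demonstrates a simple form of the problem.

A unit is considered a duplicate when all (non-Id) fields on the unit, and all of the conditions attached to that unit, match exactly in every detail. Think of it like Xml: a Unit node (containing the unit information and a child collection of conditions) is unique if no other Unit node is an exact string copy of it.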

Select
  Action, 
  TriggerType,
  U.TypeId,
  U.Message,
  (
      Select C.Value, C.ItemId, I.QuestionId, I.Sequence
      From UnitCondition C
        Inner Join Item I on C.ItemId = I.Id
      Where C.UnitId = U.Id
      For XML RAW('Condition')
  ) as Conditions
from UnitType T
  Inner Join Unit U on T.Id = U.TypeId
For XML RAW ('Unit'), ELEMENTS

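The problem I'm having is that I can't seem to get the XML for each unit to come out as its own record, and I'm not sure how to compare Unit nodes to find duplicates.

How can I run this query to determine whether there are duplicate Xml Unit nodes in the set?

3 Answers:

Answer 0: (score: 0)

If you want to determine whether records are duplicates, you don't need to combine all the values into a single string. You can do this with the ROW_NUMBER function: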

SELECT  
  Action, 
  TriggerType,
  U.Id,
  U.TypeId,
  U.Message,
  C.Value,
  I.QuestionId,
  I.Sequence,
  ROW_NUMBER () OVER (PARTITION BY <LIST OF FIELD THAT SHOULD BE UNIQUE> 
                      ORDER BY <LIST OF FIELDS>) as DupeNumber
FROM UnitType T
  Inner Join Unit U on T.Id = U.TypeId
  Inner Join UnitCondition C on U.Id = C.UnitId
  Inner Join Item I on C.ItemId = I.Id;

If DupeNumber is greater than 1, the record is a duplicate.
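As an illustration only, here is one way the placeholders in the query above might be filled in for the schema in the question. The choice of partition columns is an assumption, not something the answer specifies. Note that this flags duplicate rows of the flattened join (one row per condition), so two units that share only some of their conditions would still produce rows with DupeNumber greater than 1. It does not by itself establish that two whole units are identical.

```sql
-- Sketch: number the flattened rows that share the same non-Id values.
-- The PARTITION BY list is an assumed choice for this schema.
SELECT
  T.Action,
  T.TriggerType,
  U.Id,
  U.TypeId,
  U.Message,
  C.Value,
  I.QuestionId,
  I.Sequence,
  ROW_NUMBER() OVER (PARTITION BY U.TypeId, U.Message, C.Value, C.ItemId
                     ORDER BY U.Id) AS DupeNumber
FROM UnitType T
  INNER JOIN Unit U ON T.Id = U.TypeId
  INNER JOIN UnitCondition C ON U.Id = C.UnitId
  INNER JOIN Item I ON C.ItemId = I.Id;
```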

Answer 1: (score: 0)

Try this. It finds the pairs that are not unique. I'm not sure how to build it up into your final answer, but it may be a start.

select u1.id, u2.id 
  from unit as u1 
  join unit as u2 
    on u1.ID < u2.id 
  join UnitCondition uc1 
    on uc1.unitID = u1.ID 
  full outer join UnitCondition uc2
    on uc2.unitID = u2.ID  
   and uc2.itemID = uc1.itemID 
 where uc2.itemID is null or uc1.itemID is null 
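A pairwise comparison can also be written with nested NOT EXISTS, which avoids the newer XML and windowing features. The following is a sketch, not the answerer's query. It assumes that a unit never repeats the same (ItemId, Value) condition, and it compares TypeId and ItemId directly, since equal ids imply equal UnitType and Item fields. The column aliases are illustrative.

```sql
-- Sketch: list pairs of units whose own fields and condition sets match.
-- Assumes a unit never repeats the same (ItemId, Value) condition.
SELECT u1.Id AS UnitId, u2.Id AS DuplicateUnitId
FROM Unit u1
  INNER JOIN Unit u2
    ON u1.Id < u2.Id                       -- report each pair once
   AND u1.TypeId = u2.TypeId
   AND (u1.Message = u2.Message
        OR (u1.Message IS NULL AND u2.Message IS NULL))
WHERE NOT EXISTS (                         -- no condition of u1 is missing from u2
        SELECT 1
        FROM UnitCondition c1
        WHERE c1.UnitId = u1.Id
          AND NOT EXISTS (
                SELECT 1
                FROM UnitCondition c2
                WHERE c2.UnitId = u2.Id
                  AND c2.ItemId = c1.ItemId
                  AND (c2.Value = c1.Value
                       OR (c2.Value IS NULL AND c1.Value IS NULL))))
  AND NOT EXISTS (                         -- no condition of u2 is missing from u1
        SELECT 1
        FROM UnitCondition c2
        WHERE c2.UnitId = u2.Id
          AND NOT EXISTS (
                SELECT 1
                FROM UnitCondition c1
                WHERE c1.UnitId = u1.Id
                  AND c1.ItemId = c2.ItemId
                  AND (c1.Value = c2.Value
                       OR (c1.Value IS NULL AND c2.Value IS NULL))));
```

With the sample data from the question, this should report units 1 and 2 as a matching pair.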

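Answer 2: (score: 0)

So, I managed to figure out what I needed to do. It's a bit clunky, though.

First, you need to wrap the Xml Select statement in another select from the Unit table, so that the xml we end up with represents only that one unit.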

Select
Id,
(
  Select
    Action, 
    TriggerType,
    IU.TypeId,
    IU.Message,
    (
        Select C.Value, I.QuestionId, I.Sequence
        From UnitCondition C
          Inner Join Item I on C.ItemId = I.Id
        Where C.UnitId = IU.Id
        Order by C.Value, I.QuestionId, I.Sequence
        For XML RAW('Condition'), TYPE
    ) as Conditions
  from UnitType T
    Inner Join Unit IU on T.Id = IU.TypeId
  WHERE IU.Id = U.Id
  For XML RAW ('Unit')
)
From Unit U

You can then wrap that in another select that groups the xml by its content.

Select content, count(*) as cnt
From
  (
    Select
      Id,
      (
        Select
          Action, 
          TriggerType,
          IU.TypeId,
          IU.Message,
          (
              Select C.Value, C.ItemId, I.QuestionId, I.Sequence
              From UnitCondition C
                Inner Join Item I on C.ItemId = I.Id
              Where C.UnitId = IU.Id
              Order by C.Value, I.QuestionId, I.Sequence
              For XML RAW('Condition'), TYPE
          ) as Conditions
        from UnitType T
          Inner Join Unit IU on T.Id = IU.TypeId
        WHERE IU.Id = U.Id
        For XML RAW ('Unit')
      ) as content
    From Unit U
  ) as data
group by content
having count(*) > 1

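This lets you group together whole units whose entire content is identical.

It's worth noting that to test "uniqueness", you need to guarantee that the data in the inner Xml select always comes out the same way. To do that, you should apply an ordering to the relevant data (i.e. the data in the xml) to ensure consistency. Which ordering you apply doesn't matter, as long as two identical sets are output in the same order.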
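The grouped query reports the duplicated content and a count, but not which unit Ids are involved. One possible follow-up, sketched here under the assumption that a temporary table is acceptable, is to store one xml string per unit and then join each unit to the lowest Id that shares its content. The names #UnitXml, KeepId and DuplicateId are illustrative and not part of the original answer.

```sql
-- Sketch: materialise one XML string per unit (same query shape as above).
SELECT
  U.Id,
  (
    SELECT
      T.Action,
      T.TriggerType,
      IU.TypeId,
      IU.Message,
      (
        SELECT C.Value, C.ItemId, I.QuestionId, I.Sequence
        FROM UnitCondition C
          INNER JOIN Item I ON C.ItemId = I.Id
        WHERE C.UnitId = IU.Id
        -- A deterministic order so identical condition sets serialise identically
        ORDER BY C.Value, C.ItemId, I.QuestionId, I.Sequence
        FOR XML RAW('Condition'), TYPE
      ) AS Conditions
    FROM UnitType T
      INNER JOIN Unit IU ON T.Id = IU.TypeId
    WHERE IU.Id = U.Id
    FOR XML RAW('Unit')
  ) AS content
INTO #UnitXml            -- hypothetical temp table name
FROM Unit U;

-- For every group of identical content, keep the lowest Id and list the rest.
SELECT d.Id AS DuplicateId, k.KeepId
FROM #UnitXml d
  INNER JOIN (
    SELECT content, MIN(Id) AS KeepId
    FROM #UnitXml
    GROUP BY content
  ) k ON k.content = d.content
WHERE d.Id > k.KeepId;

DROP TABLE #UnitXml;
```

With the sample data from the question, unit 2 should be listed as a duplicate of unit 1. The resulting DuplicateId values could then drive whatever cleanup is appropriate, such as deleting the matching UnitCondition rows before the Unit rows.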