Extracting duplicates from XML

Date: 2015-04-16 14:30:38

Tags: sql sql-server xml tsql sqlxml

I need some help with my query... I want to get the TradeIds that are duplicated and are missing a LegId. Can you help me?

My XML:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<data>
<value>
    <TradeId>928</TradeId>
    <LegId>1</LegId>
</value>
<value>
    <TradeId>928</TradeId>
    <LegId>2</LegId>
</value>
<value>
    <TradeId>928</TradeId>
    <!-- LegId is missing here -->
</value>
<value>
    <TradeId>929</TradeId>
    <LegId>1</LegId>
</value>
<value>
    <TradeId>929</TradeId>
    <LegId>2</LegId>
</value>
<value>
    <TradeId>930</TradeId>
    <LegId>2</LegId>
</value>
</data>

I declare this XML as a variable (@xmlData) and then fill a #temptable with the result:
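A minimal sketch of that declaration, assuming the XML is pasted in as a string literal. The <?xml ... ?> line is left out here because SQL Server raises an "unable to switch the encoding" error for encoding="UTF-8" inside an nvarchar (N'...') literal, and the xml type does not need the declaration. The third <value> simply has no <LegId> child.

```sql
-- Sample data from the question; the third <value> has no <LegId> child.
DECLARE @xmlData xml = N'
<data>
  <value><TradeId>928</TradeId><LegId>1</LegId></value>
  <value><TradeId>928</TradeId><LegId>2</LegId></value>
  <value><TradeId>928</TradeId></value>
  <value><TradeId>929</TradeId><LegId>1</LegId></value>
  <value><TradeId>929</TradeId><LegId>2</LegId></value>
  <value><TradeId>930</TradeId><LegId>2</LegId></value>
</data>';

-- One row per <value>; LegId comes back as NULL where the element is missing.
SELECT
     e.value('TradeId[1]', 'varchar(50)') AS strTradeId
    ,e.value('LegId[1]', 'int')          AS LegId
FROM @xmlData.nodes('data/value') AS elements(e);
```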

SELECT *
INTO #tradeIdDuplicatesToIgnore
FROM
(
    SELECT 
         e.value('TradeId[1]','varchar(50)') AS strTradeId
        ,e.value('LegId[1]','int') AS LegId
    FROM @xmlData.nodes('data/value') AS elements(e)
    WHERE   1 = 1
) AS t



SELECT   *
FROM    #tradeIdDuplicatesToIgnore AS t

This gives me the following output:

strTradeId  LegId
928         1
928         2
928         NULL
929         1
929         2
930         2

In this case, the only row I want is row 3, the one with TradeId 928 and a NULL LegId (and I only need the TradeId column). This query:

SELECT t.strTradeId
INTO #tradeIdDuplicatesToIgnore
FROM
(
    SELECT 
         e.value('TradeId[1]','varchar(50)') AS strTradeId
        ,e.value('LegId[1]','int') AS LegId
    FROM @xmlData.nodes('data/value') AS elements(e)
) AS t
WHERE   1 = 1
--AND       t.LegId IS NULL
GROUP BY  t.strTradeId
HAVING COUNT(t.strTradeId) > 1


SELECT   *
FROM    #tradeIdDuplicatesToIgnore AS t

This leaves me with two rows, 928 and 929, but I can't get it to return only the one where LegId IS NULL...
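Why the commented-out AND t.LegId IS NULL filter cannot help in that query: WHERE is applied before GROUP BY, so with it uncommented only the single NULL row of TradeId 928 reaches its group, COUNT is 1, and HAVING COUNT(t.strTradeId) > 1 then discards it. A small sketch that shows this, re-declaring the sample data (without the XML declaration) so it runs on its own; the table variable @result is only an illustrative name:

```sql
DECLARE @xmlData xml = N'
<data>
  <value><TradeId>928</TradeId><LegId>1</LegId></value>
  <value><TradeId>928</TradeId><LegId>2</LegId></value>
  <value><TradeId>928</TradeId></value>
  <value><TradeId>929</TradeId><LegId>1</LegId></value>
  <value><TradeId>929</TradeId><LegId>2</LegId></value>
  <value><TradeId>930</TradeId><LegId>2</LegId></value>
</data>';
DECLARE @result TABLE (strTradeId varchar(50), RowsInGroup int);

-- WHERE runs first, so only rows with a NULL LegId are left to be grouped.
INSERT INTO @result (strTradeId, RowsInGroup)
SELECT t.strTradeId, COUNT(*)
FROM
(
    SELECT
         e.value('TradeId[1]','varchar(50)') AS strTradeId
        ,e.value('LegId[1]','int') AS LegId
    FROM @xmlData.nodes('data/value') AS elements(e)
) AS t
WHERE t.LegId IS NULL
GROUP BY t.strTradeId;

-- Only 928 remains, with a single row, so a HAVING COUNT(...) > 1 test would drop it.
SELECT * FROM @result;
```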

Requested output for this case: a single row with TradeId 928.

Can you help me solve this?

2 answers:

Answer 0 (score: 4)

You can use this query to get the duplicates that have a NULL LegId:

;with cte_splitted as (
    select
        e.e.value('TradeId[1]','varchar(50)') as strTradeId,
        e.e.value('LegId[1]','int') as LegId
    from @xmlData.nodes('data/value') as e(e)
)
select
    c.strTradeId
into #tradeIdDuplicatesToIgnore
from cte_splitted as c
group by
    c.strTradeId
having
    count(*) > count(c.LegId) and -- count of all records <> count of not null records
    count(*) > 1 -- there're more than 1 record
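To see why the two HAVING conditions pick out 928, here is a sketch that lists both counts per TradeId, assuming the same sample data (re-declared without the XML declaration) and an illustrative table variable @counts. COUNT(*) counts every row in the group, while COUNT(LegId) skips rows where LegId is NULL, so the two differ only for a TradeId with a missing LegId:

```sql
DECLARE @xmlData xml = N'
<data>
  <value><TradeId>928</TradeId><LegId>1</LegId></value>
  <value><TradeId>928</TradeId><LegId>2</LegId></value>
  <value><TradeId>928</TradeId></value>
  <value><TradeId>929</TradeId><LegId>1</LegId></value>
  <value><TradeId>929</TradeId><LegId>2</LegId></value>
  <value><TradeId>930</TradeId><LegId>2</LegId></value>
</data>';
DECLARE @counts TABLE (strTradeId varchar(50), AllRows int, RowsWithLegId int);

-- COUNT(*) counts all rows; COUNT(c.LegId) ignores rows whose LegId is NULL.
INSERT INTO @counts (strTradeId, AllRows, RowsWithLegId)
SELECT c.strTradeId, COUNT(*), COUNT(c.LegId)
FROM
(
    SELECT
         e.value('TradeId[1]','varchar(50)') AS strTradeId
        ,e.value('LegId[1]','int') AS LegId
    FROM @xmlData.nodes('data/value') AS elements(e)
) AS c
GROUP BY c.strTradeId;

SELECT * FROM @counts;
```

With this data, 928 has 3 rows but only 2 with a LegId, 929 has 2 and 2, and 930 has 1 and 1, so only 928 satisfies both count(*) > count(c.LegId) and count(*) > 1.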


Answer 1 (score: 3)

One possible approach is to modify the XPath in the FROM clause so that it only selects <value> elements that have no <LegId> child:

data/value[not(LegId)]

Here is the XPath in action:

SELECT *
INTO #tradeIdDuplicatesToIgnore
FROM
(
    SELECT 
         e.value('TradeId[1]','varchar(50)') AS strTradeId
        ,e.value('LegId[1]','int') AS LegId
    FROM @xmlData.nodes('data/value[not(LegId)]') AS elements(e)
    WHERE   1 = 1
) AS t

SELECT   *
FROM    #tradeIdDuplicatesToIgnore AS t
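The same filter can also be expressed on the SQL side with the xml exist() method instead of inside the XPath. A sketch, assuming the sample data below and an illustrative table variable @noLegId: exist('LegId') returns 1 when the current <value> has a <LegId> child and 0 when it does not. Like the XPath filter, it does not check for duplicates on its own.

```sql
DECLARE @xmlData xml = N'
<data>
  <value><TradeId>928</TradeId><LegId>1</LegId></value>
  <value><TradeId>928</TradeId><LegId>2</LegId></value>
  <value><TradeId>928</TradeId></value>
  <value><TradeId>929</TradeId><LegId>1</LegId></value>
  <value><TradeId>929</TradeId><LegId>2</LegId></value>
  <value><TradeId>930</TradeId><LegId>2</LegId></value>
</data>';
DECLARE @noLegId TABLE (strTradeId varchar(50));

-- Keep only <value> nodes without a <LegId> child, same as data/value[not(LegId)].
INSERT INTO @noLegId (strTradeId)
SELECT e.value('TradeId[1]', 'varchar(50)')
FROM @xmlData.nodes('data/value') AS elements(e)
WHERE e.exist('LegId') = 0;

SELECT * FROM @noLegId;
```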

Output:

strTradeId  LegId
928         NULL

Update:

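I missed the requirement to check for duplicates earlier. So here is a different way to achieve the same thing, with the duplicate check added: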

SELECT *
INTO #tradeIdDuplicatesToIgnore
FROM
(
    SELECT 
         e.value('TradeId[1]','varchar(50)') AS strTradeId
        ,e.value('LegId[1]','int') AS LegId
    FROM @xmlData.nodes('data/value') AS elements(e)
    WHERE   1 = 1
) AS t

SELECT   t.strTradeId
FROM    #tradeIdDuplicatesToIgnore AS t
        INNER JOIN 
        (
            SELECT COUNT(*) AS cnt, strTradeId
            FROM #tradeIdDuplicatesToIgnore
            GROUP BY strTradeId
        ) AS t2 ON t2.strTradeId = t.strTradeId
WHERE t.LegId IS NULL AND t2.cnt > 1
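A variant of the same duplicate check that swaps the self-join for a windowed COUNT(*) OVER (PARTITION BY strTradeId), which attaches each TradeId's row count to every row. This is a sketch rather than the answerer's code, assuming the sample data below and an illustrative table variable @result:

```sql
DECLARE @xmlData xml = N'
<data>
  <value><TradeId>928</TradeId><LegId>1</LegId></value>
  <value><TradeId>928</TradeId><LegId>2</LegId></value>
  <value><TradeId>928</TradeId></value>
  <value><TradeId>929</TradeId><LegId>1</LegId></value>
  <value><TradeId>929</TradeId><LegId>2</LegId></value>
  <value><TradeId>930</TradeId><LegId>2</LegId></value>
</data>';
DECLARE @result TABLE (strTradeId varchar(50));

WITH t AS
(
    SELECT
         e.value('TradeId[1]','varchar(50)') AS strTradeId
        ,e.value('LegId[1]','int') AS LegId
    FROM @xmlData.nodes('data/value') AS elements(e)
),
counted AS
(
    -- Each row carries the total number of rows for its TradeId, so no self-join is needed.
    SELECT strTradeId, LegId, COUNT(*) OVER (PARTITION BY strTradeId) AS cnt
    FROM t
)
INSERT INTO @result (strTradeId)
SELECT strTradeId
FROM counted
WHERE LegId IS NULL AND cnt > 1;

SELECT * FROM @result;
```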

Output:

strTradeId
928

Update 2:

;with T as (
    SELECT 
         e.value('TradeId[1]','varchar(50)') AS strTradeId
        ,e.value('LegId[1]','int') AS LegId
    FROM @xmlData.nodes('data/value') AS elements(e)
)
SELECT *
INTO #tradeIdDuplicatesToIgnore
FROM
(
    SELECT T.strTradeId
    FROM T
    GROUP BY T.strTradeId
    HAVING COUNT(*)>1 AND COUNT(*)>COUNT(T.LegId)
) AS t

SELECT * FROM #tradeIdDuplicatesToIgnore
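Each snippet in this thread creates #tradeIdDuplicatesToIgnore with SELECT ... INTO, so running a second one in the same session fails with "There is already an object named '#tradeIdDuplicatesToIgnore' in the database". A sketch that drops any leftover table first and then runs the Update 2 logic, assuming the sample data below:

```sql
-- Drop the temp table if an earlier run in this session already created it.
-- On SQL Server 2016+ this can also be written as: DROP TABLE IF EXISTS #tradeIdDuplicatesToIgnore;
IF OBJECT_ID('tempdb..#tradeIdDuplicatesToIgnore') IS NOT NULL
    DROP TABLE #tradeIdDuplicatesToIgnore;

DECLARE @xmlData xml = N'
<data>
  <value><TradeId>928</TradeId><LegId>1</LegId></value>
  <value><TradeId>928</TradeId><LegId>2</LegId></value>
  <value><TradeId>928</TradeId></value>
  <value><TradeId>929</TradeId><LegId>1</LegId></value>
  <value><TradeId>929</TradeId><LegId>2</LegId></value>
  <value><TradeId>930</TradeId><LegId>2</LegId></value>
</data>';

WITH T AS
(
    SELECT
         e.value('TradeId[1]','varchar(50)') AS strTradeId
        ,e.value('LegId[1]','int') AS LegId
    FROM @xmlData.nodes('data/value') AS elements(e)
)
SELECT T.strTradeId
INTO #tradeIdDuplicatesToIgnore
FROM T
GROUP BY T.strTradeId
HAVING COUNT(*) > 1 AND COUNT(*) > COUNT(T.LegId);

SELECT * FROM #tradeIdDuplicatesToIgnore;
```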