Removing duplicate tags from an XML file

Asked: 2016-09-07 08:49:55

Tags: xml powershell

My problem is that my XML file contains the same value twice, like this:

<ns:html xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns="2" release="1">
    <ns:Name>A
        <ns:ID>ANI-2016-05-02T21:01Z</ns:ID>
        <ns:CreationDate>2016-05-02T21:01:40</ns:CreationDate>
        <ns:Subname>A2
            <ns:Total>5000</ns:Total>
            <ns:type>ANI</ns:type>
        </ns:Subname>
    </ns:Name>

    <ns:Name>A
        <ns:ID>ANI-2016-05-02T21:01Z</ns:ID>
        <ns:CreationDate>2016-05-02T21:01:40</ns:CreationDate>
        <ns:Subname>A2
            <ns:Total>5000</ns:Total>
            <ns:type>ANI</ns:type>
        </ns:Subname>
    </ns:Name>

    <ns:Name>A
        <ns:ID>ANI-2016-08-04T21:01Z</ns:ID>
        <ns:CreationDate>2016-04-08T21:01:40</ns:CreationDate>
        <ns:Subname>A2
            <ns:Total>5000</ns:Total>
            <ns:type>ANI</ns:type>
        </ns:Subname>
    </ns:Name>
</ns:html>

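My question is how to remove the duplicate values from the XML file using PowerShell, so that I get the following result in a new XML file. The check can be done on the basis of the ID.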

<ns:html xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns="2" release="1">
    <ns:Name>A
        <ns:ID>ANI-2016-05-02T21:01Z</ns:ID>
        <ns:CreationDate>2016-05-02T21:01:40</ns:CreationDate>
        <ns:Subname>A2
            <ns:Total>5000</ns:Total>
            <ns:type>ANI</ns:type>
        </ns:Subname>
    </ns:Name>

    <ns:Name>A
        <ns:ID>ANI-2016-08-04T21:01Z</ns:ID>
        <ns:CreationDate>2016-04-08T21:01:40</ns:CreationDate>
        <ns:Subname>A2
            <ns:Total>5000</ns:Total>
            <ns:type>ANI</ns:type>
        </ns:Subname>
    </ns:Name>
</ns:html>

I tried the following:

First, I used the example found here:

## SETUP ENVIRONMENT
# Find "Advanced Monitoring Agent" service and use path to locate files
$gfimaxagent = Get-WmiObject Win32_Service |
               Where-Object { $_.Name -eq 'Advanced Monitoring Agent' }
$gfimaxexe = $gfimaxagent.PathName
$gfimaxpath = Split-Path $gfimaxagent.PathName.Replace([char]34,"") -Parent #"Wordpress syntax highlighter bug
$XmlFile = "C:\Users\Desktop\Test.xml"
$Output = "C:\Users\Desktop\result.xml"

[xml]$XmlContent = Get-Content $XmlFile
$XmlPath = "checks"
$Property = "uid"
$XmlValues = @{}
foreach ($XmlElement in $XmlContent.$XmlPath.ChildNodes)
{
    $ElementValues = "" #"Wordpress syntax highlighter bug
    foreach($XmlValue in $XmlElement.ChildNodes | Sort-Object name)
    {
        $ElementValues = $ElementValues + $XmlValue.Name + $XmlValue.InnerText
    }
    $XmlValues[$XmlElement.$Property] = $ElementValues
}

$XmlDuplicates = @{}
foreach ($XmlValue in $XmlValues.Values)
{
    $Items = @($XmlValues.Keys | Where { $XmlValues[$_] -eq $XmlValue })
    if ($Items.Count -gt 1)
    {
        if (!($XmlDuplicates[$Items[0]])) { $XmlDuplicates[$Items[0]] = $Items }
    }

}

foreach ($XmlDuplicate in $XmlDuplicates.Keys)
{
    for ($i = 1; $i -lt $XmlDuplicates[$XmlDuplicate].Count; $i++)
    {
        $XPath = "//" + $XmlPath + "/*[@" + $Property +"=" + $XmlDuplicates[$XmlDuplicate][$i]+"]"
        $ChildToBeRemoved = $XmlContent.SelectSingleNode($XPath)
        $ChildToBeRemoved.ParentNode.RemoveChild($ChildToBeRemoved)
    }
}

$XmlContent.Save($Output)

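The problem is that even after I changed the source code, it still does not load my file, but the original example file from the website instead.

Second, I also tried the following code: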

[xml]$XmlDocument1 = Get-Content -Path C:\Users\EX27740\Desktop\testdubbel.xml

$softwareVersionsArray = $catalogXML.catalog.software |
    Group-Object name |
    ForEach-Object {$_.Group[0]}

$filename = ' C:\Users\EX27740\Desktop\Resultaat.xml'
$catalogXML.Save($filename)

But I get an error:

At line:8 char:1 You cannot call a method on a null-valued expression.
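
A note on this error: in the second snippet the document is loaded into `$XmlDocument1`, but the later lines use `$catalogXML`, which is never assigned, and line 8 of that script is the `$catalogXML.Save($filename)` call. The following is a minimal sketch of that situation, assuming the unassigned variable is indeed the cause. The inline XML and the temp-file name are placeholders, not the data from the question.

```powershell
# Placeholder document, so the sketch runs without a file on disk (assumption).
[xml]$XmlDocument1 = '<root><item>1</item></root>'

# In the snippet above $catalogXML is never assigned; set it to $null explicitly
# here so the sketch does not depend on leftover session state.
$catalogXML = $null

# Calling a method on $null raises a terminating error that try/catch can capture.
$message = $null
try {
    $catalogXML.Save('result.xml')
} catch {
    $message = $_.Exception.Message
}
$message

# Calling Save on the variable that actually holds the document works.
# The file name below is hypothetical.
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'null-call-sketch.xml'
$XmlDocument1.Save($outFile)
```

Using one variable name consistently for loading, filtering and saving avoids this class of error.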

1 Answer:

Answer 0: (score: 0)

One approach is to iterate over the XML nodes and remove any duplicates:

[xml]$xml = @"
<ns:html xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns="2" release="1">
    <ns:Name>A
        <ns:ID>ANI-2016-05-02T21:01Z</ns:ID>
        <ns:CreationDate>2016-05-02T21:01:40</ns:CreationDate>
        <ns:Subname>A2
            <ns:Total>5000</ns:Total>
            <ns:type>ANI</ns:type>
        </ns:Subname>
    </ns:Name>

    <ns:Name>A
        <ns:ID>ANI-2016-05-02T21:01Z</ns:ID>
        <ns:CreationDate>2016-05-02T21:01:40</ns:CreationDate>
        <ns:Subname>A2
            <ns:Total>5000</ns:Total>
            <ns:type>ANI</ns:type>
        </ns:Subname>
    </ns:Name>

    <ns:Name>A
        <ns:ID>ANI-2016-08-04T21:01Z</ns:ID>
        <ns:CreationDate>2016-04-08T21:01:40</ns:CreationDate>
        <ns:Subname>A2
            <ns:Total>5000</ns:Total>
            <ns:type>ANI</ns:type>
        </ns:Subname>
    </ns:Name>
</ns:html>
"@

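cls
# Track the CreationDate values seen so far; an entry whose CreationDate
# was already seen is treated as a duplicate and removed.
$CreationDates=@()
$xml.html.Name | ForEach-Object {
  if($CreationDates -contains $_.CreationDate) {
    # Duplicate: detach this <ns:Name> element from its parent
    [void]$_.ParentNode.RemoveChild($_)
  } else {
    $CreationDates += $_.CreationDate
  }
}
#$CreationDates
# Show the remaining entries
$xml.html.Name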
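
The code above compares `CreationDate`, while the question asks for the check to be based on the ID and for the result to be written to a new file. A minimal sketch of that variant follows. It is not the answerer's code: the shortened sample document, the `$seen` set, the `local-name()` XPath filters (used to sidestep the `ns` prefix) and the output file name are all assumptions made for illustration.

```powershell
# Shortened sample in the same shape as the question's file (assumption).
[xml]$doc = @'
<ns:html xmlns:ns="2" release="1">
    <ns:Name>A<ns:ID>ANI-2016-05-02T21:01Z</ns:ID></ns:Name>
    <ns:Name>A<ns:ID>ANI-2016-05-02T21:01Z</ns:ID></ns:Name>
    <ns:Name>A<ns:ID>ANI-2016-08-04T21:01Z</ns:ID></ns:Name>
</ns:html>
'@

# IDs already encountered; HashSet.Add returns $false for a value it already holds.
$seen = New-Object 'System.Collections.Generic.HashSet[string]'

# Copy the matching nodes into an array first, so removing nodes
# does not disturb the collection being iterated.
$entries = @($doc.DocumentElement.SelectNodes("*[local-name()='Name']"))

foreach ($entry in $entries) {
    $idNode = $entry.SelectSingleNode("*[local-name()='ID']")
    if ($null -eq $idNode) { continue }   # entries without an ID are left alone

    if (-not $seen.Add($idNode.InnerText.Trim())) {
        # Same ID seen before: remove this entry, keeping the first occurrence.
        [void]$entry.ParentNode.RemoveChild($entry)
    }
}

# Collect the IDs that remain, then save to a new file (hypothetical name).
$remainingIds = @(
    $doc.DocumentElement.SelectNodes("*[local-name()='Name']/*[local-name()='ID']") |
        ForEach-Object { $_.InnerText }
)
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'dedup-sketch.xml'
$doc.Save($outFile)
$remainingIds
```

To run this against a real file, the here-string would be replaced by `[xml]$doc = Get-Content $XmlFile` with `$XmlFile` pointing at the input, and `$outFile` set to the desired result path.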