Question

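I want to read in an XML document and return an XML document that contains only unique nodes. If a node has a duplicate compoundName element, its parent node should be removed.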

<scanSegment>
      <index>28</index>
      <GUID>539003de-1379-4a03-94bf-1ede58625ab5</GUID>
      <ionMode>ESI</ionMode>
      <ionPolarity>Positive</ionPolarity>
      <scanType>DynamicMRM</scanType>
      <dataStorage>PeakDetected</dataStorage>
      <threshold>0</threshold>
      <fragmentorMode>Fixed</fragmentorMode>
      <fragmentorRamp />
      <scheduledTime>4.33</scheduledTime>
      <timeWindow>1.2</timeWindow>
      <scheduledSetting>720</scheduledSetting>
      <isTriggeredMRM>false</isTriggeredMRM>
      <numtMRMRepeats>3</numtMRMRepeats>
      <scanElements>
        <scanElement>
          <index>1</index>
          <compoundName>3-keto carbofuran</compoundName>
          <isISTD>false</isISTD>
          <ms1LowMz>236.1</ms1LowMz>
          <ms1Res>Unit</ms1Res>
          <ms2LowMz>208.1</ms2LowMz>
          <ms2Res>Unit</ms2Res>
          <fragmentor>82</fragmentor>
          <deltaEMV>200</deltaEMV>
          <cellAccVoltage>9</cellAccVoltage>
          <collisionEnergy>4</collisionEnergy>
          <isPrimaryMRM>true</isPrimaryMRM>
          <isTriggerMRM>false</isTriggerMRM>
          <triggerEntranceDelayTime>0</triggerEntranceDelayTime>
          <triggerDelayTime>0</triggerDelayTime>
          <triggerWindow>0</triggerWindow>
          <triggerMRMThreshold>0</triggerMRMThreshold>
          <compoundGroup>
          </compoundGroup>
        </scanElement>
      </scanElements>
    </scanSegment>

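The compoundName element is nested inside scanElement, which in turn is nested inside scanElements. I am having trouble filtering the XML document to check whether the compoundName element is unique.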

I have read some examples that use LINQ, such as

xmlDoc.Descendants("scanSegment").GroupBy().Where().Remove()

I am not sure how to fill in the rest of the query.

Answer 1

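One way to remove these duplicate elements is to apply an XSLT stylesheet to the XML. Sample code is described here at Microsoft. I modified it to fit your needs.
source.xml is the input file, trans.xslt is the XSLT file, and destination.xml is the output file.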

using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

// Open source.xml as an XPathDocument.
XPathDocument doc = new XPathDocument("source.xml");
// Create and load the transform. The stylesheet contains no embedded
// script, so script execution does not need to be enabled.
XslCompiledTransform transform = new XslCompiledTransform();
transform.Load("trans.xslt");
// Create a writer for the transformed file. Disposing the writer
// flushes the output and closes destination.xml.
using (XmlWriter writer = XmlWriter.Create("destination.xml", transform.OutputSettings))
{
    // Execute the transformation.
    transform.Transform(doc, writer);
}

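This is the XSLT 1.0 file trans.xslt. The task you want is done by a single template whose match expression is *[count(compoundName) > 1]. It discards every element that has more than one compoundName child; in your document that means every such scanElement.

So, in essence, you can do the filtering in one line of XSLT. It is combined with the identity template, which copies every node that no other template matches.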

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>

    <!-- Identity template - this template is applied by default to all nodes and attributes -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template> 

   <xsl:template match="*[count(compoundName) > 1]" />

</xsl:stylesheet>
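
To see what the empty template does in combination with the identity template, here is a minimal sketch that runs the same two templates in memory instead of on files. The class name XsltFilterDemo, the method name Apply, and the inlined stylesheet string are my own assumptions for illustration and are not part of the answer above. The XML declaration is suppressed only to keep the result short.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltFilterDemo
{
    // The two templates from trans.xslt, inlined as a string for the demo.
    const string Stylesheet = @"<?xml version='1.0' encoding='utf-8'?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
  <xsl:template match='@*|node()'>
    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>
  </xsl:template>
  <xsl:template match='*[count(compoundName) > 1]'/>
</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the result as a string.
    public static string Apply(string xml)
    {
        var transform = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(xsltReader);
        }

        var output = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var writer = XmlWriter.Create(output, settings))
        {
            transform.Transform(input, writer);
        }
        return output.ToString();
    }
}

// Usage (hypothetical input):
//   string result = XsltFilterDemo.Apply(
//       "<scanElements>" +
//       "<scanElement><compoundName>A</compoundName></scanElement>" +
//       "<scanElement><compoundName>B</compoundName><compoundName>B</compoundName></scanElement>" +
//       "</scanElements>");
//   The scanElement holding two compoundName children is dropped; the other is copied.
```

Note that this pattern only looks at the children of a single element. It does not compare compoundName values across sibling scanElement nodes.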

Answer 2

Try the following:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";
        static void Main(string[] args)
        {
            XDocument doc = XDocument.Load(FILENAME);
            XElement scanElements = doc.Descendants("scanElements").FirstOrDefault();

            List<XElement> uniqueScanElements = scanElements.Elements("scanElement")
                .Select(x => new { compoundName = (string)x.Element("compoundName"), scanElement = x })
                .GroupBy(x => x.compoundName)
                .Select(x => x.FirstOrDefault())
                .Select(x => x.scanElement)
                .ToList();

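            // Pass the unique elements as the content of the new scanElements element.
            scanElements.ReplaceWith(new XElement("scanElements", uniqueScanElements));

            // Print the result; call doc.Save(...) to write it to a file instead.
            Console.WriteLine(doc);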
        }
    }
}
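
If the goal is instead to drop every scanElement whose compoundName occurs more than once, keeping none of the copies, the GroupBy().Where().Remove() pattern from the question could be filled in roughly like this. This is only a sketch under two assumptions of mine: duplicates are compared within the same scanElements parent, and scanElement nodes without a compoundName are left alone. The class and method names are made up for illustration.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class DuplicateCompoundFilter
{
    // Removes every scanElement whose compoundName value occurs more than
    // once within the same scanElements parent. No copy is kept.
    public static void RemoveDuplicated(XDocument doc)
    {
        foreach (XElement parent in doc.Descendants("scanElements").ToList())
        {
            parent.Elements("scanElement")
                  // Ignore scanElement nodes that have no compoundName child.
                  .Where(e => e.Element("compoundName") != null)
                  // Group siblings by the text of their compoundName.
                  .GroupBy(e => (string)e.Element("compoundName"))
                  // Keep only the groups that contain duplicates.
                  .Where(g => g.Count() > 1)
                  // Flatten the groups back into elements and detach them.
                  .SelectMany(g => g)
                  .Remove();
        }
    }
}

// Usage:
//   XDocument doc = XDocument.Load(@"c:\temp\test.xml");
//   DuplicateCompoundFilter.RemoveDuplicated(doc);
//   doc.Save(@"c:\temp\test.xml");
```

The Remove() extension method takes a snapshot of the sequence before detaching anything, so it is safe to call it directly on the deferred query.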

How can I remove nodes based on duplicate elements in XML?
