LINQ: Compare two XML files and output the differences to XML

Time: 2013-10-31 18:04:28

Tags: c# xml linq

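I have two XML files: the current version (B) and the previous version (A).

Matching elements by their id attribute, I want to check whether any attribute has changed. If so, I want to get that element, along with any new elements that have been added to the current file (B).

So I would end up with a result XML file containing every element that has any change, plus every new element.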

---- Version A
<Books>
    <book id='1' image='C01' name='C# in Depth'/>
    <book id='2' image='C02' name='ASP.NET'/>
    <book id='3' image='C03' name='LINQ in Action '/>
    <book id='4' image='C04' name='Architecting Applications'/>
</Books>
---- Version B
<Books>
    <book id='1' image='C011' name='C# in Depth'/>
    <book id='2' image='C02' name='ASP.NET 2.0'/>
    <book id='3' image='XXXC03' name='XXXLINQ in Action '/>
    <book id='4' image='C04' name='Architecting Applications'/>
    <book id='5' image='C05' name='PowerShell in Action'/>
</Books>

I would like to return the following:

---- Results
<Books>
    <book id='1' image='C011' name='C# in Depth'/>
    <book id='2' image='C02' name='ASP.NET 2.0'/>
    <book id='3' image='XXXC03' name='XXXLINQ in Action '/>
    <book id='5' image='C05' name='PowerShell in Action'/>
</Books>

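Here is my code so far. I can get the changes based on id, but not any new elements, and I'm sure someone can get everything, attributes included, in a single statement without having to parse again. Thanks.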

 private void LinqCompareXMLFiles() 
        {
            string oldXML = @"<Books>
     <book id='1' image='C01' name='C# in Depth'/>
     <book id='2' image='C02' name='ASP.NET'/>
     <book id='3' image='C03' name='LINQ in Action '/>
     <book id='4' image='C04' name='Architecting Applications'/>

    </Books>";

            string newXML = @"<Books>
     <book id='1' image='C011' name='C# in Depth'/>
     <book id='2' image='C02' name='ASP.NET 2.0'/>
     <book id='3' image='XXXC03' name='XXXLINQ in Action '/>
     <book id='4' image='C04' name='Architecting Applications'/>
    <book id='5' image='C05' name='PowerShell in Action'/>

    </Books>";

            XDocument xmlOld = XDocument.Parse(oldXML);
            XDocument xmlNew = XDocument.Parse(newXML);

            var res = (from b1 in xmlOld.Descendants("book")
                       from b2 in xmlNew.Descendants("book")
                      let issues = from a1 in b1.Attributes()
                                   join a2 in b2.Attributes()
                                     on a1.Name equals a2.Name
                                   select new
                                   {
                                       Id = a1.Parent.FirstAttribute.Value,
                                       Name = a1.Name,
                                       Value1 = a1.Value,
                                       Value2 = a2.Value
                                   }
                      where issues.Any(i => i.Value1 == i.Value2)
                      from issue in issues
                      where issue.Value1 != issue.Value2
                      select issue);
            var reportXmlItems = (from rx in res select new XElement("book", new XAttribute("id", rx.Id))).Distinct(new MyComparer());

            // This isn't excluding the ids that exist in the old book set, I guess because they are different elements; I need to exclude based on the element id
            var res2 = (from b2 in xmlNew.Descendants("book") select new XElement("book", new XAttribute("id",b2.Attribute("id").Value))).Except(xmlOld.Descendants("book"));

            var res3 = reportXmlItems.Union(res2);

            var reportXml = new XElement("books", res3);
            reportXml.Save(@"c:\test\result.xml");
        }

    public class MyComparer : IEqualityComparer<XElement>
    {
        public bool Equals(XElement x, XElement y)
        {
            return x.Attribute("id").Value == y.Attribute("id").Value;
        }

        public int GetHashCode(XElement obj)
        {
            return obj.Attribute("id").Value.GetHashCode();
        }
    }
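
The guess in the code comment about `res2` matches how `Except` behaves: without a comparer it falls back to default equality, and `XElement` does not override `Equals`, so elements that come from two different documents never compare equal. A minimal sketch of one way around that, assuming it is enough to match books on `id`. `BookIdComparer`, `ExceptSketch`, and `AddedBooks` are hypothetical names introduced here for illustration; the comparer does the same job as `MyComparer` above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not from the question): two <book> elements are
// considered equal when their id attributes have the same value.
public sealed class BookIdComparer : IEqualityComparer<XElement>
{
    public bool Equals(XElement x, XElement y)
    {
        // The string cast yields null instead of throwing when id is missing.
        return (string)x.Attribute("id") == (string)y.Attribute("id");
    }

    public int GetHashCode(XElement obj)
    {
        var id = (string)obj.Attribute("id");
        return id == null ? 0 : id.GetHashCode();
    }
}

public static class ExceptSketch
{
    // Returns the <book> elements whose id appears in newDoc but not in oldDoc.
    public static List<XElement> AddedBooks(XDocument oldDoc, XDocument newDoc)
    {
        return newDoc.Descendants("book")
                     .Except(oldDoc.Descendants("book"), new BookIdComparer())
                     .ToList();
    }

    public static void Main()
    {
        var oldDoc = XDocument.Parse(
            "<Books><book id='1' image='C01'/><book id='4' image='C04'/></Books>");
        var newDoc = XDocument.Parse(
            "<Books><book id='1' image='C011'/><book id='4' image='C04'/><book id='5' image='C05'/></Books>");

        // Only the book that exists solely in the new document is listed.
        foreach (var book in AddedBooks(oldDoc, newDoc))
        {
            Console.WriteLine(book);
        }
    }
}
```

Passing the comparer to `Except` keeps the original elements, with all their attributes, so there is no need to rebuild each `book` from its id first.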

3 Answers:

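Answer 0 (score: 2)

Before recommending a self-implemented solution, I'd suggest you take a look at xmldiff first. You can find some reference material here: http://msdn.microsoft.com/en-us/library/aa302294.aspx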

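Answer 1 (score: 1)

I don't see the point of comparing nodes with the same id, since they could simply be changed directly. That said, you can compare and merge XML documents using LINQ to XML as shown below: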

// XMLs
string oldXML = @"<Books>
<book id='1' image='C01' name='C# in Depth'/>
<book id='2' image='C02' name='ASP.NET'/>
<book id='3' image='C03' name='LINQ in Action '/>
<book id='4' image='C04' name='Architecting Applications'/>
</Books>";
string newXML = @"<Books>
<book id='1' image='C011' name='C# in Depth'/>
<book id='2' image='C02' name='ASP.NET 2.0'/>
<book id='3' image='XXXC03' name='XXXLINQ in Action '/>
<book id='4' image='C04' name='Architecting Applications'/>
<book id='5' image='C05' name='PowerShell in Action'/>
</Books>";

Code:

// xml documents
var xmlOld = XDocument.Parse(oldXML);
var xmlNew = XDocument.Parse(newXML);
// helper function to get the attribute value of the given element by attribute name
Func<XElement, string, string> getAttributeValue = (xElement, name) => xElement.Attribute(name).Value;
// name of the nodes we are looking for
var nodeName = "book";
var sameNodes = new List<string>();
// iterate over all old nodes (this will replace all existing but changed nodes)
xmlOld.Descendants(nodeName).ToList().ForEach(item =>
{
    var currentElementId = getAttributeValue(item, "id");
    // find node with the same id in the new nodes collection
    var toReplace = xmlNew.Descendants(nodeName).ToList().FirstOrDefault(n => getAttributeValue(n, "id") == currentElementId);
    if (toReplace != null)
    {
        var aImageOldValue = getAttributeValue(item, "image");
        var aImageNewValue = getAttributeValue(toReplace, "image");
        var aNameOldValue = getAttributeValue(item, "name");
        var aNameNewValue = getAttributeValue(toReplace, "name");
        if ((aImageNewValue != aImageOldValue) || (aNameOldValue != aNameNewValue))
        {
            // replace attribute values
            item.Attribute("image").Value = getAttributeValue(toReplace, "image");
            item.Attribute("name").Value = getAttributeValue(toReplace, "name");
        }
        else if ((aImageNewValue == aImageOldValue) && (aNameOldValue == aNameNewValue))
        {
            // remove same nodes! can't remove the node yet, because it will be seen as new
            sameNodes.Add(getAttributeValue(item, "id"));
        }
    }
});
// add new nodes
// id's of all old nodes
var oldNodes = xmlOld.Descendants(nodeName).Select (node => getAttributeValue(node, "id")).ToList();
// id's of all new nodes
var newNodes = xmlNew.Descendants(nodeName).Select (node => getAttributeValue(node, "id")).ToList();
// find new nodes that are not present in the old collection
var nodeIdsToAdd = newNodes.Except(oldNodes);
// add all new nodes to the already modified xml document
foreach (var newNodeId in nodeIdsToAdd)
{
    var newNode = xmlNew.Descendants(nodeName).FirstOrDefault(node => getAttributeValue(node, "id") == newNodeId);
    if (newNode != null)
    {
        xmlOld.Root.Add(newNode);
    }
}
// remove unchanged nodes
foreach (var oldNodeId in sameNodes)
{
    xmlOld.Descendants(nodeName).FirstOrDefault (node => getAttributeValue(node, "id") == oldNodeId).Remove();
}
xmlOld.Save(@"d:\temp\merged.xml");

The resulting XML looks like this:

<Books>
  <book id="1" image="C011" name="C# in Depth" />
  <book id="2" image="C02" name="ASP.NET 2.0" />
  <book id="3" image="XXXC03" name="XXXLINQ in Action " />
  <book id="5" image="C05" name="PowerShell in Action" />
</Books>
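
The same result can probably be reached without modifying the old document, by indexing the old books by id and building a fresh `Books` element. The following is only a sketch under stated assumptions: every `book` has a unique `id`, and a book counts as changed when any attribute name or value differs (the code above checks only `image` and `name`). `DictionaryDiffSketch`, `AttributesDiffer`, and `ChangedOrAdded` are hypothetical names, not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DictionaryDiffSketch
{
    // True when the two elements differ in any attribute name or value.
    static bool AttributesDiffer(XElement oldBook, XElement newBook)
    {
        var oldAttrs = oldBook.Attributes().ToDictionary(a => a.Name, a => a.Value);
        var newAttrs = newBook.Attributes().ToDictionary(a => a.Name, a => a.Value);
        if (oldAttrs.Count != newAttrs.Count)
        {
            return true;
        }
        foreach (var pair in newAttrs)
        {
            string oldValue;
            if (!oldAttrs.TryGetValue(pair.Key, out oldValue) || oldValue != pair.Value)
            {
                return true;
            }
        }
        return false;
    }

    // Builds a new <Books> element holding the books that are new or changed.
    // Assumption: ids are present and unique, otherwise ToDictionary throws.
    public static XElement ChangedOrAdded(XDocument oldDoc, XDocument newDoc)
    {
        var oldById = oldDoc.Descendants("book")
                            .ToDictionary(b => (string)b.Attribute("id"));

        return new XElement("Books",
            newDoc.Descendants("book").Where(b =>
            {
                XElement oldBook;
                return !oldById.TryGetValue((string)b.Attribute("id"), out oldBook)
                       || AttributesDiffer(oldBook, b);
            }));
    }

    public static void Main()
    {
        var oldDoc = XDocument.Parse(
            "<Books><book id='1' image='C01' name='C# in Depth'/><book id='4' image='C04' name='Architecting Applications'/></Books>");
        var newDoc = XDocument.Parse(
            "<Books><book id='1' image='C011' name='C# in Depth'/><book id='4' image='C04' name='Architecting Applications'/><book id='5' image='C05' name='PowerShell in Action'/></Books>");

        Console.WriteLine(ChangedOrAdded(oldDoc, newDoc));
    }
}
```

Because the selected elements already belong to the new document, LINQ to XML copies them when they are added to the result element, so neither input document is altered.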

Answer 2 (score: 0)

I think this can be done more concisely and with less overhead, in the declarative style of the original question, as follows:

XDocument xdoc = new XDocument(new XElement("Books",
    from newBook in XDocument.Parse(newXML).Descendants("book")
    join oldBook in XDocument.Parse(oldXML).Descendants("book")
        on newBook.Attributes("id").First().Value equals oldBook.Attributes("id").First().Value into oldBooks
    where !oldBooks.Any()
            || newBook.Attributes().Any(a => a.Value != oldBooks.First().Attributes(a.Name).First().Value)
    select newBook));

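This gives the requested answer, although, as is often the case with LINQ, doing this much work in a single statement can make it hard to understand and debug. Note that we do a group join on the ID and then select only those new items that either have no matching old item or have some attribute value that differs from the matching item.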
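
For debugging, the group join and the filter can be pulled apart into named steps. The sketch below is one possible unpacking, not the query above verbatim: it uses method syntax, and it compares through the `(string)` conversion on `XAttribute`, which returns null for a missing attribute, where `Attributes(a.Name).First()` would throw if the old book lacks an attribute that the new one has. `GroupJoinSketch` and `Diff` are hypothetical names. Like the query above, it does not notice an attribute that was removed in the new version.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class GroupJoinSketch
{
    public static XDocument Diff(string oldXml, string newXml)
    {
        var oldBooks = XDocument.Parse(oldXml).Descendants("book");
        var newBooks = XDocument.Parse(newXml).Descendants("book");

        // Step 1: pair each new book with the old book sharing its id
        // (OldBook is null when the id does not exist in the old document).
        var paired = newBooks.GroupJoin(
            oldBooks,
            newBook => (string)newBook.Attribute("id"),
            oldBook => (string)oldBook.Attribute("id"),
            (newBook, matches) => new { NewBook = newBook, OldBook = matches.FirstOrDefault() });

        // Step 2: keep books that are new, or whose attribute values differ
        // from the old version. A missing old attribute reads as null here.
        var changedOrAdded = paired
            .Where(p => p.OldBook == null
                     || p.NewBook.Attributes().Any(
                            a => (string)p.OldBook.Attribute(a.Name) != a.Value))
            .Select(p => p.NewBook);

        return new XDocument(new XElement("Books", changedOrAdded));
    }

    public static void Main()
    {
        var oldXml = "<Books><book id='1' image='C01'/><book id='4' image='C04'/></Books>";
        var newXml = "<Books><book id='1' image='C011'/><book id='4' image='C04'/><book id='5' image='C05'/></Books>";

        Console.WriteLine(Diff(oldXml, newXml));
    }
}
```

Each intermediate sequence (`paired`, `changedOrAdded`) can be inspected in a debugger or materialized with `ToList()` when tracking down an unexpected result.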