Question

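I'm writing something custom to help us take two valid XML files, compare them, and produce a kind of insert/update/delete list that we pass on to another system for data integration.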

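I had never touched LINQ in my life, so this is my first attempt at it. It works fine, so I started pushing it to test the limits of its performance. Right now I may only be dealing with salaried employees, so my dataset is small, but given that this system may one day include hourly employees, I wanted to test an upper bound of ~100k XML entities.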

What it does is take a before.xml file and an after.xml file, iterate over a defined entity, and (hopefully) extract any changed values.

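I'm processing ~100k rows in 8-9 seconds. I don't know whether that's bad, but for us it's a perfectly acceptable number. The problem is that I have only tested with 1-3 updates. Inserts and deletes are simple and fast. Updates are another matter: every time I add one more change to detect, it adds about a second to the computation time!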

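I can only assume my first LINQ query is the culprit. Here is what I have. Is there anything glaringly inefficient or wrong in it that I should learn to avoid?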

////////////for sake of this demo//////////////
string entityNode = "book";
string guidAttribute = "id";
///////////////////////////////////////////////
IEnumerable<XElement> befores = XElement.Load(beforeXMLFile).Elements(entityNode);
IEnumerable<XElement> afters = XElement.Load(afterXMLFile).Elements(entityNode);

//Updates/changes
IEnumerable<XElement> updates =
    from afterChild in afters.Descendants()
    join beforeChild in befores.Descendants() on 
                                       new
                                       {
                                           ((XElement)afterChild).Parent.Attribute(guidAttribute).Value,
                                           ((XElement)afterChild).Name
                                       }
                                       equals new
                                       {
                                           ((XElement)beforeChild).Parent.Attribute(guidAttribute).Value,
                                           ((XElement)beforeChild).Name
                                       }
    where (((XElement)afterChild).Value != ((XElement)beforeChild).Value)
    select ((XElement)afterChild);

My XML (ugly test data, ignore the content) looks like:

<?xml version="1.0" encoding="UTF-8" ?>
<catalog>
    <book id="bk1">
        <author>Le, Kellie T.</author>
        <title>amet,</title>
        <genre>Horror</genre>
        <price>68 590</price>
        <publish_date>09-01-2014</publish_date>
        <description>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Curabitur</description>
        <id>1</id>
    </book>
    <book id="bk2">
        <author>Hoffman, Leonard H.</author>
        <title>molestie</title>
        <genre>Romance</genre>
        <price>26 761</price>
        <publish_date>03-10-2013</publish_date>
        <description>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Curabitur sed tortor. Integer aliquam adipiscing lacus. Ut nec</description>
        <id>2</id>
    </book>

.... plus another 100k of these with random data

</catalog>

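My goal is to join every child element of my main entity element (e.g. "book") on the field element's name and its parent's id, and keep those where the before value != the after value. Again, this does exactly what I want, but I suspect the join is what makes it slow.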

Is there a better way to do this?

Thanks!

Answer 1

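Descendants returns everything in the document tree. If you have 100k book elements, it returns nearly 1 million elements, because each book has 9 child elements. That gives you 900,000 x 900,000 = 810,000,000,000 pairs as input to the join, and your join condition gets evaluated that many times. I'm not surprised it's slow.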

You shouldn't use Descendants unless you really need to. Use Elements instead. I wrote a blog post about this a while ago: Why (or when) you should/shouldn't use Descendants() method.
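To make the difference concrete, here is a minimal sketch. It assumes a made-up two-book catalog held in a string (not the asker's files) and simply counts what each method returns: Elements() yields direct children only, while Descendants() walks the whole subtree.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical miniature catalog, for illustration only.
var catalog = XElement.Parse(@"<catalog>
  <book id=""bk1""><author>A</author><title>T1</title></book>
  <book id=""bk2""><author>B</author><title>T2</title></book>
</catalog>");

// Direct children of <catalog>: just the two <book> elements.
int childCount = catalog.Elements().Count();

// Every element anywhere below <catalog>: 2 books + 4 field elements.
int descendantCount = catalog.Descendants().Count();

// Field elements reached one level at a time: book -> its children.
int fieldCount = catalog.Elements("book").Elements().Count();

Console.WriteLine($"Elements: {childCount}, Descendants: {descendantCount}, fields: {fieldCount}");
// prints "Elements: 2, Descendants: 6, fields: 4"
```

With only one level of fields under each book the sets overlap a lot, but Elements() states the intended depth explicitly and never picks up deeper nodes if the schema later grows nested elements.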

Update

How about splitting the query in two? First join the book elements on id, then pick out the updated values from their child elements:

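// Step 1: pair each "after" book with the "before" book that has the same id.
var pairs = from a in afters
            join b in befores
              on (string)a.Attribute(guidAttribute) equals (string)b.Attribute(guidAttribute)
            select new { a, b };

// Step 2: within each pair, keep the "after" fields whose value differs.
var updates = from p in pairs
              from ac in p.a.Elements()
              from bc in p.b.Elements()
              where ac.Name == bc.Name && (string)ac != (string)bc
              select ac;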
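For anyone who wants to try the two-step approach without the original files, the following is a self-contained sketch. The two in-memory documents are hypothetical stand-ins for before.xml and after.xml, and only one field (the genre of bk1) differs between them; the queries themselves are the ones from the answer.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

string entityNode = "book";
string guidAttribute = "id";

// Hypothetical stand-ins for before.xml and after.xml.
var beforeDoc = XElement.Parse(@"<catalog>
  <book id=""bk1""><author>Le, Kellie T.</author><genre>Horror</genre></book>
  <book id=""bk2""><author>Hoffman, Leonard H.</author><genre>Romance</genre></book>
</catalog>");
var afterDoc = XElement.Parse(@"<catalog>
  <book id=""bk1""><author>Le, Kellie T.</author><genre>Thriller</genre></book>
  <book id=""bk2""><author>Hoffman, Leonard H.</author><genre>Romance</genre></book>
</catalog>");

var befores = beforeDoc.Elements(entityNode);
var afters = afterDoc.Elements(entityNode);

// Step 1: one join per book, keyed on the id attribute.
var pairs = from a in afters
            join b in befores
              on (string)a.Attribute(guidAttribute) equals (string)b.Attribute(guidAttribute)
            select new { a, b };

// Step 2: compare only the handful of fields inside each matched pair.
var updates = (from p in pairs
               from ac in p.a.Elements()
               from bc in p.b.Elements()
               where ac.Name == bc.Name && (string)ac != (string)bc
               select ac).ToList();

foreach (var u in updates)
    Console.WriteLine($"{(string)u.Parent.Attribute(guidAttribute)}: {u.Name} -> {(string)u}");
// prints "bk1: genre -> Thriller"
```

The join now runs over books rather than over every field, and the field-by-field comparison is confined to the few children of each matched pair.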

LINQ over XML - join is very slow
