LINQ to XML Multiple Selects

Asked: 2011-03-18 14:39:28

Tags: c# linq-to-xml

Edit: Here is a sample of the XML document I am trying to parse: http://us.battle.net/wow/en/forum/1011699/ (view the page source).

These are the items I want to retrieve:

  • Title (tbody / tr / td / a)
  • Author (tbody / tr / td)
  • Url (also stored in the author node)
  • Date (tbody / tr / td / div / div)
  • Replies (tbody / tr / td)
  • Views (also stored in the node above)

I run a "pre-query" first so that each subsequent query does not have to traverse the whole document:

var threads =
    from allThreads in xmlThreadList.Descendants(ns + "tbody")
                                    .Descendants(ns + "tr")
                                    .Descendants(ns + "td")
    select allThreads;

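I have an XML document that represents a list of forum threads. Each thread has several child nodes holding the different pieces of information I want to retrieve. At the moment I get them by querying the XML document several times. Is there a way to extract all of this in a single query and store it in an IEnumerable? The way I am doing it now seems inefficient.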

    // array of xelements that contain the title and url
    var threadTitles =
        (from allThreads in threads.Descendants(ns + "a")
        where allThreads.Parent.Attribute("class").Value.Equals("post-title")
        select allThreads).ToArray();

    // array of strings of author names
    var threadAuthors =
        (from allThreads in threads
        where allThreads.Attribute("class").Value.Equals("post-author")
        select allThreads.Value.Trim()).ToArray();

    // ...
    // there are several more queries like this
    // ...

    // for loop to populate a list with all the extracted data
    for (int i = 0, j = 0; i < threadTitles.Length; i++, j++)
    {
        ThreadItem threadItem = new ThreadItem();

        threadItem.Title = threadTitles[i].Value.Trim();
        threadItem.Author = threadAuthors[i];
        threadItem.Url = Path.Combine(_url, threadTitles[i].Attribute("href").Value);
        threadItem.Date = threadDates[i];
        threadItem.Replies = threadRepliesAndViews[j++];
        threadItem.Views = threadRepliesAndViews[j];
        _threads.Add(threadItem);
    }

Any suggestions would be appreciated. I am new to the whole LINQ to XML scene.

2 Answers:

Answer 0: (score: 2)

Hope this helps:

string ns = "{http://www.w3.org/1999/xhtml}";

var doc = XDocument.Load("http://us.battle.net/wow/en/forum/1011699/");
// Casting an XAttribute to string yields null when the attribute is missing,
// so a td without a class attribute no longer throws a NullReferenceException.
var threads = from tr in doc.Descendants(ns + "tbody").Elements(ns + "tr")
              let elements = tr.Elements(ns + "td")
              let title = elements.First(a => (string)a.Attribute("class") == "post-title").Element(ns + "a")
              let author = elements.First(a => (string)a.Attribute("class") == "post-author")
              let replies = elements.First(a => (string)a.Attribute("class") == "post-replies")
              let views = elements.First(a => (string)a.Attribute("class") == "post-views")
              select new
              {
                  Title = title.Value.Trim(),
                  Url = title.Attribute("href").Value.Trim(),
                  Author = author.Value.Trim(),
                  Replies = int.Parse(replies.Value),
                  Views = int.Parse(views.Value)
              };

foreach (var item in threads)
{
    Console.WriteLine(item);
}

Console.ReadLine();
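
If you want the typed ThreadItem list from the question rather than an anonymous type, the same one-pass idea can be sketched as below. This is only a sketch under assumptions: the class names (post-title, post-author, post-replies, post-views) are taken from the query above, the ThreadItem shape is guessed from the question, the sample markup is made up, and ThreadListParser, Cell and ParserDemo are names invented for this example. It needs C# 6 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical container, mirroring the ThreadItem used in the question.
public class ThreadItem
{
    public string Title { get; set; }
    public string Url { get; set; }
    public string Author { get; set; }
    public int Replies { get; set; }
    public int Views { get; set; }
}

public static class ThreadListParser
{
    static readonly XNamespace Ns = "http://www.w3.org/1999/xhtml";

    // Returns the first <td> of a row with the given class, or null if there is none.
    static XElement Cell(XElement row, string cssClass)
    {
        return row.Elements(Ns + "td")
                  .FirstOrDefault(td => (string)td.Attribute("class") == cssClass);
    }

    // Visits each <tr> once and projects it into a ThreadItem.
    // Rows without a title link are skipped; missing counts default to 0.
    public static List<ThreadItem> Parse(string xhtml, Uri baseUri)
    {
        XDocument doc = XDocument.Parse(xhtml);

        var items =
            from tr in doc.Descendants(Ns + "tbody").Elements(Ns + "tr")
            let link = Cell(tr, "post-title")?.Element(Ns + "a")
            where link != null
            select new ThreadItem
            {
                Title = link.Value.Trim(),
                // Resolve the href against the page URL instead of using Path.Combine.
                Url = new Uri(baseUri, (string)link.Attribute("href") ?? "").AbsoluteUri,
                Author = (Cell(tr, "post-author")?.Value ?? "").Trim(),
                Replies = int.Parse((Cell(tr, "post-replies")?.Value ?? "0").Trim()),
                Views = int.Parse((Cell(tr, "post-views")?.Value ?? "0").Trim())
            };

        return items.ToList();
    }
}

public static class ParserDemo
{
    public static void Main()
    {
        // Made-up sample markup, not the real forum page.
        const string sample = @"<table xmlns=""http://www.w3.org/1999/xhtml""><tbody>
  <tr>
    <td class=""post-title""><a href=""/wow/en/forum/topic/123""> First thread </a></td>
    <td class=""post-author""> Alice </td>
    <td class=""post-replies"">4</td>
    <td class=""post-views"">250</td>
  </tr>
</tbody></table>";

        var baseUri = new Uri("http://us.battle.net/wow/en/forum/1011699/");
        foreach (ThreadItem t in ThreadListParser.Parse(sample, baseUri))
            Console.WriteLine($"{t.Title} | {t.Author} | {t.Url} | {t.Replies} | {t.Views}");
    }
}
```

The Uri constructor is used for the link because Path.Combine is meant for file-system paths: when its second argument is rooted (starts with "/"), it returns only that second argument and drops the base.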

Answer 1: (score: 1)

Try something like:

// Assumes each `thread` here is one row (tr) of the thread list.
var items =
    from thread in threads
    select new ThreadItem
    {
        Title = thread.Descendants(ns + "a")
                      .First(title => title.Parent.Attribute("class").Value.Equals("post-title"))
                      .Value.Trim(),
        // Date = <date query part>,
        // etc.
    };

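This should gain some speed, because you no longer parse the entire XML block over and over. Instead, you look at each smaller thread element only a few times to pull out the different pieces of information.

I would be interested to know which is faster, because you are effectively trading one hope for another. With this approach, you hope the whole thread element fits in cache, so that all the small queries you run on it get fast access. With your old code, you hope the branch predictor on your CPU tunes itself to each long query as it runs and gives better speed that way.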
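
One way to find out is to time both styles on the same document. The following is a rough sketch with Stopwatch, not a rigorous benchmark: the document is synthetic (made-up rows with only a title and an author cell), and QueryTiming, BuildSample, SeparateQueries, SinglePass and TimingDemo are names invented for this example. The numbers will depend on the machine and on the real page structure.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Xml.Linq;

public static class QueryTiming
{
    static readonly XNamespace Ns = "http://www.w3.org/1999/xhtml";

    // Builds a synthetic table with `rows` rows (made-up data).
    public static XDocument BuildSample(int rows)
    {
        return new XDocument(
            new XElement(Ns + "table",
                new XElement(Ns + "tbody",
                    Enumerable.Range(0, rows).Select(i =>
                        new XElement(Ns + "tr",
                            new XElement(Ns + "td", new XAttribute("class", "post-title"),
                                new XElement(Ns + "a", new XAttribute("href", "topic/" + i), "Thread " + i)),
                            new XElement(Ns + "td", new XAttribute("class", "post-author"), "Author " + i))))));
    }

    // Style of the question: one traversal of all cells per field.
    public static int SeparateQueries(XDocument doc)
    {
        var cells = doc.Descendants(Ns + "tbody").Descendants(Ns + "tr").Descendants(Ns + "td");
        var titles = cells.Where(td => (string)td.Attribute("class") == "post-title")
                          .Select(td => td.Value.Trim()).ToArray();
        var authors = cells.Where(td => (string)td.Attribute("class") == "post-author")
                           .Select(td => td.Value.Trim()).ToArray();
        return Math.Min(titles.Length, authors.Length);
    }

    // Style of the answers: visit each row once and read both fields from it.
    public static int SinglePass(XDocument doc)
    {
        var items =
            (from tr in doc.Descendants(Ns + "tbody").Elements(Ns + "tr")
             let tds = tr.Elements(Ns + "td")
             let title = tds.FirstOrDefault(td => (string)td.Attribute("class") == "post-title")
             let author = tds.FirstOrDefault(td => (string)td.Attribute("class") == "post-author")
             where title != null && author != null
             select new { Title = title.Value.Trim(), Author = author.Value.Trim() }).ToList();
        return items.Count;
    }

    // Runs the action once as a warm-up (so JIT compilation is not measured),
    // then `iterations` times under a Stopwatch.
    public static TimeSpan Time(Action action, int iterations)
    {
        action();
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) action();
        sw.Stop();
        return sw.Elapsed;
    }
}

public static class TimingDemo
{
    public static void Main()
    {
        XDocument doc = QueryTiming.BuildSample(1000);
        Console.WriteLine("separate queries: " + QueryTiming.Time(() => QueryTiming.SeparateQueries(doc), 100));
        Console.WriteLine("single pass:      " + QueryTiming.Time(() => QueryTiming.SinglePass(doc), 100));
    }
}
```

For a page with a few dozen threads the difference is likely to be small either way, so readability may be the better reason to choose between the two.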