How do I format XML using LINQ?

Date: 2016-01-08 06:59:22

Tags: c# xml linq linq-to-xml

I am creating an XML file with LINQ here, but I am not getting the format I need. This is my code:

List<string> listvalue = new List<string>();
listvalue.Add("http://example.com/sample.html");
listvalue.Add("http://example.com/new.html");
foreach (string url in listvalue)
{
    var document = new HtmlWeb().Load(url);
    var urls = document.DocumentNode.Descendants("img")
                                    .Select(e => e.GetAttributeValue("src", null))
                                    .Where(s => !String.IsNullOrEmpty(s));

    List<string> asList = urls.ToList();
    GenerateXml(url, asList);                       

}

protected void GenerateXml(string url, List<string> listitems)  //generateXml
{

    XNamespace nsSitemap = "http://www.sitemaps.org/schemas/sitemap/0.9";
    XNamespace nsImage = "http://www.google.com/schemas/sitemap-image/1.1";

    var sitemap = new XDocument(new XDeclaration("1.0", "UTF-8", ""));

    var urlSet = new XElement(nsSitemap + "urlset",
        new XAttribute("xmlns", nsSitemap),
        new XAttribute(XNamespace.Xmlns + "image", nsImage),
        new XElement(nsSitemap + "url",
        new XElement(nsSitemap + "loc", url),
        from urlNode in listitems
        select new XElement(nsImage + "image",
               new XElement(nsImage + "loc", urlNode)
           )));
    sitemap.Add(urlSet);
    sitemap.Save(System.Web.HttpContext.Current.Server.MapPath("/Static/sitemaps/Sitemap-image.xml"));
}

I need the following format:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>http://example.com/sample.html</loc>
    <image:image>
      <image:loc>http://example.com/image.jpg</image:loc>
    </image:image>
    <image:image>
      <image:loc>http://example.com/photo.jpg</image:loc>
    </image:image>
  </url>
  <url>
    <loc>http://example.com/new.html</loc>
    <image:image>
      <image:loc>http://example.com/newimage.jpg</image:loc>
    </image:image>
    <image:image>
      <image:loc>http://example.com/newphoto.jpg</image:loc>
    </image:image>
  </url>
</urlset>

But I am getting only a single url tag here. How can I achieve this? Any suggestions?

1 answer:

Answer 0 (score: 2)

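It sounds like you just need to fetch all the URLs (from all the source documents) before you call GenerateXml at all, and remember which source each one came from. Each call to your current GenerateXml overwrites the same file with a single url element, which is why only one survives. Collecting everything first is fairly simple: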

var sources = new List<string>
{
    "http://example.com/sample.html",
    "http://example.com/new.html"
};
var imagesBySource = sources
    .ToDictionary(source => source,
                  source => new HtmlWeb().Load(source)   // load each source page, not an undefined "url"
                               .DocumentNode.Descendants("img")
                               .Select(e => e.GetAttributeValue("src", null))
                               .Where(s => !String.IsNullOrEmpty(s))
                               .ToList());
GenerateXml(imagesBySource);

Then you need to change GenerateXml to accept a Dictionary<string, List<string>>. Something like this (untested):

protected void GenerateXml(Dictionary<string, List<string>> imagesByUrl)
{    
    XNamespace nsSitemap = "http://www.sitemaps.org/schemas/sitemap/0.9";
    XNamespace nsImage = "http://www.google.com/schemas/sitemap-image/1.1";

    var sitemap = new XDocument(new XDeclaration("1.0", "UTF-8", ""));

    var urlSet = new XElement(nsSitemap + "urlset",
        new XAttribute("xmlns", nsSitemap),
        new XAttribute(XNamespace.Xmlns + "image", nsImage),
        imagesByUrl.Select(entry => 
            new XElement(nsSitemap + "url",
                new XElement(nsSitemap + "loc", entry.Key),
                from urlNode in entry.Value
                select new XElement(nsImage + "image",
                    new XElement(nsImage + "loc", urlNode)
                )
            )   // closes the "url" element
        )       // closes the Select call
    );
    sitemap.Add(urlSet);
    var path = HttpContext.Current.Server.MapPath("/Static/sitemaps/Sitemap-image.xml");
    sitemap.Save(path);
}

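Note that this does not guarantee that the order of the sources is preserved, because Dictionary does not promise any enumeration order. If you need that, you could create a class with Url and Images properties, and pass a list of those to GenerateXml instead.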
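
The order-preserving variant could be sketched roughly as follows. This is only an illustration under assumptions: the names PageImages, SitemapBuilder and Build are hypothetical and do not appear in the question, the image lists are hard-coded instead of being scraped with HtmlAgilityPack, and the method returns the XDocument rather than saving it so that it can run outside a web application.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical holder type: one source page plus the image URLs found on it.
public class PageImages
{
    public string Url { get; set; }
    public List<string> Images { get; set; }
}

public static class SitemapBuilder
{
    // Builds the sitemap in memory; the caller decides where to save it.
    public static XDocument Build(IEnumerable<PageImages> pages)
    {
        XNamespace nsSitemap = "http://www.sitemaps.org/schemas/sitemap/0.9";
        XNamespace nsImage = "http://www.google.com/schemas/sitemap-image/1.1";

        var urlSet = new XElement(nsSitemap + "urlset",
            new XAttribute("xmlns", nsSitemap),
            new XAttribute(XNamespace.Xmlns + "image", nsImage),
            // A List is enumerated in insertion order, so the url
            // elements come out in the same order as the sources.
            pages.Select(page =>
                new XElement(nsSitemap + "url",
                    new XElement(nsSitemap + "loc", page.Url),
                    page.Images.Select(image =>
                        new XElement(nsImage + "image",
                            new XElement(nsImage + "loc", image))))));

        return new XDocument(urlSet);
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample data taken from the desired output in the question.
        var pages = new List<PageImages>
        {
            new PageImages
            {
                Url = "http://example.com/sample.html",
                Images = new List<string> { "http://example.com/image.jpg", "http://example.com/photo.jpg" }
            },
            new PageImages
            {
                Url = "http://example.com/new.html",
                Images = new List<string> { "http://example.com/newimage.jpg", "http://example.com/newphoto.jpg" }
            }
        };

        Console.WriteLine(SitemapBuilder.Build(pages));
    }
}
```

In the real application the list would presumably be filled from the HtmlWeb().Load(...) query shown above, and the returned document saved with sitemap.Save(path) as before.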