I'm using LINQ to XML to create an XML file, but I'm not getting the format I need. This is my code:
List<string> listvalue = new List<string>();
listvalue.Add("http://example.com/sample.html");
listvalue.Add("http://example.com/new.html");
foreach (string url in listvalue)
{
    var document = new HtmlWeb().Load(url);
    var urls = document.DocumentNode.Descendants("img")
        .Select(e => e.GetAttributeValue("src", null))
        .Where(s => !String.IsNullOrEmpty(s));
    List<string> asList = urls.ToList();
    GenerateXml(url, asList);
}
and
protected void GenerateXml(string url, List<string> listitems) //generateXml
{
    XNamespace nsSitemap = "http://www.sitemaps.org/schemas/sitemap/0.9";
    XNamespace nsImage = "http://www.google.com/schemas/sitemap-image/1.1";
    var sitemap = new XDocument(new XDeclaration("1.0", "UTF-8", ""));
    var urlSet = new XElement(nsSitemap + "urlset",
        new XAttribute("xmlns", nsSitemap),
        new XAttribute(XNamespace.Xmlns + "image", nsImage),
        new XElement(nsSitemap + "url",
            new XElement(nsSitemap + "loc", url),
            from urlNode in listitems
            select new XElement(nsImage + "image",
                new XElement(nsImage + "loc", urlNode)
            )));
    sitemap.Add(urlSet);
    sitemap.Save(System.Web.HttpContext.Current.Server.MapPath("/Static/sitemaps/Sitemap-image.xml"));
}
I need the following format:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
    <url>
        <loc>http://example.com/sample.html</loc>
        <image:image>
            <image:loc>http://example.com/image.jpg</image:loc>
        </image:image>
        <image:image>
            <image:loc>http://example.com/photo.jpg</image:loc>
        </image:image>
    </url>
    <url>
        <loc>http://example.com/new.html</loc>
        <image:image>
            <image:loc>http://example.com/newimage.jpg</image:loc>
        </image:image>
        <image:image>
            <image:loc>http://example.com/newphoto.jpg</image:loc>
        </image:image>
    </url>
</urlset>
But I only get a single url tag in the output. How can I achieve the format above? Any suggestions?
Answer 0 (score: 2)
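It sounds like this is just a case of wanting to fetch all the image URLs (from all the source documents) before you call GenerateXml at all, while remembering which source each one came from. That's simple: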
var sources = new List<string>
{
    "http://example.com/sample.html",
    "http://example.com/new.html"
};
// Map each source page to the list of image URLs found on it.
var imagesBySource = sources
    .ToDictionary(source => source,
                  source => new HtmlWeb().Load(source)
                      .DocumentNode.Descendants("img")
                      .Select(e => e.GetAttributeValue("src", null))
                      .Where(s => !String.IsNullOrEmpty(s))
                      .ToList());
GenerateXml(imagesBySource);
Then you need to change GenerateXml to take a Dictionary<string, List<string>>. Something like this (untested):
protected void GenerateXml(Dictionary<string, List<string>> imagesByUrl)
{
    XNamespace nsSitemap = "http://www.sitemaps.org/schemas/sitemap/0.9";
    XNamespace nsImage = "http://www.google.com/schemas/sitemap-image/1.1";
    var sitemap = new XDocument(new XDeclaration("1.0", "UTF-8", ""));
    var urlSet = new XElement(nsSitemap + "urlset",
        new XAttribute("xmlns", nsSitemap),
        new XAttribute(XNamespace.Xmlns + "image", nsImage),
        // One <url> element per source page, each with its own <image:image> children.
        imagesByUrl.Select(entry =>
            new XElement(nsSitemap + "url",
                new XElement(nsSitemap + "loc", entry.Key),
                from urlNode in entry.Value
                select new XElement(nsImage + "image",
                    new XElement(nsImage + "loc", urlNode)
                )
            )
        )
    );
    sitemap.Add(urlSet);
    var path = HttpContext.Current.Server.MapPath("/Static/sitemaps/Sitemap-image.xml");
    sitemap.Save(path);
}
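Note that this doesn't guarantee that the order of the sources is preserved, because the enumeration order of a Dictionary is unspecified. If you need the order, you could create a class with Url and Images properties, and pass a list of those to GenerateXml.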
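The order-preserving variant could look roughly like the sketch below. It is only an illustration under some assumptions: the names PageImages, SitemapBuilder and Build are hypothetical (they are not in the code above), the image URLs are the sample values from the question's desired output rather than pages fetched with HtmlWeb, and the method returns the XDocument instead of saving it, so it can be run without a web context.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical type (not in the original post): one page plus the images found on it.
public sealed class PageImages
{
    public PageImages(string url, IEnumerable<string> images)
    {
        Url = url;
        Images = images.ToList();
    }

    public string Url { get; }
    public List<string> Images { get; }
}

public static class SitemapBuilder
{
    private static readonly XNamespace NsSitemap = "http://www.sitemaps.org/schemas/sitemap/0.9";
    private static readonly XNamespace NsImage = "http://www.google.com/schemas/sitemap-image/1.1";

    // Builds the sitemap in memory. A List<T> is enumerated in insertion order,
    // so the <url> elements appear in the order the pages were added.
    public static XDocument Build(IEnumerable<PageImages> pages)
    {
        var urlSet = new XElement(NsSitemap + "urlset",
            new XAttribute("xmlns", NsSitemap.NamespaceName),
            new XAttribute(XNamespace.Xmlns + "image", NsImage.NamespaceName),
            pages.Select(page =>
                new XElement(NsSitemap + "url",
                    new XElement(NsSitemap + "loc", page.Url),
                    page.Images.Select(image =>
                        new XElement(NsImage + "image",
                            new XElement(NsImage + "loc", image))))));

        return new XDocument(new XDeclaration("1.0", "UTF-8", null), urlSet);
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample data standing in for the results of scraping each page.
        var pages = new List<PageImages>
        {
            new PageImages("http://example.com/sample.html",
                new[] { "http://example.com/image.jpg", "http://example.com/photo.jpg" }),
            new PageImages("http://example.com/new.html",
                new[] { "http://example.com/newimage.jpg", "http://example.com/newphoto.jpg" })
        };

        // Prints the element tree; call Save(path) on the document to write a file instead.
        Console.WriteLine(SitemapBuilder.Build(pages));
    }
}
```

In the original setting, the list would be filled inside the loop over the source URLs, and the returned document saved once with sitemap.Save(path) after all pages have been processed.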