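I would like to know the best way for HtmlAgilityPack to read an XML file that references an XSL file to render HTML. Are there any settings on the HtmlDocument class that would help with this, or do I have to find a way to perform the transformation before loading the result with HtmlAgilityPack? If the latter, does anyone know of a good library or method for the transformation? Below is an example of a site that returns XML with an XSL file, along with the code I would like to use.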
var uri = new Uri("http://www.skechers.com/");
var request = (HttpWebRequest)WebRequest.Create(uri);
var cookieContainer = new CookieContainer();
request.CookieContainer = cookieContainer;
request.UserAgent = @"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5";
request.Method = "GET";
request.AllowAutoRedirect = true;
request.Timeout = 15000;
var response = (HttpWebResponse)request.GetResponse();
var page = new HtmlDocument();
page.OptionReadEncoding = false;
var stream = response.GetResponseStream();
page.Load(stream);
This code does not throw any errors, but the XML is parsed as-is rather than transformed, and the transformed output is what I want.
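Answer 0 (score: 3)
Html Agility Pack can help you on two points:
1) It makes getting the XML processing instruction easier, because it parses the PI data as HTML and therefore converts it into attributes.
2) HtmlDocument implements IXPathNavigable, so it can be transformed directly by the .NET XSLT transformation engine.
Here is a useful piece of code. I had to add a specific XmlResolver to handle the XSLT transformation properly, but I think that is specific to this skechers case.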
public static void DownloadAndProcessXml(string url, string userAgent, string outputFilePath)
{
    using (XmlTextWriter writer = new XmlTextWriter(outputFilePath, Encoding.UTF8))
    {
        DownloadAndProcessXml(url, userAgent, writer);
    }
}
public static void DownloadAndProcessXml(string url, string userAgent, XmlWriter output)
{
    UserAgentXmlUrlResolver resolver = new UserAgentXmlUrlResolver(url, userAgent);
    // WebClient is an easy-to-use class.
    using (WebClient client = new WebClient())
    {
        // Download the XML document. Set the User-Agent header or the site won't answer us.
        client.Headers[HttpRequestHeader.UserAgent] = resolver.UserAgent;
        HtmlDocument xmlDoc = new HtmlDocument();
        xmlDoc.Load(client.OpenRead(url));
        // Determine the XSLT URL (note the XPath trick, as Html Agility Pack does not support XML processing instructions).
        HtmlNode styleNode = xmlDoc.DocumentNode.SelectSingleNode("//*[name()='?xml-stylesheet']");
        string xsltUrl = styleNode == null ? null : styleNode.GetAttributeValue("href", null);
        if (xsltUrl == null)
            throw new InvalidOperationException("No xml-stylesheet href found in " + url);
        // Download the XSLT document. The href is resolved against the page URL
        // instead of being concatenated, which avoids a doubled slash for root-relative hrefs.
        client.Headers[HttpRequestHeader.UserAgent] = resolver.UserAgent;
        XslCompiledTransform xslt = new XslCompiledTransform();
        xslt.Load(new XmlTextReader(client.OpenRead(new Uri(new Uri(url), xsltUrl))), new XsltSettings(true, false), null);
        // Transform the Html/Xml document into a new XML document; this is easy because HtmlDocument implements IXPathNavigable.
        // Note the custom resolver, which handles the requests this XSLT makes while it runs.
        xslt.Transform(xmlDoc, null, output, resolver);
    }
}
// This class is needed during transformation, otherwise there are errors.
// This is probably due to this very specific XSLT file, which needs to go back to the root document itself.
public class UserAgentXmlUrlResolver : XmlUrlResolver
{
    public UserAgentXmlUrlResolver(string rootUrl, string userAgent)
    {
        RootUrl = rootUrl;
        UserAgent = userAgent;
    }

    public string RootUrl { get; set; }
    public string UserAgent { get; set; }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        WebClient client = new WebClient();
        if (!string.IsNullOrEmpty(UserAgent))
        {
            client.Headers[HttpRequestHeader.UserAgent] = UserAgent;
        }
        return client.OpenRead(absoluteUri);
    }

    public override Uri ResolveUri(Uri baseUri, string relativeUri)
    {
        if ((relativeUri == "/") && (!string.IsNullOrEmpty(RootUrl)))
            return new Uri(RootUrl);
        return base.ResolveUri(baseUri, relativeUri);
    }
}
You call it like this:
string url = "http://www.skechers.com/";
string ua = @"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5";
DownloadAndProcessXml(url, ua, "skechers.html");
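Answer 1 (score: 2)
You should render the output of applying the XSLT to the XML. To do that, you need to download the XML, which you have already done. Next, parse the XML to identify the XSL reference. Then download the XSL and apply it to the XML document.
These links may be useful.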
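The steps in this answer (find the stylesheet reference, then apply the stylesheet) can be sketched with the built-in System.Xml classes alone. This is a minimal sketch under stated assumptions, not the answerer's code: the class name StylesheetSketch and both method names are made up for illustration, and downloading is left out, so the stylesheet is passed in as text that has already been fetched.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

// Hypothetical helper class, named here for illustration only.
public static class StylesheetSketch
{
    // Returns the href pseudo-attribute of the first top-level xml-stylesheet
    // processing instruction, or null if the document has none.
    public static string FindStylesheetHref(IXPathNavigable xml)
    {
        XPathNavigator pi = xml.CreateNavigator()
            .SelectSingleNode("/processing-instruction('xml-stylesheet')");
        if (pi == null)
            return null;

        // The PI value is plain text such as: type="text/xsl" href="/xsl/page.xsl"
        Match match = Regex.Match(pi.Value, "href\\s*=\\s*[\"'](?<url>[^\"']*)[\"']");
        return match.Success ? match.Groups["url"].Value : null;
    }

    // Applies an already-downloaded stylesheet to the XML and returns the output text.
    public static string Transform(IXPathNavigable xml, string xsltText)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader reader = XmlReader.Create(new StringReader(xsltText)))
        {
            xslt.Load(reader);
        }

        using (var output = new StringWriter())
        {
            xslt.Transform(xml, null, output);
            return output.ToString();
        }
    }
}
```

To use it, build an XPathDocument from the downloaded XML, call FindStylesheetHref to get the href, download that URL with the same User-Agent as the first request, and pass the downloaded text to Transform. The resulting string can then be loaded into an HtmlDocument with LoadHtml.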
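Answer 2 (score: 0)
This is the additional code I ended up using after receiving the responses. Note that it should only run when the response content type is "application/xml", and that you must check for null instances throughout. Also, FormAssetSrc is a private function that takes the href value, determines whether it is protocol-relative, root-relative, or document-relative, and creates a fully qualified URI.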
var xmlStream = response.GetResponseStream();
var xmlDocument = new XPathDocument(xmlStream);
var styleNode = xmlDocument.CreateNavigator().SelectSingleNode("processing-instruction('xml-stylesheet')");
// Guard against documents that carry no xml-stylesheet processing instruction.
var hrefValue = Regex.Match(styleNode != null ? styleNode.Value : string.Empty, "href=(\"|')(?<url>.*?)(\"|')");
if (hrefValue.Success)
{
    var xslHref = FormAssetSrc(hrefValue.Groups["url"].Value, response.ResponseUri);
    var xslUri = new Uri(xslHref);
    var xslRequest = CreateWebRequest(xslUri);
    var xslResponse = (HttpWebResponse)xslRequest.GetResponse();
    var xslDocument = new XPathDocument(xslResponse.GetResponseStream());
    // XslTransform is marked obsolete; XslCompiledTransform is its replacement.
    var xslTransform = new XslTransform();
    var sw = new System.IO.StringWriter();
    xslTransform.Load(xslDocument);
    xslTransform.Transform(xmlDocument.CreateNavigator(), null, sw);
    page.Html.LoadHtml(sw.ToString());
}
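The FormAssetSrc helper is described but not shown in this answer. One possible shape for it, as a sketch that assumes the two-argument System.Uri constructor is enough for the protocol-relative, root-relative, and document-relative cases: the class name AssetUrlSketch is made up, and the body is a guess at the behavior described, not the answerer's actual implementation.

```csharp
using System;

// Hypothetical stand-in for the private FormAssetSrc helper mentioned in the answer.
public static class AssetUrlSketch
{
    // Resolves an href taken from the xml-stylesheet processing instruction
    // against the URI the XML was actually served from, and returns the
    // fully qualified URI as a string.
    public static string FormAssetSrc(string href, Uri baseUri)
    {
        if (baseUri == null)
            throw new ArgumentNullException("baseUri");
        if (string.IsNullOrWhiteSpace(href))
            throw new ArgumentException("href must not be empty", "href");

        // Uri(Uri, string) applies standard relative-reference resolution:
        // "//host/a.xsl" keeps the base scheme, "/a.xsl" keeps the base host,
        // "a.xsl" is resolved against the base document's directory, and an
        // absolute href is used unchanged.
        return new Uri(baseUri, href.Trim()).AbsoluteUri;
    }
}
```

Passing response.ResponseUri as the base, as the answer does, means the href is resolved against the final URL after any redirects rather than the URL originally requested.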