I want to get the ul with class="list-2" from a page like this:
<!DOCTYPE html>
<html>
<title>Title</title>
<body>
<div>
<ul class="list-1">
<li class="item">1</li>
<li class="item">2</li>
<li class="item">3</li>
</ul>
<ul class="list-2">
<li class="item">11</li>
<li class="item">22</li>
<li class="item">33</li>
</ul>
<ul class="list-1">
<li class="item">111</li>
<li class="item">222</li>
<li class="item">333</li>
</ul>
</div>
</body>
</html>
Here I download all of the page's HTML:
string url = Request.QueryString["url"];
WebClient web = new WebClient();
web.Encoding = System.Text.Encoding.GetEncoding("utf-8");
string html = web.DownloadString(url);
With this line I can remove everything before my ul:
html = html.Remove(0, html.IndexOf("<ul class=\"list-2\">"));
How can I get only the markup of this ul?
Thanks in advance!
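Answer (score: 2)
As of late 2015, several HTML parsers (and headless browsers) can do this. AngleSharp is one such parser.
Note that WebClient does not execute any JavaScript. You only get the HTML that the server sends.
This example extracts the tag from a string (in this case, string html):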
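using System.Net;
using AngleSharp.Parser.Html; // AngleSharp 0.9.x; in 0.10+ this namespace is AngleSharp.Html.Parser

// --------- your code
string url = Request.QueryString["url"];
WebClient web = new WebClient();
web.Encoding = System.Text.Encoding.UTF8;
string html = web.DownloadString(url);

// --------- parser code
var parser = new HtmlParser();
var document = parser.Parse(html); // in AngleSharp 0.10+: parser.ParseDocument(html)
// Get the first tag matching the CSS selector (null if there is none)
var ultag = document.QuerySelector("ul.list-2");
// Get the tag's HTML string, including the <ul> tag itself
string ultag_html = null;
if (ultag != null)
    ultag_html = ultag.ToHtml();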
This example loads a web page and extracts the tag:
using AngleSharp;

// Set up the configuration to support document loading
var config = Configuration.Default.WithDefaultLoader();
// The page to load
var address = "an url";
// Asynchronously get the document in a new context using the configuration
// (await must be used inside an async method)
var document = await BrowsingContext.New(config).OpenAsync(address);
// This CSS selector gets the desired content
var cssSelector = "ul.list-2";
// Perform the query to get the first matching tag (null if there is none)
var ultag = document.QuerySelector(cssSelector);
// Get the tag's HTML string
string ultag_html = null;
if (ultag != null)
    ultag_html = ultag.ToHtml();
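You may not want a parser dependency. In that case, the question's own IndexOf/Remove idea can be finished by also cutting at the first closing tag. The sketch below does that with plain string search. ExtractElement and HtmlSlice are hypothetical names, not part of any library. The approach is fragile. It assumes the opening tag is spelled exactly as given, with the same attribute quoting and order. It also assumes the element contains no nested element with the same tag name, such as a ul inside the ul. A real parser like AngleSharp has none of these limits.

```csharp
using System;

// Hypothetical helper (not a library API): cuts one element out of an HTML
// string by plain text search, extending the IndexOf/Remove idea from the question.
// Assumptions: the opening tag text matches exactly (case-insensitive), and the
// element contains no nested element with the same tag name.
public static class HtmlSlice
{
    public static string ExtractElement(string html, string openingTag, string tagName)
    {
        // Position of the opening tag, e.g. <ul class="list-2">
        int start = html.IndexOf(openingTag, StringComparison.OrdinalIgnoreCase);
        if (start < 0)
            return null; // opening tag not found

        // First closing tag after the opening tag, e.g. </ul>
        string closingTag = "</" + tagName + ">";
        int end = html.IndexOf(closingTag, start, StringComparison.OrdinalIgnoreCase);
        if (end < 0)
            return null; // element is never closed

        // Return everything from the opening tag through the closing tag, inclusive.
        return html.Substring(start, end + closingTag.Length - start);
    }
}
```

Usage with the downloaded page: string ulHtml = HtmlSlice.ExtractElement(html, "<ul class=\"list-2\">", "ul"); The result is null if the tag is not found.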
Further reading/download: