I am trying to do something like this:
var document = htmlWeb.Load(searchUrl);
var hotels = document.DocumentNode.Descendants("div")
    .Where(x => x.Attributes.Contains("class") &&
                x.Attributes["class"].Value.Contains("listing-content"));
int count = 1;
foreach (var hotel in hotels)
{
    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.OptionFixNestedTags = true;
    htmlDoc.Load(hotel.InnerText);
    if (htmlDoc.DocumentNode != null)
    {
        var anchors = htmlDoc.DocumentNode.Descendants("div")
            .Where(x => x.Attributes.Contains("class") &&
                        x.Attributes["class"].Value.Contains("srp-business-name")); // Error Occurring in here //
        foreach (var anchor in anchors)
        {
            Console.WriteLine(anchor.InnerHtml);
        }
    }
}
The result I get looks like this:
<a href="http://ad.doubleclick.net/clk;234504055;58257942;j?http://www.marriott.com/NYCMQ" class="url mip-link" data-analytics="{"click_id":1601,"rank":1,"act":1,"FL":"list","target":"name","supermedia":true}" rel="nofollow">New York Marriott Marquis</a>
<a href="http://www.yellowpages.com/new-york-ny/mip/new-york-marriott-marquis-468349733?lid=1000372156461" class="no-tracks hidden url" data-analytics="{"click_id":1601,"rank":1,"act":1,"FL":"list","target":"name","supermedia":true}" rel="nofollow"></a>
<span class="external-link">
<img height="15" src="/images/sprites/search/icon-link-external.png" width="16">
</span>
and
<a href="http://www.yellowpages.com/new-york-ny/mip/courtyard-by-marriott-new-york-manhattan-times-square-south-2198956?lid=178101818" class="url redbold mip-link" data-analytics="{"click_id":1600,"rank":2,"act":1,"FL":"list","target":"name","supermedia":""}">Courtyard by Marriott New York Manhattan/Times Square South</a>
and so on.
Now I want the innerHtml of the anchor tag with class="url redbold mip-link". So I am doing this:
var document = htmlWeb.Load(searchUrl);
var hotels = document.DocumentNode.Descendants("div")
    .Where(x => x.Attributes.Contains("class") &&
                x.Attributes["class"].Value.Contains("listing-content"));
int count = 1;
foreach (var hotel in hotels)
{
    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.OptionFixNestedTags = true;
    htmlDoc.Load(hotel.InnerText);
    if (htmlDoc.DocumentNode != null)
    {
        var anchors = htmlDoc.DocumentNode.Descendants("div")
            .Where(x => x.Attributes.Contains("class") &&
                        x.Attributes["class"].Value.Contains("srp-business-name"));
        foreach (var anchor in anchors)
        {
            htmlDoc.LoadHtml(anchor.InnerHtml);
            var hoteltags = htmlDoc.DocumentNode.SelectNodes("//a");
            foreach (var tag in hoteltags)
            {
                if (!string.IsNullOrEmpty(tag.InnerHtml) || !string.IsNullOrWhiteSpace(tag.InnerHtml))
                {
                    Console.WriteLine(tag.InnerHtml);
                }
            }
        }
    }
}
I get the first result correctly:
New York Marriott Marquis
But an error occurs on the second result. What am I doing wrong?
Answer 0 (score: 1)
You are using the same DOM object for all operations:
foreach (var hotel in hotels)
{
    HtmlDocument htmlDoc = new HtmlDocument();
After that, you use the same object to load the anchor tags:
foreach (var anchor in anchors)
{
    htmlDoc.LoadHtml(anchor.InnerHtml);
Just use a different document inside the second loop, and it should work as expected:
foreach (var anchor in anchors)
{
    var htmlDocAnchor = new HtmlDocument();
    htmlDocAnchor.LoadHtml(anchor.InnerHtml); // and so on...
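
For completeness, here is a minimal self-contained sketch of that idea. The sample markup, the Program class, and the hotelTags name are assumptions made for illustration; they are not taken from the real page. It also guards against SelectNodes returning null, which HtmlAgilityPack does when nothing matches the XPath expression.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample markup standing in for the downloaded search page.
        const string html = @"
<div class=""listing-content"">
  <div class=""srp-business-name""><a class=""url mip-link"" href=""#"">New York Marriott Marquis</a></div>
</div>
<div class=""listing-content"">
  <div class=""srp-business-name""><a class=""url redbold mip-link"" href=""#"">Courtyard by Marriott</a></div>
</div>";

        var document = new HtmlDocument();
        document.LoadHtml(html);

        // Lazily enumerated: these nodes belong to 'document', so 'document'
        // must not be reloaded while the loop below is still running.
        var anchors = document.DocumentNode.Descendants("div")
            .Where(x => x.GetAttributeValue("class", "").Contains("srp-business-name"));

        foreach (var anchor in anchors)
        {
            // A fresh document per fragment; the one being enumerated stays untouched.
            var htmlDocAnchor = new HtmlDocument();
            htmlDocAnchor.LoadHtml(anchor.InnerHtml);

            // SelectNodes returns null when there is no match, so check before iterating.
            var hotelTags = htmlDocAnchor.DocumentNode.SelectNodes("//a");
            if (hotelTags == null)
            {
                continue;
            }

            foreach (var tag in hotelTags)
            {
                if (!string.IsNullOrWhiteSpace(tag.InnerHtml))
                {
                    Console.WriteLine(tag.InnerHtml);
                }
            }
        }
    }
}
```

Re-parsing is not strictly necessary here: calling anchor.Descendants("a") on the existing node would probably give the same anchors without creating a second document at all.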