我需要从“http://anytimefitness.com/find-gym/list/AL”获取位置,地址和电话号码。到目前为止,我有这个......
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
htmlDoc.LoadHtml(stateURLs[0].ToString());
var BlankNode =
htmlDoc.DocumentNode.SelectNodes("/div[@class='segmentwhite']/table[@style='width: 100%;']//tr[@class='']");
var GrayNode =
htmlDoc.DocumentNode.SelectNodes("/div[@class='segmentwhite']/table[@style='width: 100%;']//tr[@class='gray_bk']");
我已经浏览了一段时间的stackoverflow,但目前关于htmlagilitypack的帖子都没有真正帮助过。我也一直在使用http://www.w3schools.com/xpath/xpath_syntax.asp
答案 0 :(得分:1)
由于您所关注的<div>
不是根节点的直接子节点,因此您需要使用//
而不是/
。然后,您可以使用BlankNode
运算符组合GrayNode
和or
的XPath,例如:
var htmlweb = new HtmlWeb();
HtmlDocument htmlDoc = htmlweb.Load("http://anytimefitness.com/find-gym/list/AL");
htmlDoc.OptionFixNestedTags = true;
var AllNode =
htmlDoc.DocumentNode.SelectNodes("//div[@class='segmentwhite']/table//tr[@class='' or @class='gray_bk']");
foreach (HtmlNode node in AllNode)
{
var location = node.SelectSingleNode("./td[2]").InnerText;
var address = node.SelectSingleNode("./td[3]").InnerText;
var phone = node.SelectSingleNode("./td[4]").InnerText;
//do something with above informations
}
答案 1 :(得分:0)
这是我在LinqPad中测试过的一个例子。
string url = @"http://anytimefitness.com/find-gym/list/AL";
var client = new System.Net.WebClient();
var data = client.DownloadData(url);
var html = Encoding.UTF8.GetString(data);
var htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
htmlDoc.LoadHtml(html);
var gyms = htmlDoc.DocumentNode.SelectNodes("//tbody/tr[@class='' or @class='gray_bk']");
foreach (var gym in gyms) {
var city = gym.SelectSingleNode("./td[2]").InnerText;
var address = gym.SelectSingleNode("./td[3]").InnerText;
var phone = gym.SelectSingleNode("./td[4]").InnerText;
}
由于HtmlAgilityPack也支持Linq,你也可以这样做:
string [] classes = {"", "gray_bk"};
var gyms = htmlDoc
.DocumentNode
.Descendants("tr")
.Where(t => classes.Contains(t.Attributes["class"].Value))
.ToList();
gyms.ForEach(gym => {
var city = gym.SelectSingleNode("./td[2]").InnerText;
var address = gym.SelectSingleNode("./td[3]").InnerText;
var phone = gym.SelectSingleNode("./td[4]").InnerText;
});