OK, so I'm trying to get the plot summary from a <span> tag, but it doesn't seem to work for every page.
This is what I have:
private ArrayList getSynopsis()
{
    for (int i = 0; i < animeURLList.Count; i++)
    {
        var mainURL = "http://www.animenewsnetwork.com";
        var theHTML = wc.DownloadString(mainURL + (string) animeURLList[i]);
        MessageBox.Show(theHTML);
        //inner html for the span info
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(theHTML);
        //var array = doc.DocumentNode.SelectNodes("//div[@id='infotype-12'][@class='encyc-info-type br']");
        //MessageBox.Show("got here" + array.ToString());
        ArrayList synopList = new ArrayList();
        foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@id='infotype-12'][@class='encyc-info-type br']"))
        {
            synopList.Add(node.GetAttributeValue("span", "null"));
        }
    }
    return null;
}
This is the markup I'm trying to grab the text from:
<div id="infotype-12" class="encyc-info-type br">
<strong>Plot Summary:</strong>
<span>Tooru takes a test so she can enter the same high school as Run, the girl she likes. She passes, but when she goes to tell Run, she finds her hugging a girl she's never seen before.</span>
</div>
The span tag contains the plot summary, which is exactly what I want to grab.
I still can't figure this out.
Answer 0 (score: 0)
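Try the following inside the loop. GetAttributeValue reads an attribute of the div, and "span" is a child element rather than an attribute, so select the child node and read its text: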
synopList.Add(node.SelectSingleNode("span").InnerText);
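One possible reason it fails on some pages (an assumption, since the failing pages aren't shown) is that those pages have no plot summary block. In HtmlAgilityPack, SelectNodes and SelectSingleNode return null when nothing matches, so the foreach or the .InnerText call would throw. A minimal sketch that guards against that is below. The names SynopsisScraper, ExtractSynopsis and GetSynopses are made up for illustration, wc is assumed to be a WebClient, and the XPath is the one from the question with "/span" appended.

```csharp
using System.Collections.Generic;
using System.Net;
using HtmlAgilityPack;

// Hypothetical helper class; the names are illustrative, not from the question.
public static class SynopsisScraper
{
    // Returns the plot summary text from one page's HTML,
    // or null when the page has no matching div/span.
    public static string ExtractSynopsis(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select the <span> child of the plot summary div directly.
        HtmlNode span = doc.DocumentNode.SelectSingleNode(
            "//div[@id='infotype-12'][@class='encyc-info-type br']/span");

        // SelectSingleNode returns null when nothing matches, so check first.
        if (span == null)
        {
            return null;
        }

        // Decode HTML entities (e.g. &amp;) and trim surrounding whitespace.
        return HtmlEntity.DeEntitize(span.InnerText).Trim();
    }

    // Downloads each page and collects the summaries that were found.
    public static List<string> GetSynopses(IEnumerable<string> relativeUrls)
    {
        const string mainUrl = "http://www.animenewsnetwork.com";

        // Created once, outside the loop, so results are not thrown away
        // on every iteration.
        var synopses = new List<string>();

        using (var wc = new WebClient())
        {
            foreach (string relativeUrl in relativeUrls)
            {
                string html = wc.DownloadString(mainUrl + relativeUrl);
                string synopsis = ExtractSynopsis(html);
                if (synopsis != null)
                {
                    synopses.Add(synopsis);
                }
            }
        }

        // Return the collected list instead of null.
        return synopses;
    }
}
```

Note that the method in the question also creates synopList inside the for loop and ends with return null, so even a successful match would never reach the caller. The sketch builds the list once and returns it.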
Also, the XPath you're using:
"//div[@id='infotype-12'][@class='encyc-info-type br']"
would read better with both conditions in a single predicate:
"//div[@id='infotype-12' and @class='encyc-info-type br']"