HTML Agility Pack: get the text from paragraph tags inside a div

Time: 2014-10-05 20:14:20

Tags: windows-phone-8.1 html-agility-pack

I am trying to get the text of the paragraph tags inside a div, using HtmlAgilityPack 2.28 in a Windows Phone 8.1 app.

The structure of the div is:

<div id="55">

<p>&nbsp;</p>

<p><span class="dropcap">W

</span><span class="zw-portion"><strong>ith the start of festive season in India</strong>, we   
will also witness the f<strong>irst London Derby</strong> of the season    
between the newly London rivals <strong>Chelsea and Arsenal</strong>. It will be a great chance  
for Arsene Wenger to get rid of his <strong>1000</strong></span>

<strong><span class="zw-portion">th</span><span class="zw-portion"> managed </span>

<span class="zw-portion">6-0 </span>

<span class="zw-portion">massacre</span></strong>

<span class="zw-portion"> in March,</span>

<span class="zw-portion">&nbsp;</span>

<span class="zw-portion">while the Special One will be eager to continue his winning rampage  
</span>

<span class="zw-portion">&nbsp;</span>

<span class="zw-portion">over his “<strong>Specialist in Failure</strong>” counterpart. Although
both clubs can boast of being unbeaten this season and both clubs can take this opportunity 
</span>

<span class="zw-portion"> to bring down their rival</span><span class="zw-portion">.</span></p>

<p>&nbsp;</p>

<p><iframe width="640" height="360" src="https://www.youtube.com/embed/zFBN8M1pCxo?  
feature=oembed" frameborder="0" allowfullscreen=""></iframe></p>

<p class="zw-paragraph" data-textformat="
{&quot;type&quot;:&quot;text&quot;,&quot;td&quot;:&quot;none&quot;}"></p>

<p class="zw-paragraph" data-textformat="
{&quot;type&quot;:&quot;text&quot;,&quot;td&quot;:&quot;none&quot;}">

<span class="zw-portion">The rivalry between Chelsea and Arsenal was not as a primary London  
Derby, until Chelsea rose to top of Premier League in 2000’s, when they consistently competed 
against each other. The rivalry between the two clubs rose higher as compared to their 
traditional rivals. Both the clubs rivalry are now not only limited to their pitch but has also 
been to the fans. In 2009 survey by Football Fans Census, Arsenal fans named Chelsea as the 

<strong>most disliked club</strong>  </span>

<span class="zw-portion"> ahead of their traditional rivals <strong>Manchest</strong></span>
<strong> <span class="zw-portion">er United and Tottenham Hotspur</span></strong>

<span class="zw-portion">. However the report of the other camp doesn’t differ much as Chelsea 
fans ranks Arsenal as their <strong>second most-disliked club</strong></span>

<strong><span class="zw-portion">.
</span></strong></p>
</div>

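I want to extract only the text contained in the paragraph elements inside the div. So far I have written the following code, where feedurl contains the address of the page to extract data from (the correct address is retrieved). After that, I try to get a reference to the div using its id (which is always 55).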

var feedurl = GetValue("feedurl");
string htmlPage = "asdsad";
HtmlDocument htmldoc = new HtmlDocument();
htmldoc.LoadHtml(feedurl);
htmldoc.OptionUseIdAttribute=true;
HtmlNode div = htmldoc.GetElementbyId("55");
if (div != null)
{
    htmlPage += "done";
}

_content = htmlPage;
return _content;

htmldoc.GetElementbyId("55"); returns a null reference. I have read about using htmldoc.DocumentNode.SelectNodes([arguments]), but I don't have a SelectNodes method. I am lost as to how to proceed. Please help.

1 answer:

Answer 0: (score: 1)

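The WP 8.1 version of HtmlAgilityPack does not support SelectNodes(), because that method needs an XPath implementation, which is unfortunately missing from the .NET version for WP8.1.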

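The solution is to use HtmlAgilityPack's LINQ API instead of XPath. For example, the <div> element whose id attribute equals 55 can be found by filtering the document's descendant nodes with LINQ rather than with an XPath query.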
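The answer's original snippet is not preserved here, so the following is only a minimal sketch of the LINQ approach it describes, not the answerer's exact code. It looks up the <div> by its id attribute with Descendants() and GetAttributeValue(), then collects the text of each non-empty <p> inside it. The class and method names are hypothetical. The second method assumes that System.Net.Http.HttpClient is available on the target platform. It is included because LoadHtml() parses a string of markup, so the page has to be downloaded first; passing the URL itself to LoadHtml(), as in the question, gives the parser nothing containing the div.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

// Hypothetical helper; the class and method names are illustrative only.
public static class ParagraphExtractor
{
    // Returns the text of every non-empty <p> found inside <div id="divId">.
    public static IList<string> GetParagraphTexts(string html, string divId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // expects HTML markup, not a URL

        // LINQ counterpart of the XPath query //div[@id='55'].
        HtmlNode div = doc.DocumentNode
            .Descendants("div")
            .FirstOrDefault(d => d.GetAttributeValue("id", "") == divId);

        if (div == null)
        {
            return new List<string>();
        }

        return div.Descendants("p")
            // InnerText drops the tags; DeEntitize turns &nbsp; and similar into characters.
            .Select(p => HtmlEntity.DeEntitize(p.InnerText).Trim())
            // Skip paragraphs that hold only whitespace or nothing at all.
            .Where(text => text.Length > 0)
            .ToList();
    }

    // Assumption: HttpClient is available on the target platform.
    // Downloads the page first, then parses the returned markup.
    public static async Task<IList<string>> GetParagraphTextsFromUrlAsync(string url, string divId)
    {
        using (var client = new HttpClient())
        {
            string html = await client.GetStringAsync(url);
            return GetParagraphTexts(html, divId);
        }
    }
}
```

With the question's variables, a call might look like `var paragraphs = await ParagraphExtractor.GetParagraphTextsFromUrlAsync(feedurl, "55");`, after which the strings can be joined into _content.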