I have a page from which I need to extract the innerHTML of a div. The only thing I have to identify the div is its class.
<div class="os-box unround">
:
:
:
</div>
I need to extract the innerHTML of the div with class "os-box unround". Assume the page is served from http://abc.com/xyz.html and that I want to do this in the Page Load event, using C#.
**Input:**
<div class="os-box unround">
<div class="os-list" id="os-list-6.1 x64">
<div class="item-box">
<p class="item-title"><a href="http://devid.info/en/p127116/Atheros+AR5B95+Wireless+Network+Adapter"><span class="mark">Atheros</span> AR5B95 Wireless <span class="mark">Network</span> <span class="mark">Adapter</span></a></p>
<p class="item-vendor"><span>Vendor: </span>Atheros Communications Inc.</p>
<p class="item-os"><span>Operating system: </span>Vista64 W7x64</p>
<p class="item-date"><span>Driver Date: </span>2010-09-26</p> <p class="item-version"><span>Version: </span>8.0.0.372</p> <p class="download"><a href="http://devid.info/p127116/Atheros+AR5B95+Wireless+Network+Adapter">Download</a></p>
</div>
<div class="adv-box">
</div>
<div class="item-box">
<p class="item-title"><a href="http://devid.info/en/p145532/Atheros+AR5005G+Wireless+Network+Adapter"><span class="mark">Atheros</span> AR5005G Wireless <span class="mark">Network</span> <span class="mark">Adapter</span></a></p>
<p class="item-vendor"><span>Vendor: </span>Atheros Communications Inc.</p>
<p class="item-os"><span>Operating system: </span>Vista64 W7x64</p>
<p class="item-date"><span>Driver Date: </span>2010-07-08</p> <p class="item-version"><span>Version: </span>9.0.0.222</p> <p class="download"><a href="http://devid.info/p145532/Atheros+AR5005G+Wireless+Network+Adapter">Download</a></p>
</div>
<div class="item-box">
<p class="item-title"><a href="http://devid.info/en/p134802/Atheros+AR5008X+Wireless+Network+Adapter"><span class="mark">Atheros</span> AR5008X Wireless <span class="mark">Network</span> <span class="mark">Adapter</span></a></p>
<p class="item-vendor"><span>Vendor: </span>Atheros Communications Inc.</p>
<p class="item-os"><span>Operating system: </span>Vista64 W7x64</p>
<p class="item-date"><span>Driver Date: </span>2010-06-24</p> <p class="item-version"><span>Version: </span>9.0.0.208</p> <p class="download"><a href="http://devid.info/p134802/Atheros+AR5008X+Wireless+Network+Adapter">Download</a></p>
</div>
</div>
</div>
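Some URL, say http://abc.com/xyz.html, serves HTML like the above. I want to read it and display it on my own page in that page's Page Load event.
**Output:**
A string containing the inner HTML of the "os-box unround" div.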
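Answer 0 (score: 0)
Have you tried HtmlAgilityPack? It lets you parse and query (using XPath) much of the malformed HTML you will come across.
If I understand your question correctly, you could use: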
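// Requires a reference to the HtmlAgilityPack library.
HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("http://abc.com/xyz.html");
// "//div" matches the div at any depth, not only as a direct child of <body>.
HtmlAgilityPack.HtmlNode div = doc.DocumentNode
    .SelectSingleNode("//div[@class='os-box unround']");
// SelectSingleNode returns null when no node matches.
string contentYouWantedToDisplayOnYourOwnPage = div != null ? div.InnerHtml : string.Empty;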
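
One caveat with the query above: `@class="os-box unround"` is an exact string comparison, so it would stop matching if the remote page reordered the classes or added a third one. The following is a minimal sketch of a more tolerant lookup, not a definitive implementation. The class `DivExtractor`, the helper `GetInnerHtmlByClasses`, and the inline sample HTML are hypothetical names made up for illustration; the sketch also assumes the class names contain no quotes or spaces. It uses `LoadHtml` on a string so that it can be tried without a network call.

```csharp
using System;
using System.Text;
using HtmlAgilityPack; // third-party library, installed separately (e.g. via NuGet)

public static class DivExtractor
{
    // Hypothetical helper: returns the inner HTML of the first <div> whose class
    // attribute contains every name in classNames (in any order), or null if
    // no such div exists.
    // Assumption: class names contain no quotes or whitespace.
    public static string GetInnerHtmlByClasses(string html, params string[] classNames)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Build one XPath predicate per class name. Padding the attribute with
        // spaces makes " os-box " match only the whole token, not "os-box-large".
        var xpath = new StringBuilder("//div");
        foreach (string name in classNames)
        {
            xpath.Append("[contains(concat(' ', normalize-space(@class), ' '), ' ")
                 .Append(name)
                 .Append(" ')]");
        }

        HtmlNode div = doc.DocumentNode.SelectSingleNode(xpath.ToString());
        return div == null ? null : div.InnerHtml;
    }

    public static void Main()
    {
        // Made-up sample: classes reordered, an extra class, and a wrapper div.
        const string html =
            "<html><body><div id='wrap'>" +
            "<div class='unround os-box extra'><p>Hello</p></div>" +
            "</div></body></html>";

        string inner = GetInnerHtmlByClasses(html, "os-box", "unround");
        Console.WriteLine(inner ?? "(no matching div)");
    }
}
```

To fetch the real page instead, the `LoadHtml` call could be swapped for `new HtmlWeb().Load("http://abc.com/xyz.html")` as in the snippet above. In an ASP.NET Web Forms page, the returned string could then be assigned in `Page_Load` to something like the `Text` property of a `Literal` control, assuming that is how the page is built.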