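Extract the InnerHtml of a div?

Asked: 2012-03-30 18:45:55

Tags: c# asp.net

I have a page from which I need to extract the inner HTML of a div. The only thing I have to identify the div by is its class.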

<div class="os-box unround">
:
:
:
</div>

I need to extract the inner HTML of the div with class "os-box unround", assuming the page is served from http://abc.com/xyz.html, and I need to do it in the Page Load event using C#.

**Input:**

<div class="os-box unround">

    <div class="os-list" id="os-list-6.1 x64">



    <div class="item-box">

        <p class="item-title"><a href="http://devid.info/en/p127116/Atheros+AR5B95+Wireless+Network+Adapter"><span class="mark">Atheros</span> AR5B95 Wireless <span class="mark">Network</span> <span class="mark">Adapter</span></a></p>

        <p class="item-vendor"><span>Vendor: </span>Atheros Communications Inc.</p>

        <p class="item-os"><span>Operating system: </span>Vista64 W7x64</p>

     <p class="item-date"><span>Driver Date: </span>2010-09-26</p> <p class="item-version"><span>Version: </span>8.0.0.372</p>     <p class="download"><a href="http://devid.info/p127116/Atheros+AR5B95+Wireless+Network+Adapter">Download</a></p>

    </div>



    <div class="adv-box">



    </div>



    <div class="item-box">

        <p class="item-title"><a href="http://devid.info/en/p145532/Atheros+AR5005G+Wireless+Network+Adapter"><span class="mark">Atheros</span> AR5005G Wireless <span class="mark">Network</span> <span class="mark">Adapter</span></a></p>

        <p class="item-vendor"><span>Vendor: </span>Atheros Communications Inc.</p>

        <p class="item-os"><span>Operating system: </span>Vista64 W7x64</p>

     <p class="item-date"><span>Driver Date: </span>2010-07-08</p> <p class="item-version"><span>Version: </span>9.0.0.222</p>     <p class="download"><a href="http://devid.info/p145532/Atheros+AR5005G+Wireless+Network+Adapter">Download</a></p>

    </div>





    <div class="item-box">

        <p class="item-title"><a href="http://devid.info/en/p134802/Atheros+AR5008X+Wireless+Network+Adapter"><span class="mark">Atheros</span> AR5008X Wireless <span class="mark">Network</span> <span class="mark">Adapter</span></a></p>

        <p class="item-vendor"><span>Vendor: </span>Atheros Communications Inc.</p>

        <p class="item-os"><span>Operating system: </span>Vista64 W7x64</p>

     <p class="item-date"><span>Driver Date: </span>2010-06-24</p> <p class="item-version"><span>Version: </span>9.0.0.208</p>     <p class="download"><a href="http://devid.info/p134802/Atheros+AR5008X+Wireless+Network+Adapter">Download</a></p>

    </div>

</div>
<div>

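Some URL, say http://abc.com/xyz.html, serves HTML like the above. I want to read it and display it on my own page in the Page Load event.

**Output:**

A string containing the inner HTML of the os-box unround div.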

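1 Answer:

Answer 0 (score: 0)

Have you tried HtmlAgilityPack? It lets you parse and query (using XPath) much of the malformed HTML you will find in the wild.

If I understand your question correctly, you can use: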

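// Requires the HtmlAgilityPack NuGet package.
HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("http://abc.com/xyz.html");

// "//div" matches the div anywhere in the document, not only directly under <body>.
HtmlAgilityPack.HtmlNode div = doc.DocumentNode
    .SelectSingleNode("//div[@class=\"os-box unround\"]");
// SelectSingleNode returns null when no node matches.
string contentYouWantedToDisplayOnYourOwnPage = div != null ? div.InnerHtml : string.Empty;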
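
To try the query without hitting the network, the same lookup can be run against an HTML string. The following is only a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced. The `DivExtractor` class, the `ExtractInnerHtmlByClass` method, and the sample markup are hypothetical names introduced for illustration; they are not part of the library or of the original question.

```csharp
using System;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class DivExtractor
{
    // Hypothetical helper: returns the inner HTML of the first div whose class
    // attribute equals classValue exactly, or null when no such div exists.
    public static string ExtractInnerHtmlByClass(string html, string classValue)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//div" searches the whole tree, so the div may be nested at any depth.
        // Assumption: classValue contains no single quote, since it is spliced
        // into the XPath expression as a quoted literal.
        HtmlNode div = doc.DocumentNode
            .SelectSingleNode("//div[@class='" + classValue + "']");

        // SelectSingleNode returns null when nothing matches.
        return div == null ? null : div.InnerHtml;
    }

    public static void Main()
    {
        // Made-up sample page: the target div sits inside a wrapper div.
        string sample =
            "<html><body><div id=\"wrap\">" +
            "<div class=\"os-box unround\"><p class=\"item-vendor\">Atheros</p></div>" +
            "</div></body></html>";

        string inner = ExtractInnerHtmlByClass(sample, "os-box unround");
        Console.WriteLine(inner ?? "(div not found)");
    }
}
```

In an ASP.NET Web Forms page, the returned string could be assigned in `Page_Load` to the `Text` property of a `Literal` control (for example a hypothetical `litContent`) to display it. Note that the comparison on `@class` is an exact string match, so it would miss a div whose class attribute lists the same names in a different order or with extra names.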