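HtmlAgilityPack: getting data from a table

Date: 2015-07-22 04:25:41

Tags: c# html-agility-pack

I want to read the data from a website as rows and columns, but what I get back is vertical, with one cell value per line. Is there a way to get the data laid out the same way as in the website's table?

I have this table: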

<div>
    <span id="ctl00_panelContent_ctl01_ucThongTinThiTruong_lblErr"></span>
</div>
<div>
    <div id="ctl00_panelContent_ctl01_ucThongTinThiTruong_grdTT" class="RadGrid RadGrid_Office2007 rgMultiHeader" style="height:700px;width:100%;">

        <div class="rgHeaderWrapper"><div id="ctl00_panelContent_ctl01_ucThongTinThiTruong_grdTT_GridHeader" class="rgHeaderDiv" style="padding-removed16px;overflow:hidden;">

        <table class="rgMasterTable rgClipCells" border="0" id="ctl00_panelContent_ctl01_ucThongTinThiTruong_grdTT_ctl00_Header" style="width:100%;table-layout:fixed;overflow:hidden;empty-cells:show;">
            <colgroup>
                <col style="width:50px" />
                <col style="width:70px" />
                <col style="width:70px" />
                <col style="width:70px;display:none;" />
                <col style="width:70px;display:none;" />
                <col style="width:70px;display:none;" />
                <col style="width:110px" />
            </colgroup>
            <thead>

This is my code:

var document = webBrowser1.Document;
var documentAsIHtmlDocument3 = (mshtml.IHTMLDocument3)document.DomDocument;

var htmlString = documentAsIHtmlDocument3.documentElement.innerHTML;

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlString);

// string texts = doc.DocumentNode.SelectSingleNode("//div[@class='inner']/p").InnerText;
HtmlNodeCollection texts = doc.DocumentNode.SelectNodes("//table[@class='rgMasterTable rgClipCells']");
string kq = "";

if (texts != null)
{
    foreach (var item in texts)
    {
        kq += item.InnerText + Environment.NewLine;
    }
}
richTextBox1.Text = kq;

}


}

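This code runs fine, but the data comes out as a single vertical column. I want it laid out the way it appears on the website. How can I do that?

3 answers:

Answer 0 (score: 2)

Once you have the table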

HtmlNodeCollection texts = doc.DocumentNode.SelectNodes("//table[@class='rgMasterTable rgClipCells']");

you can get the individual rows like this

var rows = texts.Descendants("tr").ToList();

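This gives you a list of every row in the table. From there you can iterate over the child nodes of each row and read the values like this

List<List<string>> rowValues = new List<List<string>>();
foreach (var row in rows)
{
    List<string> currentRowValues = new List<string>();
    // Keep only element children (td/th); ChildNodes also contains the
    // whitespace text nodes that sit between the cells.
    foreach (var column in row.ChildNodes.Where(n => n.NodeType == HtmlNodeType.Element))
    {
        currentRowValues.Add(column.InnerText);
    }
    rowValues.Add(currentRowValues);
}

rowValues is now a List in which each row is itself a List, and the elements of that inner list are the values of the cells in the row.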
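To show that nested list as a grid rather than one value per line, each inner list can be joined with a separator and the rows joined with line breaks. The following is only a sketch: `TableFormatter`, `FormatRows` and `CleanCell` are hypothetical names that do not appear in the original answer, and the tab separator is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

public static class TableFormatter
{
    // Hypothetical helper: one output line per row, cells separated by tabs.
    public static string FormatRows(IEnumerable<IEnumerable<string>> rowValues)
    {
        var lines = rowValues.Select(row =>
            string.Join("\t", row.Select(CleanCell)));
        return string.Join(Environment.NewLine, lines);
    }

    // Decodes HTML entities such as &nbsp; and trims surrounding whitespace,
    // so an empty grid cell becomes an empty string.
    private static string CleanCell(string cell)
    {
        return WebUtility.HtmlDecode(cell ?? string.Empty).Trim();
    }
}
```

With the question's form, the result could then be assigned with `richTextBox1.Text = TableFormatter.FormatRows(rowValues);`.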

Answer 1 (score: 0)

<tr class="rgRow" id="ctl00_panelContent_ctl01_ucThongTinThiTruong_grdTT_ctl00__0" style="color:Black;font-weight:normal;font-style:normal;">
        <td align="center" valign="middle" style="width:50px;">1</td><td align="right" valign="middle" style="width:70px;">  16 332</td><td align="right" valign="middle" style="width:70px;">  7 367</td><td align="right" valign="middle" style="width:70px;">  1 298</td><td align="right" valign="middle" style="width:70px;">  7 667</td><td align="right" valign="middle" style="width:70px;">   1</td><td align="right" valign="middle" style="width:70px;">   1</td><td align="right" valign="middle" style="width:70px;">   1</td><td align="right" valign="middle" style="width:70px;display:none;">&nbsp;</td><td align="right" valign="middle" style="width:70px;display:none;">&nbsp;</td><td align="right" valign="middle" style="width:70px;display:none;">&nbsp;</td><td align="right" valign="middle" style="width:70px;">   560</td><td align="left" valign="middle" style="width:110px;white-space:nowrap;">GT5:NHON_TRACH_2</td><td align="left" valign="middle" style="width:110px;white-space:nowrap;">GT5:NHON_TRACH_2</td><td align="left" valign="middle" style="width:110px;white-space:nowrap;">GT5:NHON_TRACH_2</td>
    </tr><tr class="rgAltRow" id="ctl00_panelContent_ctl01_ucThongTinThiTruong_grdTT_ctl00__1" style="color:Black;font-weight:normal;font-style:normal;">
        <td align="center" valign="middle" style="width:50px;">2</td><td align="right" valign="middle" style="width:70px;"><span>  15 852</span><img src="Images/Grid/down.bmp" style="border-width:0px;" /></td><td align="right" valign="middle" style="width:70px;"><span>  7 157</span><img src="Images/Grid/down.bmp" style="border-width:0px;" /></td><td align="right" valign="middle" style="width:70px;"><span>  1 477</span><img src="Images/Grid/up.bmp" style="border-width:0px;" /></td><td align="right" valign="middle" style="width:70px;"><span>  7 218</span><img src="Images/Grid/down.bmp" style="border-width:0px;" /></td><td align="right" valign="middle" style="width:70px;">   1</td><td align="right" valign="middle" style="width:70px;">   1</td><td align="right" valign="middle" style="width:70px;">   1</td><td align="right" valign="middle" style="width:70px;display:none;">&nbsp;</td><td align="right" valign="middle" style="width:70px;display:none;">&nbsp;</td><td align="right" valign="middle" style="width:70px;display:none;">&nbsp;</td><td align="right" valign="middle" style="width:70px;">   560</td><td align="left" valign="middle" style="width:110px;white-space:nowrap;">GT5:NHON_TRACH_2</td><td align="left" valign="middle" style="width:110px;white-space:nowrap;">GT5:NHON_TRACH_2</td><td align="left" valign="middle" style="width:110px;white-space:nowrap;">GT5:NHON_TRACH_2</td>
    </tr><tr class="rgRow" id="ctl00_panelContent_ctl01_ucThongTinThiTruong_grdTT_ctl00__2" style="color:Black;font-weight:normal;font-style:normal;">
        <td align="center" valign="middle" style="width:50px;">3</td><td align="right" valign="middle" style="width:70px;"><span>  15 575</span><img src="Images/Grid/down.bmp" style="border-width:0px;" /></td><td align="right" valign="middle" style="width:70px;"><span>  6 853</span><img src="Images/Grid/down.bmp" style="border-width:0px;" /></td><td align="right" valign="middle" style="width:70px;"><span>  1 411</span><img src="Images/Grid/down.bmp" style="border-width:0px;" /></td><td align="right" valign="middle" style="width:70px;"><span>  7 311</span><img src="Images/Grid/up.bmp" style="border-width:0px;" /></td><td align="right" valign="middle" style="width:70px;">   1</td><td align="right" valign="middle" style="width:70px;">   1</td><td align="right" valign="middle" style="width:70px;">   1</td><td align="right" valign="middle" style="width:70px;display:none;">&nbsp;</td><td align="right" valign="middle" style="width:70px;display:none;">&nbsp;</td><td align="right" valign="middle" style="width:70px;display:none;">&nbsp;</td><td align="right" valign="middle" style="width:70px;">   560</td><td align="left" valign="middle" style="width:110px;white-space:nowrap;">GT5:NHON_TRACH_2</td><td align="left" valign="middle" style="width:110px;white-space:nowrap;">GT5:NHON_TRACH_2</td><td align="left" valign="middle" style="width:110px;white-space:nowrap;">GT5:NHON_TRACH_2</td>

Answer 2 (score: -1)

The data table on the website:

hour  nation  North  Central  South  North  Central  South  System
1     16 332  7 367  1 298    7 667  1      1        1      560
2     15 852  7 157  1 477    7 218  1      1        1      560
3     15 575  6 853  1 411    7 311  1      1        1      560
4     15 466  6 839  1 458    7 168  1      1        1      560
5     15 968  6 969  1 608    7 391  0      1        1      560

The data I get from the website:

1
16 332
7 367
1 298
7 667
1
1
1
&nbsp;
&nbsp;
&nbsp;
560
2
15 852
7 157
1 477
7 218
1
1
1



560
3
15 575
6 853
1 411
7 311
1
1
1



560
15 466
6 839
1 458
7 168
1
1
1



560
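
The vertical output above is what you get when `InnerText` is read from the whole table at once: every cell's text ends up on its own line and the row boundaries are lost. Below is a minimal sketch of reading the grid row by row with HtmlAgilityPack instead. `GridReader` and `ReadRows` are hypothetical names, and skipping cells whose `style` contains `display:none` is an assumption about which columns are wanted.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class GridReader
{
    // Hypothetical helper: returns one list of cell texts per <tr>.
    public static List<List<string>> ReadRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<List<string>>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows == null) return result;

        foreach (var row in rows)
        {
            var cells = row.Elements("td")
                // Assumption: hidden columns are not wanted in the output.
                .Where(td => !td.GetAttributeValue("style", "").Contains("display:none"))
                // Decode entities such as &nbsp; and trim the padding spaces.
                .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
                .ToList();

            // Rows without <td> cells (for example header rows) are skipped.
            if (cells.Count > 0) result.Add(cells);
        }
        return result;
    }
}
```

Each inner list can then be joined into one line, for example with `string.Join("\t", row)`, so that every table row stays on a single line of the text box.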