Parsing HTML tables with HTML Agility Pack - the tables have no ID

Time: 2013-12-12 23:33:05

Tags: c# html-agility-pack

This is my HTML code:

<center>
    <table cellspacing="0" cellpadding="0" border="0">
    <tbody><tr><td><img src="/someimages/images/dot_t.gif" hspace="20"></td>
     <td><font face="Arial Rounded MT Bold, Arial, Helvetica" size="5" color="#000088">
     Marks Sheet Page</font></td>
    </tr>
    </tbody></table>
    <table>
    <tbody><tr>
    <td>
    <table border="0" cellpadding="0" cellspacing="0">
    <tbody><tr><td><img src="/someimages/images/dot_t.gif" hspace="15"></td><td align="CENTER">

    <table border="1" cellpadding="3" bordercolordark="#993300" bordercolorlight="#FF6600">

    <tbody><tr bgcolor="#FF6600">
     <th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Name</font></th>
     <th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Account</font></th>
     <th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Postal&nbsp;Address</font></th>
     <th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Town</font></th>
     <th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Zip&nbsp;Code</font></th>
     <th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Weather&nbsp;Turn-Off</font></th>
     <th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Next&nbsp;Weather&nbsp;Sample&nbsp;Date</font></th>

    <!-- IF NOT ValidateDB2Acct(request("AcctNo")) THEN         response.write "<font FACE='Arial Rounded MT Bold, Arial, Helvetica' SIZE='3' COLOR='#000088'></font>"      ELSE // -->
    </tr>       
    <tr><td align="CENTER">Company Name</td><td align="CENTER">1212121212121212</td><td align="CENTER">Street Addr Ln&amp;P</td><td align="CENTER">NEW YORK NY</td><td align="CENTER">10075</td><td align="CENTER">N</td><td align="CENTER">12/19/2013</td></tr></tbody></table><br>
    <table border="1" cellpadding="3" bordercolordark="#993300" bordercolorlight="#FF6600"><tbody><tr bgcolor="#FF6600"><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Break Code</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Local Variable</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">PLPLPL</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">MOM CODE</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Exam %</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Exam Area</font></th></tr><tr><td align="CENTER">L</td><td align="CENTER">20.5</td><td align="CENTER">&nbsp; 21.5629</td><td align="CENTER">&nbsp; --</td><td align="CENTER">100</td><td align="CENTER">&nbsp; J</td></tr></tbody></table>

    <br>

    <table border="1" cellpadding="3" bordercolordark="#993300" bordercolorlight="#FF6600"><tbody><tr bgcolor="#FF6600"><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Route Number</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Airline Class</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Earlier Account Number</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">On Monthly Xfin</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">WHO Code</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Profile</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">ITIN</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">Municipal</font></th></tr><tr><td align="CENTER">13</td><td align="CENTER">9</td><td align="CENTER">00000000000000</td><td align="CENTER">21</td><td align="CENTER">50</td><td align="CENTER">N</td><td align="CENTER">Fully Taxable</td><td align="CENTER">--</td></tr></tbody></table></td></tr><tr><td colspan="2"><img src="/someimages/images/dot_t.gif" vspace="10"></td></tr><tr><td></td><td align="CENTER">

    <table border="1" cellpadding="3" bordercolordark="#993300" bordercolorlight="#FF6600"><tbody><tr bgcolor="#FF6600"><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="3" color="#000088">From Date</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="3" color="#000088">To Date</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="3" color="#000088">Bytes</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="3" color="#000088">KBB</font></th><th><font face="Arial Rounded MT Bold, Arial, Helvetica" size="3" color="#000088">Bill Amt</font></th></tr><tr><td align="CENTER">10/18/2013</td><td align="CENTER">11/18/2013</td><td align="RIGHT">7160</td><td align="RIGHT">17.60</td><td align="RIGHT">$671.46</td></tr><tr><td align="CENTER">9/18/2013</td><td align="CENTER">10/18/2013</td><td align="RIGHT">6800</td><td align="RIGHT">15.60</td><td align="RIGHT">$654.78</td></tr><tr><td align="CENTER">8/19/2013</td><td align="CENTER">9/18/2013</td><td align="RIGHT">8120</td><td align="RIGHT">18.00</td><td align="RIGHT">$811.63</td></tr><tr><td align="CENTER">7/19/2013</td><td align="CENTER">8/19/2013</td><td align="RIGHT">8320</td><td align="RIGHT">19.60</td><td align="RIGHT">$856.76</td></tr><tr><td align="CENTER">6/19/2013</td><td align="CENTER">7/19/2013</td><td align="RIGHT">9480</td><td align="RIGHT">21.60</td><td align="RIGHT">$988.60</td></tr><tr><td align="CENTER">5/20/2013</td><td align="CENTER">6/19/2013</td><td align="RIGHT">7680</td><td align="RIGHT">20.40</td><td align="RIGHT">$854.82</td></tr><tr><td align="CENTER">4/19/2013</td><td align="CENTER">5/20/2013</td><td align="RIGHT">7040</td><td align="RIGHT">17.60</td><td align="RIGHT">$746.32</td></tr><tr><td align="CENTER">3/21/2013</td><td align="CENTER">4/19/2013</td><td align="RIGHT">6800</td><td align="RIGHT">18.00</td><td align="RIGHT">$688.43</td></tr><tr><td align="CENTER">1/18/2013</td><td align="CENTER">3/21/2013</td><td 
align="RIGHT">15360</td><td align="RIGHT">18.00</td><td align="RIGHT">$1,456.56</td></tr><tr><td align="CENTER">12/19/2012</td><td align="CENTER">1/18/2013</td><td align="RIGHT">7280</td><td align="RIGHT">16.40</td><td align="RIGHT">$718.47</td></tr><tr><td align="CENTER">11/16/2012</td><td align="CENTER">12/19/2012</td><td align="RIGHT">8040</td><td align="RIGHT">17.60</td><td align="RIGHT">$848.67</td></tr><tr><td align="CENTER">10/18/2012</td><td align="CENTER">11/16/2012</td><td align="RIGHT">6800</td><td align="RIGHT">16.80</td><td align="RIGHT">$681.44</td></tr><tr><td align="CENTER">9/18/2012</td><td align="CENTER">10/18/2012</td><td align="RIGHT">7120</td><td align="RIGHT">18.40</td><td align="RIGHT">$757.94</td></tr><tr><td align="CENTER">8/17/2012</td><td align="CENTER">9/18/2012</td><td align="RIGHT">9160</td><td align="RIGHT">20.40</td><td align="RIGHT">$1,000.89</td></tr><tr><td align="CENTER">7/19/2012</td><td align="CENTER">8/17/2012</td><td align="RIGHT">9040</td><td align="RIGHT">20.00</td><td align="RIGHT">$884.61</td></tr><tr><td align="CENTER">6/19/2012</td><td align="CENTER">7/19/2012</td><td align="RIGHT">9320</td><td align="RIGHT">18.80</td><td align="RIGHT">$928.98</td></tr><tr><td align="CENTER">5/18/2012</td><td align="CENTER">6/19/2012</td><td align="RIGHT">7520</td><td align="RIGHT">16.40</td><td align="RIGHT">$788.95</td></tr><tr><td align="CENTER">4/19/2012</td><td align="CENTER">5/18/2012</td><td align="RIGHT">6280</td><td align="RIGHT">14.80</td><td align="RIGHT">$665.93</td></tr><tr><td align="CENTER">3/21/2012</td><td align="CENTER">4/19/2012</td><td align="RIGHT">6240</td><td align="RIGHT">17.20</td><td align="RIGHT">$725.73</td></tr><tr><td align="CENTER">2/21/2012</td><td align="CENTER">3/21/2012</td><td align="RIGHT">6640</td><td align="RIGHT">16.80</td><td align="RIGHT">$1,213.52</td></tr><tr><td align="CENTER">1/20/2012</td><td align="CENTER">2/21/2012</td><td align="RIGHT">7640</td><td align="RIGHT">18.40</td><td 
align="RIGHT">$1,347.25</td></tr><tr><td align="CENTER">12/20/2011</td><td align="CENTER">1/20/2012</td><td align="RIGHT">7600</td><td align="RIGHT">16.00</td><td align="RIGHT">$1,353.32</td></tr><tr><td align="CENTER">11/17/2011</td><td align="CENTER">12/20/2011</td><td align="RIGHT">7880</td><td align="RIGHT">17.60</td><td align="RIGHT">$1,307.75</td></tr></tbody></table><br>
    </td></tr></tbody></table>
<!-- END PAGE CONTENT AREA -->
<!-- ***** to here ***** -->

<!--include this footer on every page-->
<!-- BEGIN PAGE FOOTER AREA -->

<table width="100%">
<tbody><tr><td width="40"><img src="/someimages/images/dot_t.gif" hspace="20" vspace="20"></td>
<td valign="top" align="CENTER"><br><br>
<font face="Arial Rounded MT Bold, Arial, Helvetica" size="2" color="#000088">
Contact Us at <a href="mailto:email@sample.com">email@sample.com</a></font></td></tr>
</tbody></table>
<!-- END PAGE FOOTER AREA -->
</td></tr></tbody></table></center>

The same kind of table is repeated several times:

<table border="1" cellpadding="3" bordercolordark="#993300" bordercolorlight="#FF6600">

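Each of these tables contains <tbody> and <tr> tags. The first <tr> is the header row, and the <tr> rows from the second one on hold the content/values.

I am trying to convert these tables into DataTables.

How can I write XPath to pick out the 4 HTML tables (the ones with actual content)?

This is what I have tried so far, but I get an error with this code: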

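using System.Collections.Generic;
using System.Data;
using System.Linq;
using HtmlAgilityPack;

// Calling code (inside a method of the same class):
HtmlDocument myHtml = new HtmlDocument();
myHtml.LoadHtml(stringHTML);

DataTable[] tables = ParseAllTables(myHtml);

private static DataTable[] ParseAllTables(HtmlDocument doc)
{
    var result = new List<DataTable>();
    // Skip the layout tables: in the markup above only the data tables carry border="1".
    foreach (var table in doc.DocumentNode.Descendants("table")
                 .Where(t => t.GetAttributeValue("border", "") == "1"))
    {
        result.Add(ParseTable(table));
    }
    return result.ToArray();
}

private static DataTable ParseTable(HtmlNode table)
{
    var result = new DataTable();

    // Materialize the rows once so the sequence is not re-enumerated.
    var rows = table.Descendants("tr").ToList();
    if (rows.Count == 0)
    {
        return result;
    }

    // Header cells are <th>, not <td>. Querying "td" on the header row creates
    // no columns, so Rows.Add later receives more values than there are columns,
    // which is a likely cause of the error.
    foreach (var column in rows[0].Descendants("th"))
    {
        result.Columns.Add(new DataColumn(HtmlEntity.DeEntitize(column.InnerText).Trim(), typeof(string)));
    }

    foreach (var row in rows.Skip(1))
    {
        var data = new List<string>();
        foreach (var column in row.Descendants("td"))
        {
            // Decode entities such as &nbsp; and &amp; before storing the value.
            data.Add(HtmlEntity.DeEntitize(column.InnerText).Trim());
        }
        result.Rows.Add(data.ToArray());
    }
    return result;
}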

I only care about the <TR> tags inside tables of this form:

<table border="1" cellpadding="3" bordercolordark="#993300" bordercolorlight="#FF6600">

The first <TR> is the DataTable header; the data for the DataTable starts at the second <TR>.

1 Answer:

Answer 0: (score: 0)

This works:

//table[@cellpadding='0' and @cellspacing='0']/tr[1]
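
For comparison, here is a minimal sketch of selecting the tables by attribute with XPath and turning each into a DataTable. It assumes the four data tables are the only ones with border="1" and cellpadding="3", as in the markup posted in the question. The class name TableExtractor and the helpers Extract and Clean are made up for illustration and are not part of HTML Agility Pack. Because the posted markup has explicit <tbody> elements, the rows are reached with .//tr rather than a direct /tr child step.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using HtmlAgilityPack;

public static class TableExtractor
{
    // Hypothetical helper: returns one DataTable per matching <table>.
    public static List<DataTable> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var tables = new List<DataTable>();

        // Assumption: data tables are identified by these two attributes.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var tableNodes = doc.DocumentNode.SelectNodes("//table[@border='1' and @cellpadding='3']");
        if (tableNodes == null)
        {
            return tables;
        }

        foreach (var tableNode in tableNodes)
        {
            var rows = tableNode.SelectNodes(".//tr");
            if (rows == null)
            {
                continue;
            }

            var table = new DataTable();

            // First row: each <th> becomes a string column.
            // Note: DataTable rejects duplicate column names.
            foreach (var th in rows[0].Elements("th"))
            {
                table.Columns.Add(Clean(th.InnerText), typeof(string));
            }

            // Remaining rows: each <td> becomes a cell value.
            foreach (var row in rows.Skip(1))
            {
                object[] cells = row.Elements("td")
                                    .Select(td => (object)Clean(td.InnerText))
                                    .ToArray();

                // Skip rows that are empty or wider than the header.
                if (cells.Length > 0 && cells.Length <= table.Columns.Count)
                {
                    table.Rows.Add(cells);
                }
            }

            tables.Add(table);
        }

        return tables;
    }

    // Decodes entities (&nbsp;, &amp;), turns non-breaking spaces into
    // ordinary spaces, and trims surrounding whitespace.
    private static string Clean(string text)
    {
        return HtmlEntity.DeEntitize(text).Replace('\u00A0', ' ').Trim();
    }
}
```

Calling TableExtractor.Extract(stringHTML) on the page from the question should then give one DataTable per bordered table, while the cellpadding="0" layout tables are ignored.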