Parsing HTML tables with different numbers of rows

Time: 2019-02-09 21:22:44

Tags: c# html-agility-pack

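I am trying to parse HTML tables, but the tables do not all have the same number of rows. All of the tables sit under a <form> element, which I select as a single node (SelectSingleNode). From there, however, the <tbody> elements do not give me the rows and their <td> cells, so I cannot loop over all of the <td> elements.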

Part of the HTML code:

<form name="DetailsForm" method="post" action="">
  <input type="hidden" name="helpPageId" value="WF03">
    <input type="hidden" name="withMenu" value="1">
      <table width="100%" cellspacing="0" border="0">
        <tbody>
          <tr valign="center">
            <td class="blackHeadingLeft">Details</td>
          </tr>
          <tr></tr>
          <tr>
            <td></td>
          </tr>
        </tbody>
      </table>
      <table width="100%" cellspacing="0" border="0">
        <tbody>
          <tr>
            <td class="whiteTd" height="21">&nbsp;AWB:</td>
            <td class="whiteTdNormal" nowrap="nowrap" height="21">&nbsp; 7777995585 </td>
            <td class="whiteTd" nowrap="nowrap" height="21">&nbsp;No of Shipment Details:</td>
            <td class="whiteTdNormal" nowrap="nowrap" height="21">&nbsp; 1 </td>
            <td class="whiteTdNormal" width="100%" height="21">&nbsp;</td>
          </tr>
        </tbody>
      </table>
      <table class="bordered-table" width="100%" border="0">
        <tbody>
          <tr>
            <td class="grayTd" width="5%" height="21">&nbsp;Details</td>
            <td class="grayTd" width="5%" height="21" align="center">&nbsp;Orig</td>
            <td class="grayTd" width="8%" height="21" align="center">&nbsp;Location</td>
            <td class="grayTd" width="7%" height="21">&nbsp;Dest</td>
            <td class="grayTd" width="5%" height="21" align="center">&nbsp;Pcs</td>
            <td class="grayTd" width="5%" height="21">&nbsp;Weight(kg)</td>
            <td class="grayTd" width="11%" height="21">&nbsp;Volumetric Weight(kg)</td>
            <td class="grayTd" width="9%" height="21">&nbsp;Date/Time</td>
            <td class="grayTd" width="8%" height="21">&nbsp;Route/Cycle</td>
            <td class="grayTd" width="8%" height="21">&nbsp;Post Code</td>
            <td class="grayTd" width="6%" height="21">&nbsp;Product</td>
            <td class="grayTd" width="9%" height="21">&nbsp;Amount</td>
            <td class="grayTd" width="9%" height="21">&nbsp;Duplicate</td>
          </tr>

1 answer:

Answer 0: (score: 0)

This is how I was able to do it:

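        // Requires: using System; using System.Linq; using HtmlAgilityPack;
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html); // html: the page markup as a string

        // SelectNodes returns null (not an empty collection) when nothing matches,
        // so fall back to an empty sequence before iterating.
        var none = Enumerable.Empty<HtmlNode>();

        foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table") ?? none)
        {
            Console.WriteLine("Table: ");
            foreach (HtmlNode tbody in table.SelectNodes("tbody") ?? none)
            {
                // Skip a tbody that has no rows.
                if (tbody.ChildNodes.Any(x => x.Name == "tr"))
                {
                    Console.WriteLine("TBody: ");
                    foreach (HtmlNode row in tbody.SelectNodes("tr"))
                    {
                        Console.WriteLine("TR: ");
                        // Skip a row that has no cells, e.g. <tr></tr>.
                        if (row.ChildNodes.Any(c => c.Name == "td"))
                        {
                            foreach (HtmlNode cell in row.SelectNodes("td"))
                            {
                                Console.WriteLine("TD: ");
                                Console.WriteLine(cell.InnerHtml);
                            }
                        }

                        Console.WriteLine();
                    }
                }
            }
        }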

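This way, it does not matter how many tr or td tags each table has. One thing to note: you must add validation for the case where a tbody has no tr tags or a tr has no td tags, because SelectNodes returns null when nothing matches, and iterating over null throws a NullReferenceException.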

I hope this helps.


Edited to include validation for the tr and td tags. Similar logic can be used for any other tags that might be missing.
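
For comparison, here is a more compact sketch of the same idea that collects the cell text of each row into a list instead of printing it. It is not the approach from the answer itself. The class name TableReader and the method ReadRows are hypothetical names chosen for illustration. The call that removes the "form" flag is an assumption about the cause of the problem in the question: some Html Agility Pack versions treat <form> specially, so its children are not nested under it, and clearing the flag before loading is a commonly cited workaround. Whether it is needed depends on the library version in use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TableReader
{
    // Hypothetical helper: returns one list of cell texts per <tr>,
    // across every table in the markup.
    public static List<List<string>> ReadRows(string html)
    {
        // Assumption: the parser may treat <form> specially. Removing the flag
        // before loading lets <form> keep its children; if no flag is set,
        // Remove simply does nothing.
        HtmlNode.ElementsFlags.Remove("form");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<List<string>>();

        // SelectNodes returns null when nothing matches, so fall back to an
        // empty sequence. "//table//tr" matches rows with or without a tbody.
        IEnumerable<HtmlNode> trNodes =
            doc.DocumentNode.SelectNodes("//table//tr") ?? Enumerable.Empty<HtmlNode>();

        foreach (HtmlNode tr in trNodes)
        {
            // Elements("td") yields only direct <td> children and is empty,
            // never null, for rows such as <tr></tr>.
            List<string> cells = tr.Elements("td")
                .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
                .ToList();

            // Keep only rows that actually contain cells.
            if (cells.Count > 0)
            {
                rows.Add(cells);
            }
        }

        return rows;
    }
}
```

Calling TableReader.ReadRows(html) returns rows of differing lengths, so a table with five cells in one row and thirteen in another needs no special handling. Rows without any td cells are skipped rather than reported as empty lists.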