Node is always null even though it exists

Time: 2016-06-06 11:03:21

Tags: html xpath visual-studio-2015 html-agility-pack

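I'm working on a small project that extracts table data from HTML files on my computer and puts it into an Excel sheet; I will then save that data to a database. The only part I'm stuck on is the HTML parsing. My code is below. I got the XPath from the Firebug extension for Firefox. I have uploaded the full HTML file to Dropbox in case you want to look at it.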

Download File Here

OpenFileDialog dosyaSec = new OpenFileDialog();
dosyaSec.FileName = "*.HTML";
if (dosyaSec.ShowDialog() == DialogResult.OK)
{
    HtmlAgilityPack.HtmlDocument myHtml = new HtmlAgilityPack.HtmlDocument();
    myHtml.LoadHtml(dosyaSec.FileName);

    HtmlNode table = myHtml.DocumentNode.SelectSingleNode("//table[6]"); //table returns null here
    if (table != null)
    {
        foreach (var cell in table.SelectNodes(".//tr//td/"))
        {
            //will deal with this later
        }
    }

}

Part of the HTML looks like this:

<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />
<title>
   ToPs 100 - SET-UP SCHEDULE /
   L3020 -
   1 /
   4MM_ST52_52315_120
</title>
<meta name="author" content="User" />
<meta name="keywords" content="L3020,
TYPE:1, Trumpf" />
</head>
<!--body bgcolor="#D0D0D0"-->
<body link="#0000ff" vlink="#800080">
<basefont face="Arial" size="1" />



<table width="600" border="1" cellspacing="1" cellpadding="0">
   <tr>
      <td colspan="4" align="left">
         <!--Ueberschrift Einzelteilinformationen-->
         <font size="4"><b>INFORMATION ON SINGLE PART&nbsp;</b></font>
      </td>
   </tr>
   <tr>
      <td valign="top"><font size="2"><b>PART NUMBER:&nbsp;</b></font></td>
      <td valign="top"><font size="2"><b>DRAWING NUMBER:&nbsp;</b></font></td>
      <td valign="top"><font size="2"><b>GEOFILE NAME:&nbsp;</b></font></td>
      <td valign="top"><font size="2"><b>NUMBER:&nbsp;</b></font></td>
   </tr>
<tr><td valign="top"><font size="2">3&nbsp;</font></td><td valign="top"><font size="2">NOID_3&nbsp;</font></td><td valign="top"><font size="2">F:\LAZER KESÝM\BMC AÞ\53 RS\53203\53203156\53203156-1-4MM.GEO&nbsp;</font></td><td valign="top"><font size="2">34&nbsp;</font></td></tr>
<tr><td valign="top"><font size="2">5&nbsp;</font></td><td valign="top"><font size="2">NOID_5&nbsp;</font></td><td valign="top"><font size="2">F:\LAZER KESÝM\BMC AÞ\53 RS\53203\53203156\53203156-3-4MM.GEO&nbsp;</font></td><td valign="top"><font size="2">42&nbsp;</font></td></tr>
<tr><td valign="top"><font size="2">2&nbsp;</font></td><td valign="top"><font size="2">NOID_2&nbsp;</font></td><td valign="top"><font size="2">F:\LAZER KESÝM\HENDRICKSON\HS508447-48\HS508453-4MM.GEO&nbsp;</font></td><td valign="top"><font size="2">1&nbsp;</font></td></tr>
<tr><td valign="top"><font size="2">1&nbsp;</font></td><td valign="top"><font size="2">NOID_1&nbsp;</font></td><td valign="top"><font size="2">F:\LAZER KESÝM\EGE ENDÜSTRÝ\10055006\10055003-4MM.GEO&nbsp;</font></td><td valign="top"><font size="2">46&nbsp;</font></td></tr>
<tr><td valign="top"><font size="2">4&nbsp;</font></td><td valign="top"><font size="2">NOID_4&nbsp;</font></td><td valign="top"><font size="2">F:\LAZER KESÝM\BMC AÞ\53 RS\53203\53203156\53203156-2-4MM.GEO&nbsp;</font></td><td valign="top"><font size="2">67&nbsp;</font></td></tr>
<tr><td valign="top"><font size="2">10&nbsp;</font></td><td valign="top"><font size="2">NOID_10&nbsp;</font></td><td valign="top"><font size="2">F:\LAZER KESÝM\BMC AÞ\57 RS\57311\57311071\57311344-4MM.GEO&nbsp;</font></td><td valign="top"><font size="2">64&nbsp;</font></td></tr>
<tr><td valign="top"><font size="2">6&nbsp;</font></td><td valign="top"><font size="2">NOID_6&nbsp;</font></td><td valign="top"><font size="2">F:\LAZER KESÝM\BMC AÞ\51 RS\51400\51400266\51400265_4MM.GEO&nbsp;</font></td><td valign="top"><font size="2">3&nbsp;</font></td></tr>
<tr><td valign="top"><font size="2">9&nbsp;</font></td><td valign="top"><font size="2">NOID_9&nbsp;</font></td><td valign="top"><font size="2">F:\LAZER KESÝM\BMC AÞ\57 RS\57311\57311071\57311341-4MM.GEO&nbsp;</font></td><td valign="top"><font size="2">68&nbsp;</font></td></tr>
<tr><td valign="top"><font size="2">8&nbsp;</font></td><td valign="top"><font size="2">NOID_8&nbsp;</font></td><td valign="top"><font size="2">F:\LAZER KESÝM\BMC AÞ\57 RS\57311\57311071\57311340-4MM.GEO&nbsp;</font></td><td valign="top"><font size="2">64&nbsp;</font></td></tr>
<tr><td valign="top"><font size="2">7&nbsp;</font></td><td valign="top"><font size="2">NOID_7&nbsp;</font></td><td valign="top"><font size="2">F:\LAZER KESÝM\BMC AÞ\57 RS\57311\57311071\57311091-4MM.GEO&nbsp;</font></td><td valign="top"><font size="2">61&nbsp;</font></td></tr>

<tr>
      <td colspan="4" align="left">
         <!--Tafelname mit -pfad-->
         <font size="2">
         SHEET NAME:&nbsp;
         F:\LA...ÝM\TAF DOSYALARI\4MM_ST52_52315_1200X3000.taf&nbsp;
         </font>
      </td>
</tr>
</table>

</body>
</html>

By the way, the HTML is very long, so I only pasted the code that belongs to the sixth table. The tables don't have ids.

1 Answer:

Answer 0: (score: 1)

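You need to load the file with the Load method of HtmlDocument. LoadHtml expects a string of HTML, not a file path, so in your code the file name itself is parsed as markup and the resulting document contains no tables.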
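As a minimal sketch of that fix (not the asker's final code): the console wrapper and the fallback file name "schedule.html" are hypothetical stand-ins for the OpenFileDialog in the question. The trailing slash in the cell XPath is also dropped here, on the assumption that it is not valid XPath and would cause an error once the table is actually found.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main(string[] args)
    {
        // Hypothetical path; in the question it comes from dosyaSec.FileName.
        string path = args.Length > 0 ? args[0] : "schedule.html";

        var myHtml = new HtmlDocument();
        myHtml.Load(path);          // reads the file at this path and parses its contents
        // myHtml.LoadHtml(path);   // would treat the path text itself as the HTML source

        HtmlNode table = myHtml.DocumentNode.SelectSingleNode("//table[6]");
        if (table == null)
        {
            Console.WriteLine("Sixth table not found.");
            return;
        }

        // SelectNodes returns null when nothing matches, so check before iterating.
        HtmlNodeCollection cells = table.SelectNodes(".//tr/td");
        if (cells == null)
        {
            return;
        }

        foreach (HtmlNode cell in cells)
        {
            Console.WriteLine(cell.InnerText.Trim());
        }
    }
}
```

If the file content has already been read into a string, for example with File.ReadAllText, then LoadHtml is the right call for that string.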

From: http://htmlagilitypack.codeplex.com/wikipage?title=Examples (emphasis mine)

Html Agility Pack examples

For example, here is how you would fix all the hrefs in an HTML file:

LoadHtml

The project has moved to: http://html-agility-pack.net