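I'm working on a small project that extracts table data from HTML files on my computer. I then put the data into an Excel sheet, and later I will save it to a database. The only part I'm stuck on is the HTML parsing. My code is below. I got the XPath from the Firebug extension for Firefox. If you want to see it, I can upload the full HTML file to Dropbox.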
OpenFileDialog dosyaSec = new OpenFileDialog();
dosyaSec.FileName = "*.HTML";
if (dosyaSec.ShowDialog() == DialogResult.OK)
{
    HtmlAgilityPack.HtmlDocument myHtml = new HtmlAgilityPack.HtmlDocument();
    myHtml.LoadHtml(dosyaSec.FileName);
    HtmlNode table = myHtml.DocumentNode.SelectSingleNode("//table[6]"); //table returns null here
    if (table != null)
    {
        foreach (var cell in table.SelectNodes(".//tr//td/"))
        {
            //will deal with this later
        }
    }
}
Part of the HTML is shown below:
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />
<title>
ToPs 100 - SET-UP SCHEDULE /
L3020 -
1 /
4MM_ST52_52315_120
</title>
<meta name="author" content="User" />
<meta name="keywords" content="L3020,
TYPE:1, Trumpf" />
</head>
<!--body bgcolor="#D0D0D0"-->
<body link="#0000ff" vlink="#800080">
<basefont face="Arial" size="1" />
<table width="600" border="1" cellspacing="1" cellpadding="0">
<tr>
<td colspan="4" align="left">
<!--Ueberschrift Einzelteilinformationen-->
<font size="4"><b>INFORMATION ON SINGLE PART </b></font>
</td>
</tr>
<tr>
<td valign="top"><font size="2"><b>PART NUMBER: </b></font></td>
<td valign="top"><font size="2"><b>DRAWING NUMBER: </b></font></td>
<td valign="top"><font size="2"><b>GEOFILE NAME: </b></font></td>
<td valign="top"><font size="2"><b>NUMBER: </b></font></td>
</tr>
<tr><td valign="top"><font size="2">3 </font></td><td valign="top"><font size="2">NOID_3 </font></td><td valign="top"><font size="2">F:\LAZER KESÝM\BMC AÞ\53 RS\53203\53203156\53203156-1-4MM.GEO </font></td><td valign="top"><font size="2">34 </font></td></tr>
<tr><td valign="top"><font size="2">5 </font></td><td valign="top"><font size="2">NOID_5 </font></td><td valign="top"><font size="2">F:\LAZER KESÝM\BMC AÞ\53 RS\53203\53203156\53203156-3-4MM.GEO </font></td><td valign="top"><font size="2">42 </font></td></tr>
<tr><td valign="top"><font size="2">2 </font></td><td valign="top"><font size="2">NOID_2 </font></td><td valign="top"><font size="2">F:\LAZER KESÝM\HENDRICKSON\HS508447-48\HS508453-4MM.GEO </font></td><td valign="top"><font size="2">1 </font></td></tr>
<tr><td valign="top"><font size="2">1 </font></td><td valign="top"><font size="2">NOID_1 </font></td><td valign="top"><font size="2">F:\LAZER KESÝM\EGE ENDÜSTRÝ\10055006\10055003-4MM.GEO </font></td><td valign="top"><font size="2">46 </font></td></tr>
<tr><td valign="top"><font size="2">4 </font></td><td valign="top"><font size="2">NOID_4 </font></td><td valign="top"><font size="2">F:\LAZER KESÝM\BMC AÞ\53 RS\53203\53203156\53203156-2-4MM.GEO </font></td><td valign="top"><font size="2">67 </font></td></tr>
<tr><td valign="top"><font size="2">10 </font></td><td valign="top"><font size="2">NOID_10 </font></td><td valign="top"><font size="2">F:\LAZER KESÝM\BMC AÞ\57 RS\57311\57311071\57311344-4MM.GEO </font></td><td valign="top"><font size="2">64 </font></td></tr>
<tr><td valign="top"><font size="2">6 </font></td><td valign="top"><font size="2">NOID_6 </font></td><td valign="top"><font size="2">F:\LAZER KESÝM\BMC AÞ\51 RS\51400\51400266\51400265_4MM.GEO </font></td><td valign="top"><font size="2">3 </font></td></tr>
<tr><td valign="top"><font size="2">9 </font></td><td valign="top"><font size="2">NOID_9 </font></td><td valign="top"><font size="2">F:\LAZER KESÝM\BMC AÞ\57 RS\57311\57311071\57311341-4MM.GEO </font></td><td valign="top"><font size="2">68 </font></td></tr>
<tr><td valign="top"><font size="2">8 </font></td><td valign="top"><font size="2">NOID_8 </font></td><td valign="top"><font size="2">F:\LAZER KESÝM\BMC AÞ\57 RS\57311\57311071\57311340-4MM.GEO </font></td><td valign="top"><font size="2">64 </font></td></tr>
<tr><td valign="top"><font size="2">7 </font></td><td valign="top"><font size="2">NOID_7 </font></td><td valign="top"><font size="2">F:\LAZER KESÝM\BMC AÞ\57 RS\57311\57311071\57311091-4MM.GEO </font></td><td valign="top"><font size="2">61 </font></td></tr>
<tr>
<td colspan="4" align="left">
<!--Tafelname mit -pfad-->
<font size="2">
SHEET NAME:
F:\LA...ÝM\TAF DOSYALARI\4MM_ST52_52315_1200X3000.taf
</font>
</td>
</tr>
</table>
</body>
</html>
By the way, the HTML is very long, so I pasted only the code that belongs to the sixth table. The tables have no ids.
Answer 0: (score: 1)
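You need to load the file with the Load method of HtmlDocument. LoadHtml expects a string of HTML, not a file path, so in your code the file name itself is being parsed as markup and no table is found.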
From: http://htmlagilitypack.codeplex.com/wikipage?title=Examples (emphasis mine)
The example there, which shows how to fix all hrefs in an HTML file, opens the document with Load rather than LoadHtml.
The project has since moved to: http://html-agility-pack.net
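As a rough illustration of the fix applied to the code in the question, here is a minimal sketch. The TableReader class and its ReadTable method are hypothetical names made up for this example, not part of Html Agility Pack. The sketch also assumes that "the sixth table" means the sixth table in document order, which is why it uses (//table)[n] instead of //table[n], and it drops the trailing slash from the cell XPath because a path ending in "/" is not valid XPath.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using HtmlAgilityPack;

public static class TableReader
{
    // Hypothetical helper (not part of HtmlAgilityPack): returns the trimmed
    // text of every cell, one list per row, for the n-th table in the file.
    public static List<List<string>> ReadTable(string path, int tableIndex)
    {
        var doc = new HtmlDocument();
        // Load reads the file at the given path; LoadHtml would instead
        // treat the path string itself as the HTML to parse.
        doc.Load(path);

        // (//table)[n] counts tables in document order. //table[n] matches
        // only a table that is the n-th table child of its own parent.
        HtmlNode table = doc.DocumentNode.SelectSingleNode("(//table)[" + tableIndex + "]");
        var rows = new List<List<string>>();
        if (table == null) return rows;

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection trs = table.SelectNodes(".//tr");
        if (trs == null) return rows;

        foreach (HtmlNode tr in trs)
        {
            var cells = new List<string>();
            HtmlNodeCollection tds = tr.SelectNodes("./td");
            if (tds != null)
            {
                foreach (HtmlNode td in tds)
                {
                    // Decode entities such as &nbsp; and strip surrounding whitespace.
                    cells.Add(HtmlEntity.DeEntitize(td.InnerText).Trim());
                }
            }
            rows.Add(cells);
        }
        return rows;
    }
}
```

Inside the dialog handler from the question, this could be called as TableReader.ReadTable(dosyaSec.FileName, 6), and each inner list then corresponds to one row to write to Excel.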