I am parsing this HTML table:
<table align="center">
<tbody>
<!-- riadok -->
<tr>
<td valign="middle" align="right">
<form action="130427_0i.htm" method="get">
<input type="submit" class="button" title="uvedení do první modlitby dne" value="Inv.">
</form>
</td>
<td valign="middle" align="center">
<form action="130427_0c.htm" method="get">
<input type="submit" class="button" title="modlitba se čtením" value="Čtení">
</form>
</td>
<td valign="middle" align="left">
<form action="130427_0r.htm" method="get">
<input type="submit" class="button" title="ranní chvály" value="Ranní chvály">
</form>
</td>
</tr>
<!-- riadok -->
<tr>
<td valign="middle" align="right">
<form action="130427_09.htm" method="get">
<input type="submit" class="button" title="modlitba dopoledne" value="9h">
</form>
<form action="130427_09d.htm" method="get">
<input type="submit" class="button" title="modlitba dopoledne (žalmy z doplňovacího cyklu)" value="(alt)">
</form>
</td>
<td valign="middle" align="center">
<form action="130427_02.htm" method="get">
<input type="submit" class="button" title="modlitba v poledne" value="12h">
</form>
<form action="130427_02d.htm" method="get">
<input type="submit" class="button" title="modlitba v poledne (žalmy z doplňovacího cyklu)" value="(alt)">
</form>
</td>
<td valign="middle" align="left">
<form action="130427_03.htm" method="get">
<input type="submit" class="button" title="modlitba odpoledne" value="15h">
</form>
<form action="130427_03d.htm" method="get">
<input type="submit" class="button" title="modlitba odpoledne (žalmy z doplňovacího cyklu)" value="(alt)">
</form>
</td>
</tr>
<!-- riadok -->
<tr>
<td align="right">
<form action="130427_0v.htm" method="get">
<input type="submit" class="button" title="nešpory" value="Nešpory">
</form>
</td>
<td valign="middle" align="center">
<form action="130427_0k.htm" method="get">
<input type="submit" class="button" title="kompletář" value="Kompl.">
</form>
</td>
</tr>
<!-- riadok -->
<tr>
<td align="right"></td>
</tr>
</tbody>
</table>
I need to get each form (together with its input) as a single HtmlNode. For example:
<form action="130427_0c.htm" method="get">
<input type="submit" class="button" title="modlitba se čtením" value="Čtení">
</form>
With my code I only get this:
<form action="130427_0c.htm" method="get">
My code:
public static class FromHtmlTableToHtmlNodeList
{
    static List<List<HtmlNode>> tableOfNode = new List<List<HtmlNode>>();

    public static List<List<HtmlNode>> Do(string htmltable)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(htmltable);
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes(".//tr");
        for (int i = 0; i < rows.Count; i++)
        {
            int i2 = tableOfNode.Count;
            HtmlNodeCollection cols = rows[i].SelectNodes("./td");
            for (int j = 0; j < cols.Count; j++)
            {
                HtmlNodeCollection inCols = cols[j].SelectNodes("./form/descendant-or-self::*");
                List<HtmlNode> nextRow = new List<HtmlNode>();
                if (inCols != null)
                {
                    for (int k = 0; k < inCols.Count; k++)
                    {
                        if (tableOfNode.Count < i2 + k + 1)
                        {
                            tableOfNode.Add(nextRow);
                        }
                        if (tableOfNode[i2 + k].Count < j + 1) tableOfNode[i2 + k].Insert(j, inCols[k]);
                    }
                }
            }
        }
        return tableOfNode;
    }
}
I know the problem is in this line:
HtmlNodeCollection inCols = cols[j].SelectNodes("./form/descendant-or-self::*");
What should the XPath expression be to get what I need?
Answer 0 (score: 0)
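You are looking for the XPath expression
./form[input]
This returns all <form/> elements that contain at least one <input/> child element; each returned node includes its subtree.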
Answer 1 (score: 0)
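By default, Html Agility Pack gives the FORM element special treatment: it is parsed as if it had no content, so child elements such as the INPUT are not attached to it. The reason is explained here: HtmlAgilityPack -- Does <form> close itself for some reason?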
This code should get all FORM elements, including their children:
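// Remove the special handling of <form>; do this before loading the document.
HtmlNode.ElementsFlags.Remove("form");
HtmlDocument doc = new HtmlDocument();
doc.Load(myTestHtm); // myTestHtm: the HTML source to load (e.g. a file path)
var forms = doc.DocumentNode.SelectNodes("//form");
if (forms != null) // SelectNodes returns null when nothing matches
{
    foreach (var v in forms)
    {
        Console.WriteLine(v.OuterHtml);
    }
}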
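The two answers can be combined. The following is only a minimal sketch, assuming the table is passed in as a string as in the question; the names FormExtractor and GetFormsPerRow are hypothetical and not part of Html Agility Pack. It removes the special handling of form first, then selects the forms of each row with the ./form[input] predicate applied under every td.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class FormExtractor
{
    // Hypothetical helper: for each <tr>, returns the <form> nodes found in its cells.
    public static List<List<HtmlNode>> GetFormsPerRow(string htmlTable)
    {
        // ElementsFlags is static, so this affects every document parsed afterwards.
        HtmlNode.ElementsFlags.Remove("form");

        var doc = new HtmlDocument();
        doc.LoadHtml(htmlTable);

        var result = new List<List<HtmlNode>>();
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes(".//tr");
        if (rows == null) return result; // SelectNodes returns null when nothing matches

        foreach (HtmlNode row in rows)
        {
            var formsInRow = new List<HtmlNode>();
            // Forms that are direct children of a cell and have at least one <input> child.
            HtmlNodeCollection forms = row.SelectNodes("./td/form[input]");
            if (forms != null) formsInRow.AddRange(forms);
            result.Add(formsInRow); // rows without forms yield an empty list
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        string html =
            "<table><tr><td><form action=\"130427_0c.htm\" method=\"get\">" +
            "<input type=\"submit\" class=\"button\" value=\"Čtení\">" +
            "</form></td></tr></table>";

        foreach (List<HtmlNode> row in FormExtractor.GetFormsPerRow(html))
            foreach (HtmlNode form in row)
                Console.WriteLine(form.OuterHtml);
    }
}
```

Each returned HtmlNode is a form whose input is reachable through its ChildNodes, so OuterHtml prints the form together with its input. Unlike the static tableOfNode field in the question, the result list is created on every call, so repeated calls do not accumulate rows.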