How to get something like "descendant-and-self::" into an HtmlNode

Asked: 2013-04-27 09:41:24

Tags: c# html-parsing html-agility-pack

I am parsing this HTML table:

<table align="center">
   <tbody>
      <!-- riadok -->
      <tr>
         <td valign="middle" align="right">
            <form action="130427_0i.htm" method="get">
               <input type="submit" class="button" title="uvedení do první modlitby dne" value="Inv.">
            </form>
         </td>
         <td valign="middle" align="center">
            <form action="130427_0c.htm" method="get">
               <input type="submit" class="button" title="modlitba se čtením" value="Čtení">
            </form>
         </td>
         <td valign="middle" align="left">
            <form action="130427_0r.htm" method="get">
               <input type="submit" class="button" title="ranní chvály" value="Ranní chvály">
            </form>
         </td>
      </tr>
      <!-- riadok -->
      <tr>
         <td valign="middle" align="right">
            <form action="130427_09.htm" method="get">
               <input type="submit" class="button" title="modlitba dopoledne" value="9h">
            </form>
            <form action="130427_09d.htm" method="get">
               <input type="submit" class="button" title="modlitba dopoledne (žalmy z doplňovacího cyklu)" value="(alt)">
            </form>
         </td>
         <td valign="middle" align="center">
            <form action="130427_02.htm" method="get">
               <input type="submit" class="button" title="modlitba v poledne" value="12h">
            </form>
            <form action="130427_02d.htm" method="get">
               <input type="submit" class="button" title="modlitba v poledne (žalmy z doplňovacího cyklu)" value="(alt)">
            </form>
         </td>
         <td valign="middle" align="left">
            <form action="130427_03.htm" method="get">
               <input type="submit" class="button" title="modlitba odpoledne" value="15h">
            </form>
            <form action="130427_03d.htm" method="get">
               <input type="submit" class="button" title="modlitba odpoledne (žalmy z doplňovacího cyklu)" value="(alt)">
            </form>
         </td>
      </tr>
      <!-- riadok -->
      <tr>
         <td align="right">
            <form action="130427_0v.htm" method="get">
               <input type="submit" class="button" title="nešpory" value="Nešpory">
            </form>
         </td>
         <td valign="middle" align="center">
            <form action="130427_0k.htm" method="get">
               <input type="submit" class="button" title="kompletář" value="Kompl.">
            </form>
         </td>
      </tr>
      <!-- riadok -->
      <tr>
         <td align="right"></td>
      </tr>
   </tbody>
</table>

I need to get each form, together with its input, in a single HtmlNode. For example:

<form action="130427_0c.htm" method="get">
               <input type="submit" class="button" title="modlitba se čtením" value="Čtení">
 </form>

With my code I only get this:

<form action="130427_0c.htm" method="get">

My code:

public static class FromHtmlTableToHtmlNodeList
    {
        static List<List<HtmlNode>> tableOfNode = new List<List<HtmlNode>>();

        public static List<List<HtmlNode>> Do(string htmltable)
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(htmltable);

            HtmlNodeCollection rows = doc.DocumentNode.SelectNodes(".//tr");
            for (int i = 0; i < rows.Count; i++)
            {
                int i2 = tableOfNode.Count;
                HtmlNodeCollection cols = rows[i].SelectNodes("./td");

                for (int j = 0; j < cols.Count; j++)
                {

                    HtmlNodeCollection inCols = cols[j].SelectNodes("./form/descendant-or-self::*");
                    List<HtmlNode> nextRow = new List<HtmlNode>();

                    if (inCols != null)
                    {
                        for (int k = 0; k < inCols.Count; k++)
                        {
                            if (tableOfNode.Count < i2+k + 1)
                            {
                                tableOfNode.Add(nextRow);

                            }
                            if (tableOfNode[i2 + k].Count < j + 1) tableOfNode[i2 + k].Insert(j, inCols[k]);

                        }
                    }                                   
                }


            }

            return tableOfNode;
        }



    }

I know the problem is in this line:

HtmlNodeCollection inCols = cols[j].SelectNodes("./form/descendant-or-self::*");

What should the XPath expression be to do what I need?

2 Answers:

Answer 0: (score: 0)

You are looking for the XPath expression

./form[input]

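This returns every <form/> element that has at least one <input/> child element. Each returned node carries its whole subtree, so the <input/> is reachable from the form node and does not need to be selected separately.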
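To see how this differs from the expression in the question, it can help to count what each one selects inside a single cell. The following is only a minimal sketch. It assumes Html Agility Pack is referenced and that the `form` entry has been removed from `HtmlNode.ElementsFlags` so that forms keep their children (the other answer explains why). The names `XPathComparison` and `Count` are made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathComparison
{
    // Hypothetical helper: how many nodes an XPath selects, relative to the first <td>.
    public static int Count(string html, string xpath)
    {
        // Assumption: without this, HAP may parse <form> as an empty element.
        HtmlNode.ElementsFlags.Remove("form");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode cell = doc.DocumentNode.SelectSingleNode("//td");
        HtmlNodeCollection nodes = cell.SelectNodes(xpath); // null when nothing matches
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        const string html =
            "<table><tr><td>" +
            "<form action=\"130427_09.htm\" method=\"get\"><input type=\"submit\" value=\"9h\"></form>" +
            "<form action=\"130427_09d.htm\" method=\"get\"><input type=\"submit\" value=\"(alt)\"></form>" +
            "</td></tr></table>";

        // Selects every form AND every descendant element as separate nodes.
        Console.WriteLine(Count(html, "./form/descendant-or-self::*"));

        // Selects only the forms that have an <input> child; the input stays inside each node.
        Console.WriteLine(Count(html, "./form[input]"));
    }
}
```

The first expression should yield more nodes than the second, because it flattens each form and its descendants into one list, which is why the code in the question ends up placing the form and its input in different slots.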

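Answer 1: (score: 0)

By default, Html Agility Pack gives the FORM element special treatment. For the reason, see: HtmlAgilityPack -- Does <form> close itself for some reason?

This code should get all FORM elements, including their content: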

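// Remove the special handling of <form> before the document is loaded.
HtmlNode.ElementsFlags.Remove("form");
HtmlDocument doc = new HtmlDocument();
doc.Load(myTestHtm);

// With the flag removed, OuterHtml includes each form's child elements.
foreach (var v in doc.DocumentNode.SelectNodes("//form"))
{
    Console.WriteLine(v.OuterHtml);
}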
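
Applied to the table from the question, the same idea could look roughly like the sketch below. It is not the asker's original method. `FormCollector` and `FormsPerCell` are hypothetical names, and the result shape (one inner list per `<td>`) is an assumption about what is wanted. Behavior of `ElementsFlags` may also differ between Html Agility Pack versions.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class FormCollector
{
    // Hypothetical replacement for the question's Do method:
    // returns one inner list per <td>, holding that cell's <form> nodes.
    public static List<List<HtmlNode>> FormsPerCell(string htmlTable)
    {
        // Let <form> keep its children instead of being parsed as empty.
        HtmlNode.ElementsFlags.Remove("form");

        var doc = new HtmlDocument();
        doc.LoadHtml(htmlTable);

        // A local list, so repeated calls do not accumulate earlier results.
        var result = new List<List<HtmlNode>>();

        HtmlNodeCollection cells = doc.DocumentNode.SelectNodes(".//tr/td");
        if (cells == null) return result; // SelectNodes returns null when nothing matches

        foreach (HtmlNode cell in cells)
        {
            var forms = new List<HtmlNode>();
            HtmlNodeCollection found = cell.SelectNodes("./form");
            if (found != null) forms.AddRange(found);
            result.Add(forms); // an empty list marks a cell without forms
        }
        return result;
    }
}
```

Each `HtmlNode` in the result is a form whose `OuterHtml` should contain its `<input>`, so no `descendant-or-self::` step is needed to keep the two together.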