HTML Agility Pack XPath expression help

Time: 2012-08-15 20:34:46

Tags: xpath html-agility-pack

I have the following HTML:

<table width="98%" border="0" align="center" cellpadding="0" cellspacing="2">
<tr>
    <td height="20" colspan="2">
        &nbsp;
    </td>
</tr>
<tr>
    <td height="20" colspan="2" class="fontDestaque2NegritoHome">
        MATR&Iacute;CULA: PPAAG
    </td>
</tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        &nbsp;
    </td>
    <td align="right" valign="middle" class="tx_bd">
        &nbsp;
    </td>
</tr>
<tr>
    <td width="35%" align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bold">Fabricante:</span>
        </div>
    </td>
    <td width="59%" align="right" valign="middle" class="tx_bd">
        <div align="left">
            CESSNA AIRCRAFT</div>
    </td>
</tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bold">Modelo:</span>
        </div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            T206H</div>
    </td>
</tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bold">N&uacute;mero de S&eacute;rie:</span>
        </div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            T20608735</div>
    </td>
</tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left" class="tx_bold">
            Tipo ICAO :
        </div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            C206</div>
    </td>
</tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bold">Tipo de Habilita&ccedil;&atilde;o para Pilotos:</span>
        </div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            MNTE</div>
    </td>
</tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bold">Classe da Aeronave:</span>
        </div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            POUSO CONVECIONAL 1 MOTOR CONVENCIONAL</div>
    </td>
</tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bold">Peso M&aacute;ximo de Decolagem:</span>
        </div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            1633 - Kg</div>
    </td>
</tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bold">N&uacute;mero M&aacute;ximo de Passageiros:</span>
        </div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            005</div>
    </td>
</tr>
<tr>
    <td colspan="2" align="right" valign="middle" class="tx_bd">
        <div align="left">
        </div>
    </td>
</tr>
<tr>
    <td colspan="2" align="right" valign="middle" background="../images/bgPontilhado.gif"
        class="tx_bd">
        <div align="left">
            <img src="../images/bgPontilhado.gif" width="4" height="1"></div>
    </td>
</tr>
<tr>
    <td colspan="2" align="right" valign="middle" class="tx_bd">
        <div align="left">
        </div>
    </td>
</tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bold">Categoria de Registro:</span>
        </div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            PRIVADA SERVICO AEREO PRIVADOS</div>
    </td>
</tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bold">N&uacute;mero dos Certificados (CM - CA)</span>:
        </div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            19040</div>
    </td>
</tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bold">Situa&ccedil;&atilde;o no RAB:</span><span class="stop_litle">
            </span>
        </div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="fontRed">ARRENDAMENTO OPERACIONAL/ALIENACAO FIDUCIARIA</span></div>
    </td>
</tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bold">Data da Compra/Transfer&ecirc;ncia:</span>
        </div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
        </div>
    </td>
</tr>
<tr>
    <td colspan="2" align="right" valign="middle" class="tx_bd">
        <div align="left">
        </div>
    </td>
</tr>
<tr>
    <td colspan="2" align="right" valign="middle" background="../images/bgPontilhado.gif"
        class="tx_bd">
        <div align="left">
            <img src="../images/bgPontilhado.gif" width="4" height="1"></div>
    </td>
</tr>
<tr>
    <td colspan="2" align="right" valign="middle" class="tx_bd">
        <div align="left">
        </div>
    </td>
    <tr>
        <td align="right" valign="middle" class="tx_bd">
            <div align="left">
                <span class="tx_bold">Data de Validade do CA: </span>
            </div>
        </td>
        <td width="3%" align="right" valign="middle" class="tx_bd">
            <div align="left">
                <span class="tx_bd">21/05/16</span></div>
        </td>
    </tr>
<tr>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left" class="tx_bold">
            <div align="left">
                Data de Validade da IAM:
            </div>
        </div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bd">110513</span></div>
    </td>
</tr>
<tr>
    <td height="18" align="right" valign="middle" class="tx_bd">
        <div align="left">
            <span class="tx_bold">Situa&ccedil;&atilde;o de Aeronavegabilidade:</span>
        </div>
    </td>
    <td height="18" align="right" valign="middle" class="tx_bd">
        <div align="left">
            Normal</div>
    </td>
</tr>
<tr>
    <td height="18" colspan="2" align="right" valign="middle" class="tx_bd">
        <div align="left" class="tx_bold">
            Motivo(s):
        </div>
    </td>
</tr>
<tr>
    <td height="18" colspan="2" align="right" valign="middle" class="tx_bd">
        <div align="left">
            <blockquote>
                <p>
                    <span class="tx_bold"></span>
                </p>
            </blockquote>
        </div>
    </td>
</tr>
<tr>
    <td height="18" colspan="2" align="left" valign="middle" class="tx_bd">
        Consulta realizada em: 16/8/2012 15:52:45<br>
    </td>
</tr>

I want to grab the following text:

  • CESSNA AIRCRAFT
  • T206H
  • T20608735
  • C206
  • MNTE
  • POUSO CONVECIONAL 1 MOTOR CONVENCIONAL
  • 1633 - Kg
  • 005
  • PRIVADA SERVICO AEREO PRIVADOS
  • 19040
  • ARRENDAMENTO OPERACIONAL/ALIENACAO FIDUCIARIA
  • 21/05/16
  • 110513
  • Normal

Some of the divs contain only the text I need. Others contain a span that holds the text I need. How would I build an XPath expression for this?

1 answer:

Answer 0 (score: 0):

Use

//tr/td[@align='right' and @valign='middle' and @class='tx_bd']
       /div[@align='left' and not(*)]
         /text()
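
Note that not(*) restricts the match to divs without child elements, so on its own this expression would skip the values wrapped in a span (for example 21/05/16) and would also match label divs such as "Tipo ICAO :". As a rough illustration of one way to cover both cases, here is a minimal sketch that selects the second cell of each row and joins all of its descendant text. It uses Python's standard-library xml.etree.ElementTree rather than HTML Agility Pack, and it runs on a trimmed, well-formed copy of three rows from the question. The names SAMPLE and value_cells are made up for this example. The real page is not well-formed XML (it contains &nbsp; and an unclosed img tag), so ElementTree could not parse it directly; a tolerant HTML parser would be needed there.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of three rows from the question's table
# (hypothetical sample data, reduced for illustration).
SAMPLE = """
<table>
  <tr>
    <td align="right" valign="middle" class="tx_bd">
      <div align="left"><span class="tx_bold">Modelo:</span></div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
      <div align="left">
            T206H</div>
    </td>
  </tr>
  <tr>
    <td align="right" valign="middle" class="tx_bd">
      <div align="left" class="tx_bold">Tipo ICAO :</div>
    </td>
    <td align="right" valign="middle" class="tx_bd">
      <div align="left">C206</div>
    </td>
  </tr>
  <tr>
    <td align="right" valign="middle" class="tx_bd">
      <div align="left"><span class="tx_bold">Data de Validade do CA: </span></div>
    </td>
    <td width="3%" align="right" valign="middle" class="tx_bd">
      <div align="left"><span class="tx_bd">21/05/16</span></div>
    </td>
  </tr>
</table>
"""


def value_cells(markup):
    """Return the text of the second cell's div in every two-cell row."""
    root = ET.fromstring(markup)
    values = []
    # td[2] selects the second td of each row, so label cells and
    # single-cell (colspan) rows are never visited.
    for div in root.findall(".//tr/td[2]/div[@align='left']"):
        # itertext() yields the div's own text and any nested span text.
        text = "".join(div.itertext()).strip()
        if text:  # skip empty value cells
            values.append(text)
    return values


print(value_cells(SAMPLE))  # ['T206H', 'C206', '21/05/16']
```

With HTML Agility Pack the same idea would presumably be an XPath such as //tr/td[2]/div[@align='left'] combined with reading each node's InnerText, but that variant is an untested assumption here.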