Question

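I have the following table structure.

In the GENDER column, when a value is present it appears between <DIV> tags, but when there is no value the tags are missing and the cell contains only the special character &nbsp;.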

<TABLE class="first">
   <TR>
      <TD></TD>
      <TD></TD>
      <TD></TD>
      <TD></TD>
   </TR>
   <TR VALIGN="top">
      <TD></TD>
      <TD><DIV>NAME</DIV></TD>
      <TD><DIV>AGE</DIV></TD>
      <TD><DIV>GENDER</DIV></TD>
   </TR>

   <TR VALIGN="top">
      <TD></TD>
      <TD><DIV>MARIA</DIV></TD>
      <TD><DIV>25</DIV></TD>
      <TD><DIV>F</DIV></TD>
   </TR>
    <TR VALIGN="top">
      <TD></TD>
      <TD><DIV>JOHN</DIV></TD>
      <TD><DIV>22</DIV></TD>
      <TD>&nbsp;</TD>
   </TR>
   <TR VALIGN="top">
      <TD></TD>
      <TD><DIV>PAUL</DIV></TD>
      <TD><DIV>36</DIV></TD>
      <TD>&nbsp;</TD>
   </TR>
   <TR VALIGN="top">
      <TD></TD>
      <TD><DIV>DEREK</DIV></TD>
      <TD><DIV>16</DIV></TD>
      <TD><DIV>M</DIV></TD>
   </TR>
</TABLE>

This is what I am doing:

for table in result.xpath('//table[@class="first"]'):     
    for i, tr in enumerate(table.xpath('//tr')):
        for j, td in enumerate(tr.xpath('td/div/|td')):
              if td.text == '&nbsp;':
                print '---'
              else:
                print td.text

How can I print '---' when td.text contains the &nbsp; character?

Answer 1

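&nbsp; is the entity reference for the non-breaking space character (Unicode code point U+00A0). The parser decodes the entity, so td.text never contains the literal string '&nbsp;'. To test whether an element's text content equals that character, you can use: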

if td.text == u'\u00A0':
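
As a quick check that does not depend on lxml, the sketch below uses the standard library's html.unescape to decode the entity and compares the result with both candidates. It assumes Python 3; the variable name nbsp is only illustrative.

```python
from html import unescape

# Decode the entity reference into the character it stands for.
nbsp = unescape("&nbsp;")

print(nbsp == "\u00a0")    # True: the entity is the non-breaking space
print(nbsp == "&nbsp;")    # False: the literal entity string never matches
print(nbsp.strip() == "")  # True: Python 3 treats U+00A0 as whitespace
```

The last line suggests an alternative test: a cell whose text is empty after strip() can be treated as blank, which also covers ordinary spaces.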

Complete demo:

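from lxml import html

# Parse the HTML file; lxml decodes &nbsp; to U+00A0 while parsing
table = html.parse("table.html")

for tr in table.xpath('//tr'):
    # Select every cell, plus any DIV nested directly inside a cell
    for td in tr.xpath('td/div|td'):
        if td.text == u'\u00A0':
            print('BLANK VALUE')
        else:
            print(td.text)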

Output:

None
None
None
None
None
None
NAME
None
AGE
None
GENDER
None
None
MARIA
None
25
None
F
None
None
JOHN
None
22
BLANK VALUE
None
None
PAUL
None
36
BLANK VALUE
None
None
DEREK
None
16
None
M
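
The None lines in this output come from the empty cells and from cells whose text sits inside a nested DIV, so td.text itself is None. As one possible variation (not part of the original answer), the sketch below reads each cell with text_content(), strips whitespace, and substitutes a placeholder for blank cells. It assumes Python 3 and a shortened copy of the question's table; table_rows, SAMPLE, and placeholder are hypothetical names, and skipping the first cell of each row is an assumption based on the sample markup.

```python
from lxml import html

# Shortened copy of the table from the question: header, one full row, one blank cell.
SAMPLE = """
<TABLE class="first">
   <TR VALIGN="top">
      <TD></TD>
      <TD><DIV>NAME</DIV></TD>
      <TD><DIV>AGE</DIV></TD>
      <TD><DIV>GENDER</DIV></TD>
   </TR>
   <TR VALIGN="top">
      <TD></TD>
      <TD><DIV>MARIA</DIV></TD>
      <TD><DIV>25</DIV></TD>
      <TD><DIV>F</DIV></TD>
   </TR>
   <TR VALIGN="top">
      <TD></TD>
      <TD><DIV>JOHN</DIV></TD>
      <TD><DIV>22</DIV></TD>
      <TD>&nbsp;</TD>
   </TR>
</TABLE>
"""

def table_rows(markup, placeholder="---"):
    """Return one list of cell strings per row; blank cells become placeholder."""
    root = html.fromstring(markup)
    rows = []
    # lxml's HTML parser lowercases tag names, so lowercase XPath steps match.
    for tr in root.xpath('//table[@class="first"]//tr'):
        cells = []
        # Assumption: the first cell of every row is an empty spacer, so skip it.
        for td in tr.xpath('./td[position() > 1]'):
            # text_content() includes text from nested DIVs; strip() removes U+00A0.
            text = td.text_content().strip()
            cells.append(text or placeholder)
        rows.append(cells)
    return rows

for row in table_rows(SAMPLE):
    print(" ".join(row))
```

Note that this treats a cell containing only &nbsp; and a genuinely empty cell the same way; if the two need to be distinguished, the explicit comparison with u'\u00A0' shown in the answer is the safer test.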
