Question

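I'm using lxml, specifically the library's html module.

How do I get the element that contains an element with certain characteristics?

For example: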

<TR>
  <TD>Welcome</TD>
  <TD>other</TD>
</TR>
<TR>
  <TD>Bye Bye</TD>
  <TD>another</TD>
</TR>

How can I select the <TR> element that contains <TD>Welcome</TD>? I'm not sure how to write the correct pattern.

Answer 1

If you want to use XPath, this should work:

e = doc.xpath('//tr[td[text()="Welcome"]]')[0]
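
To try the expression without an existing document, here is a minimal sketch. The SAMPLE string is made up for illustration (the rows from the question wrapped in a <table>, plus one hypothetical row where the text sits inside a <b>), and the names exact and loose are mine. It also shows a looser variant, assuming you want to match a <td> at any depth and ignore surrounding whitespace; that is a different test from text()="Welcome", not a replacement for it.

```python
from lxml import html

# Hypothetical sample: the question's rows plus one row with nested markup.
SAMPLE = """
<table>
  <tr><td>Welcome</td><td>other</td></tr>
  <tr><td>Bye Bye</td><td>another</td></tr>
  <tr><td> <b>Welcome</b> </td><td>nested</td></tr>
</table>
"""

doc = html.fromstring(SAMPLE)

# Rows with a direct <td> child that has a text node equal to "Welcome".
exact = doc.xpath('//tr[td[text()="Welcome"]]')

# Rows with a <td> descendant whose whole text, trimmed, is "Welcome".
loose = doc.xpath('//tr[.//td[normalize-space()="Welcome"]]')

for row in loose:
    print([cell.text_content().strip() for cell in row])
```

Note that xpath() returns a list, so the [0] in the answer raises IndexError when nothing matches; checking the list first is safer.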

Answer 2

There are many ways to do this. I'm not very skilled with XPath, so I would do it like this:

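from lxml import html

# somePath stands for the path to your HTML file
myTree = html.fromstring(open(somePath).read())
rows = list(myTree.iter('tr'))  # every <tr>, at any depth
for row in rows:
    cells = [e for e in row if e.tag == 'td']
    # text_content() is a method; compare with ==, not =
    if any(cell.text_content() == 'Welcome' for cell in cells):
        print('I have the row I want')
        break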

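When the program breaks out of the loop, row holds the first row that contains the word Welcome. You can adapt this to your needs. That is, if you type row at the prompt at that point, the row element displayed is the specific row you were looking for.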
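
The same idea can be wrapped in a small function so the row is returned instead of left in a loop variable. This is only a sketch: first_row_with_cell and SAMPLE are names I made up, and the strip() call is my own assumption (it tolerates whitespace around the cell text, which the loop above does not).

```python
from lxml import html


def first_row_with_cell(tree, wanted):
    """Return the first <tr> with a direct <td> child whose text is `wanted`, or None."""
    for row in tree.iter('tr'):              # every <tr>, at any depth
        for cell in row.iterchildren('td'):  # direct <td> children only
            if cell.text_content().strip() == wanted:
                return row
    return None


# Hypothetical sample based on the rows in the question.
SAMPLE = """
<table>
  <tr><td>Welcome</td><td>other</td></tr>
  <tr><td>Bye Bye</td><td>another</td></tr>
</table>
"""

tree = html.fromstring(SAMPLE)
row = first_row_with_cell(tree, 'Welcome')
if row is not None:
    print(html.tostring(row, encoding='unicode'))
```

Returning None when nothing matches lets the caller tell "no such row" apart from an error.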

Select elements whose children/grandchildren/... contain an element matching a specified pattern
