I have some HTML like this:
...
<table width="100%">
<tr class="blueborder">
<td colspan="2" class="blackbold">Some Other Text</td>
</tr>
</table>
<table width="100%">
<tr class="upcoming">
<td class="lists" >
<ul>
<li> List1 Element1</li>
<li> List1 Element2</li>
<li> List1 Element3</li>
</ul>
</td>
</tr>
</table>
<table width="100%">
<tr class="blueborder">
<td colspan="2" class="blackbold">Signaling Text</td>
</tr>
</table>
<table width="100%">
<tr class="upcoming">
<td class="lists" >
<ul>
<li> List2 Element1</li>
<li> List2 Element2</li>
<li> List2 Element3</li>
</ul>
</td>
</tr>
</table>
...
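I am using
employees = root.xpath('.//td[@class = "lists"]/ul/li/text()')
but this grabs the elements of both lists. I only want List 2, yet the two lists have identical attributes (class and so on). The only difference is that
<td colspan="2" class="blackbold">Signaling Text</td>
appears before the list I want. Is there a way to say "only take the list that comes after this"?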
Answer 0 (score: 0)
You can find the td that contains the text Signaling Text and then select the td with class lists that follows it:
h = """ <table width="100%">
<tr class="blueborder">
<td colspan="2" class="blackbold">Some Other Text</td>
</tr>
</table>
<table width="100%">
<tr class="upcoming">
<td class="lists" >
<ul>
<li> List1 Element1</li>
<li> List1 Element2</li>
<li> List1 Element3</li>
</ul>
</td>
</tr>
</table>
<table width="100%">
<tr class="blueborder">
<td colspan="2" class="blackbold">Signaling Text</td>
</tr>
</table>
<table width="100%">
<tr class="upcoming">
<td class="lists" >
<ul>
<li> List2 Element1</li>
<li> List2 Element2</li>
<li> List2 Element3</li>
</ul>
</td>
</tr>
</table> """
from lxml import html
tree = html.fromstring(h)
print(tree.xpath('//td[contains(.,"Signaling Text")]/following::td[@class = "lists"]/ul/li/text()'))
Which gives you:
[' List2 Element1', ' List2 Element2', ' List2 Element3']
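One caveat: the following:: axis matches every td with class lists after the marker, not just the next one. The sample HTML happens to end after List2, so it makes no difference there. Here is a minimal sketch assuming a page that has a third list after the one you want (the "Yet Another Text" header and List3 are made up for illustration); adding [1] keeps only the nearest match, and strip() removes the leading space:

```python
from lxml import html

# Hypothetical page: same layout as the question, plus a third list
# ("Yet Another Text" / List3) that is NOT in the original HTML.
doc = """
<div>
<table><tr class="blueborder"><td colspan="2" class="blackbold">Some Other Text</td></tr></table>
<table><tr class="upcoming"><td class="lists"><ul><li> List1 Element1</li></ul></td></tr></table>
<table><tr class="blueborder"><td colspan="2" class="blackbold">Signaling Text</td></tr></table>
<table><tr class="upcoming"><td class="lists"><ul><li> List2 Element1</li><li> List2 Element2</li></ul></td></tr></table>
<table><tr class="blueborder"><td colspan="2" class="blackbold">Yet Another Text</td></tr></table>
<table><tr class="upcoming"><td class="lists"><ul><li> List3 Element1</li></ul></td></tr></table>
</div>
"""
tree = html.fromstring(doc)

# Match the marker cell by its exact (whitespace-normalized) text.
marker = '//td[normalize-space(.)="Signaling Text"]'

# Without a position filter, every later "lists" cell is included (List2 and List3).
all_following = tree.xpath(marker + '/following::td[@class="lists"]/ul/li/text()')

# [1] on a forward axis keeps only the nearest following "lists" cell (List2).
first_following = tree.xpath(marker + '/following::td[@class="lists"][1]/ul/li/text()')
first_following = [text.strip() for text in first_following]

print(all_following)
print(first_following)
```

Whether you need the [1] depends on the real page; if the list you want is always the last one, the original expression is enough.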
Or, if you are sure it is always the second occurrence:
tree.xpath('(//td[@class = "lists"])[2]/ul/li/text()')
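The parentheses in that expression matter. As a small sketch on a cut-down version of the HTML (the doc string below is a simplified stand-in, not the full page): with parentheses, [2] indexes into the whole result set; without them, [2] is applied per parent, so it looks for a td that is the second matching td inside its own tr and finds nothing.

```python
from lxml import html

# Simplified stand-in for the page: two tables, one "lists" cell each.
doc = """<div>
<table><tr><td class="lists"><ul><li>List1 Element1</li></ul></td></tr></table>
<table><tr><td class="lists"><ul><li>List2 Element1</li></ul></td></tr></table>
</div>"""
tree = html.fromstring(doc)

# Parenthesized: collect all matching cells in document order, then take the second.
with_parens = tree.xpath('(//td[@class="lists"])[2]/ul/li/text()')

# Unparenthesized: the second matching td among the children of each parent;
# every tr here has only one such td, so nothing is selected.
without_parens = tree.xpath('//td[@class="lists"][2]/ul/li/text()')

print(with_parens)
print(without_parens)
```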