I'm trying to find every <h3 class="threadtitle"> element and, if it contains the text "NSW", return the text of its <a> element.
<h3 class="threadtitle">
<img border="0" alt="MARKET PLACE/AUCTIONS" src="vbcover/ibid/images/auction_open.png" title="MARKET PLACE/AUCTIONS">
<span class="prefix understate">
<b>
<font size="2" face="arial" color="#0000FF">NSW</font>
</b>
</span>
<a id="thread_title_1234" class="title" href="showthread.php?t=1234">Banana man</a>
</h3>
Here's what I have so far. I can find the individual elements like this:
from lxml import html

# The HTML snippet above, as a string
response = '''
<h3 class="threadtitle">
<img border="0" alt="MARKET PLACE/AUCTIONS" src="vbcover/ibid/images/auction_open.png" title="MARKET PLACE/AUCTIONS">
<span class="prefix understate">
<b>
<font size="2" face="arial" color="#0000FF">NSW</font>
</b>
</span>
<a id="thread_title_1234" class="title" href="showthread.php?t=1234">Banana man</a>
</h3>
'''
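# response is a plain str here, so parse it directly (a str has no .text attribute)
tree = html.fromstring(response)
# Finds the <font> element whose text is exactly 'NSW'
test = tree.xpath("//font[text()='NSW']")
# or: finds every <h3 class="threadtitle">
test2 = tree.xpath("//h3[@class='threadtitle']")
for i in test:
    print(i.text)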
Output:
NSW

But I don't know how to combine the two queries. For the example above, the result should be "Banana man".
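One way to combine the two queries in Python, sketched here as an alternative to a single XPath, is to run the h3 query first and then a second XPath relative to each heading (note the leading "."). The sample page and the helper name nsw_titles are hypothetical, made up for illustration; only the h3/font/a structure comes from the question.

```python
from lxml import html

# Hypothetical sample page: two thread titles, only the first tagged NSW.
PAGE = """
<div>
<h3 class="threadtitle">
<span class="prefix understate"><b><font color="#0000FF">NSW</font></b></span>
<a id="thread_title_1234" class="title" href="showthread.php?t=1234">Banana man</a>
</h3>
<h3 class="threadtitle">
<span class="prefix understate"><b><font color="#0000FF">VIC</font></b></span>
<a id="thread_title_5678" class="title" href="showthread.php?t=5678">Apple man</a>
</h3>
</div>
"""


def nsw_titles(page_source, prefix="NSW"):
    """Hypothetical helper: return the <a> text of every h3.threadtitle
    that has a <font> descendant whose text equals `prefix`."""
    tree = html.fromstring(page_source)
    titles = []
    # First query: every thread-title heading on the page.
    for h3 in tree.xpath("//h3[@class='threadtitle']"):
        # Second query, relative to this heading only (leading '.').
        prefixes = [t.strip() for t in h3.xpath(".//font/text()")]
        if prefix in prefixes:
            # The direct <a> child of the heading holds the thread title.
            titles.extend(str(t) for t in h3.xpath("a/text()"))
    return titles


print(nsw_titles(PAGE))  # ['Banana man']
```

The loop is more verbose than one XPath expression, but it makes it easy to add extra checks in Python (for example stripping whitespace, as done here) before deciding whether a heading matches.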
Answer (score: 2)
Try this XPath:
//h3[@class='threadtitle'][descendant::font/text() = 'NSW']/a/text()
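A minimal sketch of the answer's XPath run with lxml against a small page. The page is hypothetical sample data (two headings, only one tagged NSW) so that the filtering is visible; the expression itself is copied unchanged from the answer.

```python
from lxml import html

# Hypothetical two-thread page; only the first thread carries the NSW prefix.
PAGE = """
<div>
<h3 class="threadtitle">
<span class="prefix understate"><b><font color="#0000FF">NSW</font></b></span>
<a id="thread_title_1234" class="title" href="showthread.php?t=1234">Banana man</a>
</h3>
<h3 class="threadtitle">
<span class="prefix understate"><b><font color="#0000FF">VIC</font></b></span>
<a id="thread_title_5678" class="title" href="showthread.php?t=5678">Apple man</a>
</h3>
</div>
"""

# The XPath from the answer, piece by piece:
#   //h3[@class='threadtitle']         every thread-title heading
#   [descendant::font/text() = 'NSW']  keep it only if a <font> below it reads exactly 'NSW'
#   /a/text()                          then take the text of its direct <a> child
XPATH = "//h3[@class='threadtitle'][descendant::font/text() = 'NSW']/a/text()"

tree = html.fromstring(PAGE)
titles = [str(t) for t in tree.xpath(XPATH)]
print(titles)  # ['Banana man']
```

The predicate compares text exactly, so " NSW " with surrounding whitespace would not match; if that can happen, the XPath 1.0 predicate [descendant::font[normalize-space() = 'NSW']] is more forgiving.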