I am trying to get information from this website: https://www.realtypro.co.za/property_detail.php?ref=1736
I have this table, and I want to extract the number of bedrooms from it:
<div class="panel panel-primary">
<div class="panel-heading">Property Details</div>
<div class="panel-body">
<table width="100%" cellpadding="0" cellspacing="0" border="0" class="table table-striped table-condensed table-tweak">
<tbody><tr>
<td class="xh-highlight">3</td><td style="width: 140px" class="">Bedrooms</td>
</tr>
<tr>
<td>Bathrooms</td>
<td>3</td>
</tr>
I am using the following XPath expression:
bedrooms = response.xpath("//div[@class='panel panel-primary']/div[@class='panel-body']/table[@class='table table-striped table-condensed table-tweak']/tbody/tr[1]/td[2]/text()").extract_first()
However, I only get 'None' as the output.
I have tried several combinations, but I always get None. Any suggestions on what I am doing wrong?
Thanks!
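One common cause of this (an assumption here, since the raw page source is not shown) is the tbody step in the path. Browsers insert a tbody element into every table in the DOM they display, so paths copied from developer tools or a browser XPath extension contain /tbody/. The HTML the server actually sends may not contain it, and Scrapy and lxml parse that raw HTML without adding one, so the path matches nothing and extract_first() returns None. The sketch below reproduces this offline with a hypothetical, simplified copy of the table; it assumes label-then-value cell order, as in the Bathrooms row.

```python
import lxml.html

# Hypothetical raw HTML shaped like the question's snippet, but WITHOUT <tbody>,
# as a server might send it. lxml's HTML parser does not add <tbody> itself.
RAW_HTML = """
<html><body>
<div class="panel panel-primary">
  <div class="panel-heading">Property Details</div>
  <div class="panel-body">
    <table class="table table-striped table-condensed table-tweak">
      <tr><td>Bedrooms</td><td>3</td></tr>
      <tr><td>Bathrooms</td><td>3</td></tr>
    </table>
  </div>
</div>
</body></html>
"""

doc = lxml.html.fromstring(RAW_HTML)

# Path as copied from the browser DOM, including /tbody/: no match here.
with_tbody = doc.xpath(
    "//div[@class='panel-body']/table/tbody/tr[1]/td[2]/text()"
)

# Same path with /tbody/ replaced by //, which works with or without <tbody>.
without_tbody = doc.xpath(
    "//div[@class='panel-body']/table//tr[1]/td[2]/text()"
)

print(with_tbody)     # []
print(without_tbody)  # ['3']
```

In Scrapy the same idea applies to response.xpath(...): drop the /tbody/ step (or use //) and extract_first() should stop returning None, provided the rest of the path matches the raw HTML.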
Answer 0 (score: 3)
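I would use bs4 4.7.1+, where you can use :contains to search for the td cell with the text "Bedrooms" and then take the adjacent sibling td. You can add an is None test for error handling. This is less fragile than a long XPath.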
If the position is fixed, you could use
import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.realtypro.co.za/property_detail.php?ref=1736')
soup = bs(r.content, 'lxml')
# td:contains(Bedrooms) finds the label cell; "+ td" selects the cell right after it
print(int(soup.select_one('td:contains(Bedrooms) + td').text))
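The answer suggests guarding against a missing cell with an is None test. Here is a minimal offline sketch of that, using a hypothetical HTML fragment and a hypothetical helper name, get_count. It assumes a recent soupsieve (2.1 or later), where :contains is deprecated in favour of :-soup-contains, and it uses the standard-library html.parser so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with the same label/value layout as the page.
HTML = """
<table class="table-tweak">
  <tr><td>Bedrooms</td><td>3</td></tr>
  <tr><td>Bathrooms</td><td>3</td></tr>
</table>
"""


def get_count(soup, label):
    """Return the integer in the cell right after the cell containing `label`,
    or None when no such cell exists (the `is None` check from the answer)."""
    # :-soup-contains is the current spelling of the older :contains selector.
    cell = soup.select_one(f'td:-soup-contains("{label}") + td')
    if cell is None:
        return None
    return int(cell.get_text(strip=True))


soup = BeautifulSoup(HTML, "html.parser")
print(get_count(soup, "Bedrooms"))  # 3
print(get_count(soup, "Garages"))   # None
```

Returning None instead of raising lets a scraper loop over many listings and simply skip fields a page does not have.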
Answer 1 (score: 1)
Try this and let me know if it works:
Output:
['3']
Edit:
Or possibly:
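import requests
import lxml.html

# Fetch the raw page HTML (in Scrapy, response.body from your code above also works)
response = requests.get('https://www.realtypro.co.za/property_detail.php?ref=1736')
beds = lxml.html.fromstring(response.content)
# Select td[2] in the first row, then step back to its preceding sibling cell(s)
bedrooms = beds.xpath("//div[@class='panel panel-primary']/div[@class='panel-body']/table[@class='table table-striped table-condensed table-tweak']/tbody/tr[1]/td[2]//preceding-sibling::*/text()")
print(bedrooms)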
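Both answers work from a fixed position (tr[1]/td[2] plus a sibling step). An XPath alternative in the spirit of the :contains answer is to anchor on the label text itself. The minimal sketch below uses a hypothetical table with one value-first row (as in the question's snippet) and one label-first row (as in its Bathrooms row), a hypothetical helper name, cell_value, and lxml's support for XPath variables ($label) so the label is never pasted into the expression.

```python
import lxml.html

# Hypothetical table: the Bedrooms row is value-first, the Bathrooms row label-first.
RAW_HTML = """
<html><body>
<table class="table table-striped table-condensed table-tweak">
  <tr><td>3</td><td>Bedrooms</td></tr>
  <tr><td>Bathrooms</td><td>2</td></tr>
</table>
</body></html>
"""


def cell_value(doc, label):
    """Return the text of the other cell in the row whose cell reads `label`,
    or None if no cell has that text.

    Anchoring on the label text (instead of tr[1]/td[2]) makes the lookup
    independent of row order, cell order and whether <tbody> is present.
    """
    found = doc.xpath(
        "//table[contains(@class, 'table-tweak')]"
        "//td[normalize-space() = $label]"
        "/../td[normalize-space() != $label]/text()",
        label=label,
    )
    return found[0].strip() if found else None


doc = lxml.html.fromstring(RAW_HTML)
print(cell_value(doc, "Bedrooms"))   # 3
print(cell_value(doc, "Bathrooms"))  # 2
print(cell_value(doc, "Garages"))    # None
```

The /.. step climbs from the label cell to its row, and the != predicate picks whichever cell in that row is not the label, so the same expression works whether the number comes before or after its label.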