Here is the HTML example:
<tr id="gift1" class="gift"><td>
Vegetable Basket
</td><td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg">
</td></tr>
I am trying to parse the price of the item identified by its image URL.
html = getHTML("http://www.pythonscraping.com/pages/page3.html")
bsObj = BeautifulSoup(html, "html.parser")
print(bsObj.find("img", {"src":"../img/gifts/img1.jpg"}).parent.previous_sibiling)
bsObj.find("img", {"src":"../img/gifts/img1.jpg"}).parent
returns:
<td>
<img src="../img/gifts/img1.jpg">
</td>
But bsObj.find("img", {"src":"../img/gifts/img1.jpg"}).parent.previous_sibiling
always returns None.
Shouldn't it return the <td> tag that holds the price?
Answer 0 (score: 1)
First, you have a typo: previous_sibling vs previous_sibiling. With the correct spelling, it works:
>>> from bs4 import BeautifulSoup
>>>
>>> data = """<tr id="gift1" class="gift"><td>
... Vegetable Basket
... </td><td>
... This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
... <span class="excitingNote">Now with super-colorful bell peppers!</span>
... </td><td>
... $15.00
... </td><td>
... <img src="../img/gifts/img1.jpg">
... </td></tr>"""
>>>
>>> soup = BeautifulSoup(data, "html.parser")
>>>
>>> image_url = "../img/gifts/img1.jpg"
>>>
>>> image = soup.find("img", src=image_url)
>>> price = image.parent.previous_sibling.get_text(strip=True)
>>> print(price)
$15.00
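As for why the misspelled name returns None instead of raising an error: Beautiful Soup treats an unknown attribute on a tag as shorthand for searching for a child tag with that name, so the typo silently becomes a failed search. A minimal sketch of that behaviour, using a shortened copy of the markup from the question (the variable names are illustrative only):

```python
from bs4 import BeautifulSoup

# Shortened copy of the row from the question
html = """<tr id="gift1" class="gift"><td>
Vegetable Basket
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg">
</td></tr>"""

soup = BeautifulSoup(html, "html.parser")
image_cell = soup.find("img", src="../img/gifts/img1.jpg").parent

# Misspelled name: interpreted as image_cell.find("previous_sibiling"),
# i.e. a search for a child tag with that name, which does not exist.
misspelled = image_cell.previous_sibiling

# Correct name: the <td> immediately before the image cell.
price = image_cell.previous_sibling.get_text(strip=True)

print(misspelled)  # None
print(price)       # $15.00
```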
Another option is to use find_previous() to locate the text node that starts with $:
>>> price = image.find_previous(text=lambda text: text and text.strip().startswith("$")).strip()
>>> print(price)
$15.00
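One caveat worth checking on the real page: if the cells are separated by newlines or indentation, previous_sibling returns that whitespace text node rather than the price cell. A sketch assuming a hypothetical layout of the same row with one cell per line (this layout is an assumption, not the markup from the question), using find_previous_sibling() to skip text nodes:

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical variant of the row with a newline between cells
html = """<tr id="gift1" class="gift">
<td>Vegetable Basket</td>
<td>$15.00</td>
<td><img src="../img/gifts/img1.jpg"></td>
</tr>"""

soup = BeautifulSoup(html, "html.parser")
image_cell = soup.find("img", src="../img/gifts/img1.jpg").parent

# Here the immediate previous sibling is the newline between the cells.
raw_sibling = image_cell.previous_sibling
print(isinstance(raw_sibling, NavigableString))  # True

# find_previous_sibling("td") only considers <td> tags, so it skips the text node.
price = image_cell.find_previous_sibling("td").get_text(strip=True)
print(price)  # $15.00
```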