Question

Here is a sample of the HTML:

<tr id="gift1" class="gift"><td>
Vegetable Basket
</td><td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg">
</td></tr>

I am trying to get the price of the item identified by its image URL.

html = getHTML("http://www.pythonscraping.com/pages/page3.html")
bsObj = BeautifulSoup(html, "html.parser")
print(bsObj.find("img", {"src":"../img/gifts/img1.jpg"}).parent.previous_sibiling)

bsObj.find("img", {"src":"../img/gifts/img1.jpg"}).parent returns:

<td>
<img src="../img/gifts/img1.jpg">
</td>

But bsObj.find("img", {"src":"../img/gifts/img1.jpg"}).parent.previous_sibiling always returns None.

Shouldn't it return the <td> tag for the price?

Answer 1

First of all, you have a typo: previous_sibling vs previous_sibiling. With the correct spelling it works for me:

>>> from bs4 import BeautifulSoup
>>>
>>> data = """<tr id="gift1" class="gift"><td>
... Vegetable Basket
... </td><td>
... This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
... <span class="excitingNote">Now with super-colorful bell peppers!</span>
... </td><td>
... $15.00
... </td><td>
... <img src="../img/gifts/img1.jpg">
... </td></tr>"""
>>>
>>> soup = BeautifulSoup(data, "html.parser")
>>>
>>> image_url = "../img/gifts/img1.jpg"
>>>
>>> image = soup.find("img", src=image_url)
>>> price = image.parent.previous_sibling.get_text(strip=True)
>>> print(price)
$15.00
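As a side note on why the misspelling fails silently instead of raising AttributeError: BeautifulSoup treats an unknown attribute on a tag as a shortcut for find() with that name, so .previous_sibiling searches for a child tag called <previous_sibiling> and finds nothing. The snippet below is a minimal sketch of that behavior; the trimmed-down markup and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Reduced version of the row from the question (illustrative markup only).
html = '<table><tr><td>$15.00</td><td><img src="../img/gifts/img1.jpg"></td></tr></table>'
soup = BeautifulSoup(html, "html.parser")
cell = soup.find("img").parent

# Misspelled name: handled as cell.find("previous_sibiling"), which matches no tag.
misspelled = cell.previous_sibiling

# Correct name: the node immediately before this <td> in the row.
correct = cell.previous_sibling

print(misspelled)          # None
print(correct.get_text())  # $15.00
```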

Another option is to use find_previous() to locate the text node that starts with $:

>>> price = image.find_previous(text=lambda text: text and text.strip().startswith("$")).strip()
>>> print(price)
$15.00
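A related variant, not part of the original answer, is find_previous_sibling("td"). It may help if the real page has whitespace between the cells, because .previous_sibling would then return the whitespace text node rather than the price cell. The sketch below assumes such markup; the sample in the question has no whitespace between its <td> tags, so the reformatted row here is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical reformatting of the question's row, with newlines between cells.
html = """<tr id="gift1" class="gift">
<td>Vegetable Basket</td>
<td>$15.00</td>
<td><img src="../img/gifts/img1.jpg"></td>
</tr>"""
soup = BeautifulSoup(html, "html.parser")
cell = soup.find("img", src="../img/gifts/img1.jpg").parent

# Here .previous_sibling is the newline text node between the two <td> tags.
raw_sibling = cell.previous_sibling

# find_previous_sibling("td") skips text nodes and returns the nearest earlier <td>.
price = cell.find_previous_sibling("td").get_text(strip=True)
print(price)  # $15.00
```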

