BS4 cannot select the correct span

Time: 2020-11-04 19:08:41

Tags: python python-3.x web-scraping beautifulsoup

I am trying to scrape a price from a website. Here is a small part of the HTML:

</div>
</div>
<div class="right custom">
<div class="description custom">
<aside>
<h4>Availability:</h4>
<div>
<span class="label green">In Stock</span>
</div>
</aside>
<aside>
<h4>Price:</h4>
<div>
<span class="label">£65.40</span>
</div>
</aside>
<aside>
<h4>Ex Tax:</h4>
<div>
<span class="label">£54.50</span>
</div>
</aside>
<div class="price">
                    £65.40                  </div>
<section class="custom-order">
<div class="options">
<div class="option" id="option-276">
<span class="required">*</span>
<label>Type &amp; Extras:</label><br/>
<select name="option[276]">
<option value=""> --- Please Select --- </option>
<option value="146">Each                                </option>
</select>
</div>
</div>
<div class="quantity custom">
<label>Quantity:</label><br/>
<input name="quantity" size="2" type="text" value="1"/>
</div>
</section>
<!-- -->
<div class="cart">
<div>

I am trying to select the £54.50 price (the price excluding UK tax).

The code I am using is as follows:

import requests
from bs4 import BeautifulSoup
import pandas as pd

var1 = requests.get(
    "https://www.website.co.uk",
    headers={'User-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'},
)
var2 = var1.content
soup=BeautifulSoup(var2, "html.parser")
span = soup.find("span", {"class":"label"})
price = span.text
price

Output: "In Stock"

This "In Stock" comes from a span a few lines earlier in the HTML:

<div>
<span class="label green">In Stock</span>

Can someone point me to the correct span?

3 answers:

Answer 0: (score: 0)

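You called span = soup.find("span", {"class":"label"}), which returns the first span whose class list contains label. That is the "In Stock" span, because its classes are label and green, so that is what you got. You can get the expected value with span = soup.find_all("span", {"class":"label"}, limit=3)[2]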
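
To make the difference concrete, here is a minimal sketch that runs both calls against a trimmed copy of the HTML from the question. The HTML string and the variable names are illustrative assumptions; the real page may contain more spans with class label, in which case the index 2 would need adjusting.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question (assumed to be representative).
HTML = """
<div class="description custom">
<aside><h4>Availability:</h4><div><span class="label green">In Stock</span></div></aside>
<aside><h4>Price:</h4><div><span class="label">£65.40</span></div></aside>
<aside><h4>Ex Tax:</h4><div><span class="label">£54.50</span></div></aside>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Class matching is per class name, so class="label green" also matches "label".
first_text = soup.find("span", {"class": "label"}).text

# find_all returns every match in document order; limit=3 stops after three.
labels = soup.find_all("span", {"class": "label"}, limit=3)
label_texts = [s.text for s in labels]
ex_tax = labels[2].text

print(first_text)   # the availability span, not a price
print(label_texts)
print(ex_tax)
```

Indexing by position is simple but fragile: if the site adds or removes a labelled span before the ex-tax price, the index has to change.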

Answer 1: (score: 0)

You can use the CSS selector nth-child():

from bs4 import BeautifulSoup

txt = """THE ABOVE HTML"""
soup = BeautifulSoup(txt, "html.parser")

print(soup.select_one("aside:nth-child(3) > div > span").text)

Output:

£54.50
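
Since the snippet above uses a placeholder for the HTML, a self-contained version may help. The sketch below assumes a trimmed copy of the markup from the question and also tries nth-of-type as a possible variant; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question, including the sibling div.price.
HTML = """
<div class="description custom">
<aside><h4>Availability:</h4><div><span class="label green">In Stock</span></div></aside>
<aside><h4>Price:</h4><div><span class="label">£65.40</span></div></aside>
<aside><h4>Ex Tax:</h4><div><span class="label">£54.50</span></div></aside>
<div class="price">£65.40</div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# nth-child(3): the aside must be the third element child of its parent.
by_child = soup.select_one("aside:nth-child(3) > div > span").text

# nth-of-type(3): the third <aside> among its siblings, whatever else is there.
by_type = soup.select_one("aside:nth-of-type(3) span.label").text

print(by_child)
print(by_type)
```

Both selectors pick the same span here. nth-child counts every sibling element, so it would stop matching if a non-aside element were inserted before the asides, while nth-of-type counts only aside elements.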

Answer 2: (score: 0)

Another approach, using the simplified_scrapy library.

from simplified_scrapy.spider import SimplifiedDoc
html = '''your html
'''
doc = SimplifiedDoc(html)  # create doc
span = doc.getElement('span', start="Price:")
print (span.text)

Result:

£65.40
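
Note that the snippet above anchors on "Price:", so it prints the tax-inclusive price rather than the £54.50 the question asks for. The same idea of anchoring on the heading text can be sketched in BeautifulSoup itself. This is not the answerer's code: the helper name price_after_heading is hypothetical, and the HTML is a trimmed copy of the markup from the question.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question (assumed to be representative).
HTML = """
<div class="description custom">
<aside><h4>Availability:</h4><div><span class="label green">In Stock</span></div></aside>
<aside><h4>Price:</h4><div><span class="label">£65.40</span></div></aside>
<aside><h4>Ex Tax:</h4><div><span class="label">£54.50</span></div></aside>
</div>
"""


def price_after_heading(soup, heading):
    """Return the text of the first span.label following the given <h4> text.

    Hypothetical helper for illustration; returns None if nothing matches.
    """
    h4 = soup.find("h4", string=heading)
    if h4 is None:
        return None
    span = h4.find_next("span", class_="label")
    return span.get_text(strip=True) if span is not None else None


soup = BeautifulSoup(HTML, "html.parser")
ex_tax = price_after_heading(soup, "Ex Tax:")
inc_tax = price_after_heading(soup, "Price:")
missing = price_after_heading(soup, "Discount:")

print(ex_tax)
print(inc_tax)
print(missing)
```

Looking the value up by its heading does not depend on how many labelled spans come before it, at the cost of depending on the exact heading text.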