I am trying to scrape a price from a website. Here is a small part of the HTML:
</div>
</div>
<div class="right custom">
<div class="description custom">
<aside>
<h4>Availability:</h4>
<div>
<span class="label green">In Stock</span>
</div>
</aside>
<aside>
<h4>Price:</h4>
<div>
<span class="label">£65.40</span>
</div>
</aside>
<aside>
<h4>Ex Tax:</h4>
<div>
<span class="label">£54.50</span>
</div>
</aside>
<div class="price">
£65.40 </div>
<section class="custom-order">
<div class="options">
<div class="option" id="option-276">
<span class="required">*</span>
<label>Type & Extras:</label><br/>
<select name="option[276]">
<option value=""> --- Please Select --- </option>
<option value="146">Each </option>
</select>
</div>
</div>
<div class="quantity custom">
<label>Quantity:</label><br/>
<input name="quantity" size="2" type="text" value="1"/>
</div>
</section>
<!-- -->
<div class="cart">
<div>
I am trying to select the £54.50 price (the price excluding UK tax).
The code I am using is:
import requests
from bs4 import BeautifulSoup
import pandas as pd
var1 = requests.get("https://www.website.co.uk",
headers = {'User-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'})
var2 = var1.content
soup=BeautifulSoup(var2, "html.parser")
span = soup.find("span", {"class":"label"})
price = span.text
price
Output: "In Stock"
This "In Stock" comes from the first few lines of the HTML:
<div>
<span class="label green">In Stock</span>
Can someone point me to the correct span?
Answer 0: (score: 0)
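You selected span = soup.find("span", {"class":"label"})
, which returns the first span with the class label, and that is what you got ("In Stock"). You can get the third one with span = soup.find_all("span", {"class":"label"}, limit=3)[2]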
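To make the difference between find and find_all concrete, here is a minimal sketch. It assumes a trimmed copy of the HTML from the question (only the three aside blocks); the variable names html, labels, texts and ex_tax are illustrative and not from the original post.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question: only the three <aside> blocks.
html = """
<div class="description custom">
<aside><h4>Availability:</h4><div><span class="label green">In Stock</span></div></aside>
<aside><h4>Price:</h4><div><span class="label">£65.40</span></div></aside>
<aside><h4>Ex Tax:</h4><div><span class="label">£54.50</span></div></aside>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() stops at the first match; find_all() returns every match in
# document order. Matching on class "label" also matches "label green",
# because BeautifulSoup compares against each class of the element.
labels = soup.find_all("span", {"class": "label"})
texts = [span.get_text(strip=True) for span in labels]
print(texts)

# The Ex Tax price is the third matching span (index 2).
ex_tax = texts[2]
print(ex_tax)
```

Indexing by position works only while the page keeps these three labels in the same order, so it is worth checking the length of the list before indexing on a real page.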
Answer 1: (score: 0)
You can use the CSS selector nth-child():
from bs4 import BeautifulSoup
txt = """THE ABOVE HTML"""
soup = BeautifulSoup(txt, "html.parser")
print(soup.select_one("aside:nth-child(3) > div > span").text)
Output:
£54.50
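Note that nth-child(3) depends on the aside being the third child of its parent. As a possible alternative, the sketch below looks the price up by its heading text instead of its position. It assumes the h4 text is exactly "Ex Tax:" with no surrounding whitespace, and price_for is a hypothetical helper written for this illustration, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question: only the three <aside> blocks.
html = """
<div class="description custom">
<aside><h4>Availability:</h4><div><span class="label green">In Stock</span></div></aside>
<aside><h4>Price:</h4><div><span class="label">£65.40</span></div></aside>
<aside><h4>Ex Tax:</h4><div><span class="label">£54.50</span></div></aside>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def price_for(soup, heading):
    """Return the label text that follows the <h4> whose text equals
    `heading`, or None if no such heading or label exists.
    (Hypothetical helper for illustration.)"""
    h4 = soup.find("h4", string=heading)
    if h4 is None:
        return None
    # The first span.label after the heading in document order.
    span = h4.find_next("span", class_="label")
    return span.get_text(strip=True) if span else None


print(price_for(soup, "Ex Tax:"))
print(price_for(soup, "Price:"))
```

Looking up by heading keeps working if the site adds or reorders aside blocks, at the cost of breaking if the heading text itself changes.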
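Answer 2: (score: 0)
Another approach, using the simplified_scrapy library. Note that with start="Price:" it returns the tax-inclusive price (£65.40), not the Ex Tax value.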
from simplified_scrapy.spider import SimplifiedDoc
html = '''your html
'''
doc = SimplifiedDoc(html) # create doc
span = doc.getElement('span', start="Price:")
print (span.text)
Result:
£65.40