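I have a supplier whose website requires a login, and I'm trying to scrape prices and availability from it. In VBA the selectors work, but when I run the equivalent in Python I get nothing.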
This is the part of the HTML I get the price from:
<div class="product-info-price">
<div class="price-box price-final_price" data-role="priceBox" data-product-id="32686" data-price-box="product-id-32686">
<span class="special-price">
<span class="price-container price-final_price tax weee" itemprop="offers" itemscope itemtype="http://schema.org/Offer">
<span class="price-label">Ειδική Τιμή</span>
<span id="product-price-32686" data-price-amount="7.9" data-price-type="finalPrice" class="price-wrapper " >
<span class="price">7,90 €</span>
</span>
<meta itemprop="price" content="7.9" />
<meta itemprop="priceCurrency" content="EUR" />
</span>
</span>
</div>
</div>
In VBA I use this selector:
.price-box .price-final_price .price
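BeautifulSoup's `select_one()` accepts CSS selectors, so the VBA selector can most likely be reused unchanged. A minimal sketch, assuming `bs4` is installed and using a trimmed copy of the snippet above (the name `SNIPPET` is just for illustration). Note the descendant combinator: `.price-box` and `.price-final_price` sit on the same `<div>`, so the space between them is satisfied by the inner `price-container price-final_price` span; the compound form `.price-box.price-final_price .price` (no space) targets the outer `<div>` instead. Both reach the same `<span class="price">`.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the price block from the question
SNIPPET = """
<div class="product-info-price">
  <div class="price-box price-final_price" data-role="priceBox">
    <span class="special-price">
      <span class="price-container price-final_price tax weee">
        <span class="price-label">Ειδική Τιμή</span>
        <span id="product-price-32686" class="price-wrapper ">
          <span class="price">7,90 €</span>
        </span>
      </span>
    </span>
  </div>
</div>
"""

soup = BeautifulSoup(SNIPPET, "html.parser")

# Same selector as in VBA: descendant combinators between the classes
descendant = soup.select_one(".price-box .price-final_price .price")

# Compound selector: both classes on one element, then any .price below it
compound = soup.select_one(".price-box.price-final_price .price")

print(descendant.get_text(strip=True))  # 7,90 €
print(compound.get_text(strip=True))    # 7,90 €
```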
In Python I use:
price = soup.find('span', attrs={'class':'price'})
if price is not None:
    price_text = price.text.strip()
    print(price_text)
else:
    price_text = "0,00"
    print(price_text)
I always get 0,00 as the price.
What should I change in soup.find?
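For what it's worth, the `soup.find()` call in the question does match the snippet shown, so always getting `0,00` suggests that the HTML Python actually receives does not contain this block (for example, the supplier's login page, or markup that is only filled in by JavaScript; both are assumptions). A minimal diagnostic sketch, assuming `bs4`; `find_price`, `PRICE_BLOCK` and `LOGIN_PAGE` are hypothetical names, and `LOGIN_PAGE` is an invented stand-in, not the supplier's real page:

```python
from bs4 import BeautifulSoup

PRICE_BLOCK = '<div class="product-info-price"><span class="price">7,90 €</span></div>'
# Hypothetical stand-in for what an unauthenticated request might return
LOGIN_PAGE = '<form id="login-form"><input name="username"></form>'


def find_price(page_html):
    """Return the price text, or None if no <span class="price"> is found."""
    soup = BeautifulSoup(page_html, "html.parser")
    # Same lookup as in the question; class_ is shorthand for attrs={'class': ...}
    price = soup.find("span", class_="price")
    if price is None:
        # Check whether the price block exists in the fetched page at all
        if soup.select_one(".product-info-price") is None:
            print("No .product-info-price block: check the login/session or JS rendering")
        return None
    return price.get_text(strip=True)


print(find_price(PRICE_BLOCK))  # 7,90 €
print(find_price(LOGIN_PAGE))   # prints the hint, then None
```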
Answer 0 (score: 3)
CSS selectors are usually faster than XPath. You can use the following:
from bs4 import BeautifulSoup as bs
html = '''
<div class="product-info-price">
<div class="price-box price-final_price" data-role="priceBox" data-product-id="32686" data-price-box="product-id-32686">
<span class="special-price">
<span class="price-container price-final_price tax weee" itemprop="offers" itemscope itemtype="http://schema.org/Offer">
<span class="price-label">Ειδική Τιμή</span>
<span id="product-price-32686" data-price-amount="7.9" data-price-type="finalPrice" class="price-wrapper " >
<span class="price">7,90 €</span>
</span>
<meta itemprop="price" content="7.9" />
<meta itemprop="priceCurrency" content="EUR" />
</span>
</span>
</div>
</div>
'''
soup = bs(html, 'lxml')
prices = [price.text for price in soup.select('.price')]
print(prices)
Or:
altPrices = [price['content'] for price in soup.select("[itemprop=price]")]
print(altPrices)
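The two lists differ in format: the `.price` text is display text with a decimal comma and a currency sign (`'7,90 €'`), while the `itemprop=price` meta content is a plain decimal string (`'7.9'`), so the meta value is the easier one to turn into a number. A minimal sketch using only the standard library; `parse_display_price` is a hypothetical helper that assumes the `1.234,56 €` style seen here (dot for thousands, comma for decimals):

```python
from decimal import Decimal


def parse_display_price(text):
    """Convert e.g. '7,90 €' or '1.234,56 €' to Decimal (assumes EU-style formatting)."""
    cleaned = text.replace("€", "").replace("\xa0", "").strip()
    # Drop thousands separators, then turn the decimal comma into a dot
    cleaned = cleaned.replace(".", "").replace(",", ".")
    return Decimal(cleaned)


prices = ["7,90 €"]   # what the .price list holds for the snippet
alt_prices = ["7.9"]  # what the [itemprop=price] list holds

from_text = parse_display_price(prices[0])
from_meta = Decimal(alt_prices[0])

print(from_text)               # 7.90
print(from_meta)               # 7.9
print(from_text == from_meta)  # True
```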
Answer 1 (score: 0)
I prefer lxml, which lets me use XPath instead of CSS selectors:
from lxml import html
# the_entire_html is the page source as a string
all_html = html.fromstring(the_entire_html)
# xpath() returns a list of matches, e.g. ['7.9']
price = all_html.xpath('//meta[@itemprop="price"]/@content')
# or
price = all_html.xpath('//div[@class="product-info-price"]//span[@class="price"]/text()')
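`xpath()` always returns a list, so take the first element (and handle an empty list). Also, `@class="price"` compares the whole attribute string, so it would miss something like `class="price special"`; the usual XPath 1.0 idiom for matching a single class token is `contains(concat(' ', normalize-space(@class), ' '), ' price ')`. A minimal sketch, assuming `lxml` is installed; `page` is a trimmed, hypothetical stand-in for `the_entire_html`:

```python
from lxml import html

# Trimmed stand-in for the_entire_html
page = """
<div class="product-info-price">
  <span class="price-wrapper ">
    <span class="price">7,90 €</span>
  </span>
  <meta itemprop="price" content="7.9">
</div>
"""

tree = html.fromstring(page)

# Matches the class token "price" even when other classes are present
CLASS_PRICE = "contains(concat(' ', normalize-space(@class), ' '), ' price ')"

texts = tree.xpath(f"//span[{CLASS_PRICE}]/text()")
contents = tree.xpath('//meta[@itemprop="price"]/@content')

# xpath() returns lists; fall back to None when nothing matched
price_text = texts[0].strip() if texts else None
price_value = contents[0] if contents else None

print(price_text)   # 7,90 €
print(price_value)  # 7.9
```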