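I wrote a Python script to scrape product prices from a website, but I ran into a problem. Some products are on sale, so they show two prices (the original price and the current price). My script picks up both of them, but I don't want the pre-sale price. How can I exclude it? Is that even possible?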
Sample of the page source:
Regular price
<div class="result-actions">
<span> ==$0
$ 1,98
</span>
Sale price
<div class="result-actions">
<span>
<small class="price-before"> ==$0
$ 56,70
</small>
<span class="price-now">
$ 39,60
</span>
</span>
My script:
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup
site = input()
html = urlopen(site)
bs = BeautifulSoup(html, 'html.parser')
pricesList = bs.findAll('div', {'class': 'result-actions'})
csvFile = open('Prices.csv', 'wt+')
writer = csv.writer(csvFile)
try:
    for prices in pricesList:
        # clean_up_text is my own helper; its definition is not shown here
        print(clean_up_text(prices.get_text()))
        csvPrice = []
        csvPrice.append(clean_up_text(prices.get_text().strip()))
        writer.writerow(csvPrice)
finally:
    csvFile.close()
Please help!
I tried adding a function to exclude the old prices, but that doesn't seem to work either:
def excluir_precos_antigos(element):
    element = driver.find_element_by_class_name('price-before')
    driver.execute_script("""var element =
    arguments[0];element.parentNode.removeChild(element);""", element)
Answer 0 (score: 0)
You just need to find the span tag that holds the sale price:
d = """
<div class="result-actions">
<span>
<small class="price-before"> ==$0
$ 56,70
</small>
<span class="price-now">
$ 39,60
</span>
</span>
</div>
"""
from bs4 import BeautifulSoup as soup
result = soup(d, 'html.parser').find('span', {'class':'price-now'}).text
Output:
'\n $ 39,60\n '
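The returned string still carries the whitespace that surrounds the price in the markup. As a minimal sketch, assuming the same markup as in the question (the variable names `html`, `price_now` and `clean_price` are only illustrative), `get_text(strip=True)` returns the price without it:

```python
from bs4 import BeautifulSoup

# Same structure as the sale-price snippet from the question.
html = """
<div class="result-actions">
<span>
<small class="price-before">
$ 56,70
</small>
<span class="price-now">
$ 39,60
</span>
</span>
</div>
"""

# Locate the sale-price span, then strip leading/trailing whitespace.
price_now = BeautifulSoup(html, "html.parser").find("span", {"class": "price-now"})
clean_price = price_now.get_text(strip=True)
print(clean_price)
```

Only whitespace at the ends of the text is removed, so the space between the currency sign and the amount is kept.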
If the page contains several result-actions divs, you can use find_all:
final_results = [i.find('span', {'class':'price-now'}).text for i in soup(d, 'html.parser').find_all('div', {'class':'result-actions'})]
This gives all the sale prices on the page:
['\n $ 39,60\n ']
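One caveat: for a regular-price item there is no `price-now` span, so `find` returns `None` and calling `.text` on it raises an `AttributeError`. On a page that mixes regular and sale items, a fallback is needed. The following is a rough sketch under the assumption that the page looks like the two snippets in the question; the helper name `current_price` and the sample `page` string are made up for illustration. It prefers the `price-now` span and otherwise drops any `price-before` element before reading the text:

```python
from bs4 import BeautifulSoup

# Hypothetical page with one regular-price item and one sale item,
# modelled on the two snippets from the question.
page = """
<div class="result-actions">
<span>
$ 1,98
</span>
</div>
<div class="result-actions">
<span>
<small class="price-before">
$ 56,70
</small>
<span class="price-now">
$ 39,60
</span>
</span>
</div>
"""


def current_price(div):
    """Return the price a customer pays now for one result-actions div."""
    sale = div.find("span", {"class": "price-now"})
    if sale is not None:
        # Item is on sale: use only the current price.
        return sale.get_text(strip=True)
    # No sale span: remove any old-price element, then read what is left.
    for old in div.find_all(class_="price-before"):
        old.decompose()
    return div.get_text(strip=True)


soup = BeautifulSoup(page, "html.parser")
prices = [current_price(div) for div in soup.find_all("div", {"class": "result-actions"})]
print(prices)
```

Each string in `prices` can then be written to the CSV file with `writer.writerow([price])`, in the same way the original script writes its rows.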