How to exclude child elements in Python

Date: 2018-11-06 16:19:34

Tags: python csv beautifulsoup

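I wrote a Python script to scrape product prices from a website, but I ran into a problem. Some products are on sale, so they have two prices (the original price and the current price). My script picks up both of them, but I don't want the pre-sale price. How can I exclude it? Is that possible?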

Sample of the page source:

Regular price

 <div class="result-actions">
   <span>
     $ 1,98
   </span>
 </div>

Sale price

 <div class="result-actions">
   <span>
     <small class="price-before">
       $ 56,70
     </small>
     <span class="price-now">
       $ 39,60
     </span>
   </span>
 </div>

My script

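import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

site = input()  # URL of the page to scrape
html = urlopen(site)
bs = BeautifulSoup(html, 'html.parser')
# Every price block, whether the product is on sale or not
pricesList = bs.findAll('div',{'class':'result-actions'})
# newline='' keeps the csv module from writing blank rows on Windows
csvFile = open('Prices.csv', 'wt+', newline='')
writer = csv.writer(csvFile)

try:
  for prices in pricesList:
    # clean_up_text is not defined in this snippet
    # get_text() returns the text of all children, so both prices come back
    print(clean_up_text(prices.get_text()))
    csvPrice = []
    csvPrice.append(clean_up_text(prices.get_text().strip()))
    writer.writerow(csvPrice)
finally:
  csvFile.close()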

Please help me!

Update

I tried adding a function to exclude the old prices, but that doesn't seem to work either.

def excluir_precos_antigos(element):
  element = driver.find_element_by_class_name('price-before')
  driver.execute_script("""var element = arguments[0];
    element.parentNode.removeChild(element);""", element)

1 Answer:

Answer 0: (score: 0)

You only need to find the span tag that holds the sale price:

d = """
<div class="result-actions">
 <span>
  <small class="price-before"> ==$0
   $ 56,70
  </small>
  <span class="price-now">
   $ 39,60
  </span>
</span>
</div>
"""

from bs4 import BeautifulSoup as soup
result = soup(d, 'html.parser').find('span', {'class':'price-now'}).text

Output:

'\n   $ 39,60\n  '
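The returned string still carries the newlines and indentation from the markup. As a minimal sketch, assuming the same price-now span as above (the inline HTML string here is only a stand-in for the real page), passing strip=True to get_text trims that whitespace:

```python
from bs4 import BeautifulSoup

# Stand-in markup with the same whitespace as the example above
html = '<span class="price-now">\n   $ 39,60\n  </span>'

span = BeautifulSoup(html, 'html.parser').find('span', {'class': 'price-now'})
# strip=True removes leading and trailing whitespace from the text
price = span.get_text(strip=True)
print(price)
```

Calling .text.strip() on the span should give the same result here.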

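If the page has several result-actions divs, you can use find_all. Note that find returns None for a div that has no price-now span, so this one-liner assumes every div is on sale: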

final_results = [i.find('span', {'class':'price-now'}).text for i in soup(d, 'html.parser').find_all('div', {'class':'result-actions'})]

This gives all the sale prices on the page:

['\n   $ 39,60\n  ']
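Since the page in the question mixes regular and sale products, one possible way to cover both cases is to remove each price-before element from the parsed tree and then read whatever text is left in the div. The sketch below assumes the markup looks like the two snippets in the question. SAMPLE_HTML and the function name current_prices are made up for illustration and are not part of the original script.

```python
from bs4 import BeautifulSoup

# Hypothetical sample modelled on the two snippets from the question
SAMPLE_HTML = """
<div class="result-actions">
  <span>
    $ 1,98
  </span>
</div>
<div class="result-actions">
  <span>
    <small class="price-before">
      $ 56,70
    </small>
    <span class="price-now">
      $ 39,60
    </span>
  </span>
</div>
"""


def current_prices(html):
    """Return one price string per result-actions div, skipping old prices."""
    soup = BeautifulSoup(html, 'html.parser')
    prices = []
    for div in soup.find_all('div', {'class': 'result-actions'}):
        # Drop every pre-sale price from the tree before reading the text
        for old in div.find_all('small', {'class': 'price-before'}):
            old.decompose()
        # strip=True trims the whitespace around the remaining text
        prices.append(div.get_text(strip=True))
    return prices


print(current_prices(SAMPLE_HTML))
```

Each returned string can then be written as its own row with writer.writerow([price]), as in the original script.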