I am trying to web-scrape the percentage saved between two different prices. The HTML of the first element I want to scrape is:
<li class="price-save">
    <span class="price-save-endtime price-save-endtime-current"></span>
    <span class="price-save-endtime price-save-endtime-another" style="display:none;"></span>
    <span class="price-save-label">Save: </span>
    <span class="price-save-dollar"></span>
    <span class="price-save-percent">22%</span> <---------------------- I WANT THIS ONE!
</li>
To do this, I wrote the following Python code:
try:
    percentage = soup.find('span',class_='price-save-percent').get_text()
except:
    print("Not found")
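However, when the next element on the site has no percentage, the value written to the .csv file is repeated until an element with a percentage is found. To see what I mean, visit: https://www.newegg.com/Laptops-Notebooks/SubCategory/ID-32?Tid=6740
You can see that the first element has a "% Save" value, and so does the second, but the third does not. In the .csv file, the third element gets the second element's saved percentage. This happens repeatedly. Instead, I just want a blank cell.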
Answer 0 (score: 1)
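You need to handle the NA case for each item in the list. To do that, search inside each product's own div in the grid rather than in the whole page. The code below does this: it records the previous price and the saved percentage for each product when they are available, and writes NA otherwise: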
import bs4
from urllib.request import urlopen as req
from bs4 import BeautifulSoup as soup
import csv

# URL of the page to scrape
url = 'https://www.newegg.com/Laptops-Notebooks/SubCategory/ID-32?Tid=6740'
# Open a connection to the page
Client = req(url)
# Read the content of the page into a variable
pagina = Client.read()
# Close the connection
Client.close()
# Parse the HTML
pagina_soup = soup(pagina, "html.parser")
# Grab each product container
productes = pagina_soup.findAll("div", {"class": "item-container"})
# Open the .csv file (append mode)
result_file = open("ordinadors.csv", 'a', encoding='utf-8', newline='')
# Column headers of the .csv file
head = ["Marca", "Producte", "PreuActual", "PreuAnterior", "CostEnvio", "Rebaixa"]
writing_csv = csv.DictWriter(result_file, fieldnames=head)
# Write the header row
writing_csv.writeheader()
# Loop over all products
for producte in productes:
    # Brand of the product
    marca_productes = producte.findAll("div", {"class": "item-info"})
    marca = marca_productes[0].div.a.img["title"]
    # Name of the product
    name = producte.a.img["title"]
    # Current price
    actual_productes = producte.findAll("li", {"class": "price-current"})
    preuActual = actual_productes[0].strong.text
    # Previous price and saved percentage; both fall back to "NA" if either is missing
    try:
        #preuAbans = producte.find("li", class_="price-was").next_element.strip()
        preuAbans = producte.find('span', class_='price-was-data').get_text()
        percentage = producte.find('span', class_='price-save-percent').get_text()
    except AttributeError:
        # find() returned None, so calling .get_text() on it failed
        preuAbans = "NA"
        percentage = "NA"
    # Shipping cost
    costos_productes = producte.findAll("li", {"class": "price-ship"})
    # findAll returns a list, so take the first element and strip whitespace
    costos = costos_productes[0].text.strip()
    # Write the row to the file
    writing_csv.writerow({"Marca": marca, "Producte": name, "PreuActual": preuActual, "PreuAnterior": preuAbans, "CostEnvio": costos, "Rebaixa": percentage})
result_file.close()
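If you would rather get a blank cell than NA, as the question asks, the same per-product lookup can be wrapped in a small helper. The following is only a minimal sketch under stated assumptions: `SAMPLE_HTML` is made-up, simplified markup (including the `item-title` class) standing in for the real page, and `text_or_default` and `scrape_rows` are hypothetical helper names, not part of Beautiful Soup. It writes to an in-memory buffer so it runs without network access.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the real product grid.
SAMPLE_HTML = """
<div class="item-container">
  <a class="item-title">Laptop A</a>
  <span class="price-save-percent">22%</span>
</div>
<div class="item-container">
  <a class="item-title">Laptop B</a>
</div>
<div class="item-container">
  <a class="item-title">Laptop C</a>
  <span class="price-save-percent">10%</span>
</div>
"""


def text_or_default(parent, tag, css_class, default=""):
    """Return the stripped text of the first matching child, or `default`."""
    node = parent.find(tag, class_=css_class)
    return node.get_text(strip=True) if node is not None else default


def scrape_rows(html):
    """Build one dict per product, looking only inside that product's div."""
    page = BeautifulSoup(html, "html.parser")
    rows = []
    for product in page.find_all("div", class_="item-container"):
        rows.append({
            "Producte": text_or_default(product, "a", "item-title"),
            "Rebaixa": text_or_default(product, "span", "price-save-percent"),
        })
    return rows


rows = scrape_rows(SAMPLE_HTML)

# Write to an in-memory buffer here; swap in open(...) for a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Producte", "Rebaixa"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Because each value is computed fresh inside the loop and defaults to an empty string, a product without a discount cannot inherit the previous product's percentage. Pass `default="NA"` to get the behavior of the code above instead.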