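I recently started writing small programs in Python. I built a web scraper with BS4 (BeautifulSoup 4), but I can't get it to work for more than one website.
The scraper checks the availability of RTX 3070 graphics cards at various Dutch retailers. However, I can only get it to work for a single website. I know that website A uses different CSS classes for availability than website B, so I assume I need to combine an if...else statement with a for loop.
Note: this scraper is not intended for "scalping" products.
The following works like a charm... for a single website: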
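from csv import writer
import smtplib  # not used in this snippet

import requests  # needed for requests.get below
from bs4 import BeautifulSoup

response = requests.get('https://www.alternate.nl/Grafische-kaarten/RTX-3070')
soup = BeautifulSoup(response.text, 'html.parser')
# Each product on the listing page sits in an element with class "listRow"
posts = soup.find_all(class_='listRow')

# newline='' keeps the csv module from writing blank rows on Windows
with open('RTX_avail.csv', 'w', newline='') as csv_file:
    csv_writer = writer(csv_file)
    headers = ['Title', 'Brand', 'Price', 'Availability', 'URL']
    csv_writer.writerow(headers)
    for post in posts:
        title = post.find(class_='name').get_text()
        link = post.find('a')['href']
        # Assumes the href is a site-relative path such as "/product/..."
        improved_link = 'https://www.alternate.nl' + link
        # Strip the currency prefix and the ",-*" suffix from the price text
        price = post.find(class_='price right right10').get_text().replace("€ ", "").replace(',-*', '')
        inStock = post.find(class_='stockStatusContainer complete').get_text()
        # Items marked "Pre-order" are treated as not in stock
        availability = 'Not Available' if "Pre-order" in inStock else 'Available'
        # The brand is the first word of the product title
        brand = title.split(' ', 1)[0]
        print([title, brand, price, availability, improved_link])
        csv_writer.writerow([title, brand, price, availability, improved_link])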
The following approach does not work:
urls = ['https://www.alternate.nl/Grafische-kaarten/RTX-3070', 'https://www.coolblue.nl/videokaarten/nvidia-chipset/nvidia-rtx-3070']
response = requests.get(urls)
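For context, `requests.get` takes one URL string per call, so a list has to be looped over. A minimal sketch of that idea follows; the helper names `download` and `fetch_all` are hypothetical, and the `fetch` parameter is only there so the loop can be tried without network access.

```python
from typing import Callable, Dict, Iterable


def download(url: str) -> str:
    """Fetch a single page and return its HTML as text."""
    # Imported here so fetch_all below can be tried without requests installed.
    import requests

    response = requests.get(url, timeout=10)  # one URL string per call
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text


def fetch_all(urls: Iterable[str], fetch: Callable[[str], str] = download) -> Dict[str, str]:
    """Call fetch once per URL and return a {url: html} mapping."""
    pages = {}
    for url in urls:
        pages[url] = fetch(url)
    return pages
```

Calling `fetch_all(urls)` with the list above would then issue one request per retailer, and each returned HTML string could be passed to `BeautifulSoup(html, 'html.parser')` separately.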
I also tried rewriting the scraper, but to no avail:
from bs4 import BeautifulSoup
import requests
url = ['https://www.alternate.nl/Grafische-kaarten/RTX-3070', 'https://www.coolblue.nl/videokaarten/nvidia-chipset/nvidia-geforce-rtx-3000-serie/nvidia-geforce-rtx-3070']
data = []
def letCrawl(url, data):
    for pg in url:
        if url == 'https://www.alternate.nl/Grafische-kaarten/RTX-3070':
            page = requests.get(pg)
            soup = BeautifulSoup(page.content, "html.parser")
            ticker = soup.find("h1").text
            data.append(ticker)
            break
        else:
            page = requests.get(pg)
            soup = BeautifulSoup(page.content, "html.parser")
            soup.find("h1").text
            data.append(ticker)
    print(data)
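Two things stand out in the attempt above: the `if` compares the whole list `url` with a string (so it is never true), and `ticker` is never assigned in the `else` branch. One possible alternative to a growing if...else chain is a per-site settings table keyed by hostname. This is only a sketch: `SITE_CONFIG`, `config_for` and `extract_rows` are hypothetical names, the alternate.nl class names are taken from the working script, and the coolblue.nl entries are placeholders that would have to be replaced with the real class names from that page's source.

```python
from urllib.parse import urlparse

# Hypothetical per-site settings: which CSS classes to look for on each retailer.
SITE_CONFIG = {
    "www.alternate.nl": {
        "row": "listRow",
        "title": "name",
        "stock": "stockStatusContainer complete",
        "unavailable_text": "Pre-order",
    },
    "www.coolblue.nl": {
        # Placeholders only: replace with the class names found in the page source.
        "row": "REPLACE-row-class",
        "title": "REPLACE-title-class",
        "stock": "REPLACE-stock-class",
        "unavailable_text": "REPLACE-unavailable-text",
    },
}


def config_for(url):
    """Pick the settings for a URL based on its hostname."""
    host = urlparse(url).netloc
    if host not in SITE_CONFIG:
        raise ValueError(f"no scraper settings for {host!r}")
    return SITE_CONFIG[host]


def extract_rows(soup, cfg):
    """Read title and availability from a parsed page (a BeautifulSoup object)."""
    rows = []
    for post in soup.find_all(class_=cfg["row"]):
        title_tag = post.find(class_=cfg["title"])
        stock_tag = post.find(class_=cfg["stock"])
        title = title_tag.get_text(strip=True) if title_tag is not None else ""
        if stock_tag is None:
            availability = "Unknown"  # the stock element was not found
        elif cfg["unavailable_text"] in stock_tag.get_text():
            availability = "Not Available"
        else:
            availability = "Available"
        rows.append({"title": title, "availability": availability})
    return rows
```

The loop would then be the same for every site: fetch each URL with `requests.get(pg)`, build `soup = BeautifulSoup(page.content, "html.parser")`, and call `extract_rows(soup, config_for(pg))`. Adding a retailer means adding one entry to the table rather than another branch.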
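Can anyone help me? I just need to get it running on multiple websites. I think I can figure out the if...else statement myself, but feel free to give me some tips.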