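I am trying to do web scraping with Python and write the results to a CSV file, but when I run the script I get multiple entries for the same product name. Here is my code: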
import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20card'
# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
# html parsing
page_soup = soup(page_html, "html.parser")
# grabs each product
containers = page_soup.findAll("div", {
    "class": "item-container"
})
filename = "products.csv"
f = open(filename, "w")
headers = "product_name, shipping\n"
f.write(headers)
for container in containers:
    container = page_soup.findAll("div", {
        "class": "item-info"
    })
    print(container[0].div.a.img["title"])
    container = page_soup.findAll("a", {
        "class": "item-title"
    })
    product_name = container[0].text
    container = page_soup.findAll("li", {
        "class": "price-ship"
    })
    shipping = container[0].text.strip()
    print("product_name: " + product_name)
    print("shipping: " + shipping)
    f.write(product_name.replace(",", "|") + "," + shipping + "\n")
f.close()
Answer 0 (score: 0)
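To collect the different pieces of information that belong to the same item, you can use the zip() function. For writing the CSV file, I recommend the csv module; it handles quoting and delimiters automatically: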
from bs4 import BeautifulSoup
import requests
import csv

url = 'https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20card'
soup = BeautifulSoup(requests.get(url).text, 'lxml')

with open('out.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=',',
                           quotechar='"', quoting=csv.QUOTE_MINIMAL)
    csvwriter.writerow(["product_name", "shipping"])
    for product_name, shipping in zip(soup.select('.item-container .item-title'), soup.select('.item-container .price-ship')):
        csvwriter.writerow([product_name.get_text(strip=True), shipping.get_text(strip=True)])
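As for why the question's loop repeats one product: each lookup inside it calls page_soup.findAll(...) and then takes element [0], so every iteration reads the first match on the whole page instead of the current container. zip() avoids that, but it pairs the two lists purely by position, so the pairs could shift if some product were missing one of the elements (whether that happens on the live page is not verified here). A minimal sketch of an alternative that searches inside each container is shown below. The HTML is a made-up stand-in for the real page, and extract_rows is a hypothetical helper name; only the class names come from the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; only the class names
# (item-container, item-title, price-ship) are taken from the code above.
SAMPLE_HTML = """
<div class="item-container">
  <a class="item-title">Card A, 8GB</a>
  <ul><li class="price-ship">Free Shipping</li></ul>
</div>
<div class="item-container">
  <a class="item-title">Card B</a>
</div>
<div class="item-container">
  <a class="item-title">Card C</a>
  <ul><li class="price-ship"> $6.99 Shipping </li></ul>
</div>
"""


def extract_rows(html):
    """Return one (product_name, shipping) tuple per item container."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for container in soup.select(".item-container"):
        # Search inside the current container, not the whole page.
        title = container.select_one(".item-title")
        ship = container.select_one(".price-ship")
        rows.append((
            title.get_text(strip=True) if title else "",
            ship.get_text(strip=True) if ship else "",
        ))
    return rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

Because each lookup is scoped to its own container, a product without a shipping element gets an empty field instead of borrowing the value of the next product.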
The contents of out.csv will be:
product_name,shipping
"EVGA GeForce RTX 2080 Ti XC ULTRA GAMING, 11G-P4-2383-KR, 11GB GDDR6, Dual HDB Fans & RGB LED",Free Shipping
XFX Radeon RX 5700 XT DirectX 12 RX-57XT8MFD6 Video Card,Free Shipping
GIGABYTE GeForce RTX 2060 DirectX 12 GV-N2060GAMINGOC PRO WHITE-6GD Video Card,Free Shipping
"Aorus AD27QD 27"" 144Hz 1440P FreeSync Gaming Monitor + GIGABYTE Radeon RX ...",
ASUS ROG Strix GeForce RTX 2070 DirectX 12 ROG-STRIX-RTX2070-8G-GAMING Video Card,Free Shipping
"EVGA GeForce RTX 2060 SC Ultra GAMING, 06G-P4-2067-KR, 6GB GDDR6, Dual HDB Fans",Free Shipping
PowerColor AMD Radeon RX 5700 XT 8GB GDDR6 AXRX 5700XT 8GBD6-M3DH,Free Shipping
MSI GeForce RTX 2080 DirectX 12 RTX 2080 VENTUS 8G Video Card,Free Shipping
ZOTAC GeForce GTX 1060 DirectX 12 ZT-P10620A-10M Video Card,Free Shipping
ASRock Phantom Gaming X Radeon VII DirectX 12 Radeon VII 16G Video Card,$6.99 Shipping
"Sapphire PULSE Radeon RX 580 8GB GDDR5 PCI-E Dual HDMI / DVI-D / Dual DP OC w/ Backplate (UEFI), 100411P8GOCL",Free Shipping
XFX Radeon RX 590 Fatboy DirectX 12 RX-590P8DFD6 8GB 256-Bit DDR5 PCI Express 3.0 CrossFireX Support Video Card,Free Shipping
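Some rows in this output are wrapped in double quotes and others are not. That is the effect of csv.QUOTE_MINIMAL: a field is quoted only when it contains the delimiter, the quote character, or a line break, and an embedded quote is doubled. The following sketch shows this in isolation by writing to an in-memory buffer instead of a file. The product names are shortened sample values for illustration, not scraped data.

```python
import csv
import io

# Shortened sample rows (illustrative only): one field with a comma,
# one with a double quote and an empty shipping value, one plain.
rows = [
    ["product_name", "shipping"],
    ["EVGA GeForce RTX 2080 Ti XC ULTRA GAMING, 11G-P4-2383-KR", "Free Shipping"],
    ['Aorus AD27QD 27" 144Hz Gaming Monitor', ""],
    ["MSI GeForce RTX 2080 Video Card", "Free Shipping"],
]

# newline="" keeps the writer's own line endings untouched.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL)
writer.writerows(rows)
text = buffer.getvalue()
print(text)

# Reading the text back with csv.reader recovers the original fields.
parsed = list(csv.reader(io.StringIO(text, newline="")))
print(parsed == rows)
```

This is also why the manual product_name.replace(",", "|") from the question is unnecessary once the csv module does the writing: commas inside a field survive a round trip unchanged.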
Opening out.csv in LibreOffice: