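I am trying to scrape a page and write the results to a CSV file

Time: 2019-07-12 00:08:34

Tags: python beautifulsoup python-requests

I am trying to scrape a web page with Python and write the results to a CSV file. However, when I run the script, I get multiple entries for the same product name. Here is my code: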

import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20card'

# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# html parsing
page_soup = soup(page_html, "html.parser")

# grabs each product
containers = page_soup.findAll("div", {
    "class": "item-container"
})

filename = "products.csv"
f = open(filename, "w")

headers = "product_name, shipping\n"

f.write(headers)


for container in containers:
    container = page_soup.findAll("div", {
        "class": "item-info"
    })
    print(container[0].div.a.img["title"])

    container = page_soup.findAll("a", {
        "class": "item-title"
    })
    product_name = container[0].text

    container = page_soup.findAll("li", {
        "class": "price-ship"
    })
    shipping = container[0].text.strip()

    print("product_name: " + product_name)
    print("shipping: " + shipping)

    f.write(product_name.replace(",", "|") + "," + shipping + "\n")

f.close()

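1 Answer:

Answer 0 (score: 0)

To collect several pieces of information about the same item, you can use the zip() function. For writing the CSV file, I recommend the csv module (see its documentation); it handles quoting and delimiters automatically: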

from bs4 import BeautifulSoup
import requests
import csv

url = 'https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20card'
soup = BeautifulSoup(requests.get(url).text, 'lxml')

with open('out.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=',',
                            quotechar='"', quoting=csv.QUOTE_MINIMAL)
    csvwriter.writerow(["product_name", "shipping"])
    for product_name, shipping in zip(soup.select('.item-container .item-title'), soup.select('.item-container .price-ship')):
        csvwriter.writerow([product_name.get_text(strip=True), shipping.get_text(strip=True)])

The contents of out.csv will be:

product_name,shipping
"EVGA GeForce RTX 2080 Ti XC ULTRA GAMING, 11G-P4-2383-KR, 11GB GDDR6, Dual HDB Fans & RGB LED",Free Shipping
XFX Radeon RX 5700 XT DirectX 12 RX-57XT8MFD6 Video Card,Free Shipping
GIGABYTE GeForce RTX 2060 DirectX 12 GV-N2060GAMINGOC PRO WHITE-6GD Video Card,Free Shipping
"Aorus AD27QD 27"" 144Hz 1440P FreeSync Gaming Monitor + GIGABYTE Radeon RX ...",
ASUS ROG Strix GeForce RTX 2070 DirectX 12 ROG-STRIX-RTX2070-8G-GAMING Video Card,Free Shipping
"EVGA GeForce RTX 2060 SC Ultra GAMING, 06G-P4-2067-KR, 6GB GDDR6, Dual HDB Fans",Free Shipping
PowerColor AMD Radeon RX 5700 XT 8GB GDDR6 AXRX 5700XT 8GBD6-M3DH,Free Shipping
MSI GeForce RTX 2080 DirectX 12 RTX 2080 VENTUS 8G Video Card,Free Shipping
ZOTAC GeForce GTX 1060 DirectX 12 ZT-P10620A-10M Video Card,Free Shipping
ASRock Phantom Gaming X Radeon VII DirectX 12 Radeon VII 16G Video Card,$6.99 Shipping
"Sapphire PULSE Radeon RX 580 8GB GDDR5 PCI-E Dual HDMI / DVI-D / Dual DP OC w/ Backplate (UEFI), 100411P8GOCL",Free Shipping
XFX Radeon RX 590 Fatboy DirectX 12 RX-590P8DFD6 8GB 256-Bit DDR5 PCI Express 3.0 CrossFireX Support Video Card,Free Shipping
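
The quoting visible in this output (fields containing a comma are wrapped in double quotes, and an embedded " is doubled) comes from the csv module, not from the scraper. A minimal sketch of that behaviour, writing to an in-memory buffer instead of a file. The sample rows are copied from the output above, the helper name rows_to_csv is hypothetical, and lineterminator is set to "\n" only to keep the printed demo compact (the module's default is "\r\n"):

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows to CSV text with minimal quoting (hypothetical helper)."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter=",", quotechar='"',
                        quoting=csv.QUOTE_MINIMAL,
                        lineterminator="\n")  # demo-only choice, default is "\r\n"
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    ["product_name", "shipping"],
    # Contains commas, so the writer wraps the field in quotes.
    ["EVGA GeForce RTX 2060 SC Ultra GAMING, 06G-P4-2067-KR, 6GB GDDR6, Dual HDB Fans",
     "Free Shipping"],
    # Contains a double quote, so the writer quotes the field and doubles the ".
    ['Aorus AD27QD 27" 144Hz 1440P FreeSync Gaming Monitor + GIGABYTE Radeon RX ...', ""],
    # No special characters, so the field is written as-is.
    ["ZOTAC GeForce GTX 1060 DirectX 12 ZT-P10620A-10M Video Card", "Free Shipping"],
]

csv_text = rows_to_csv(rows)
print(csv_text)

# Reading the text back with csv.reader recovers the original fields.
round_trip = list(csv.reader(io.StringIO(csv_text, newline="")))
```

Because the reader undoes the quoting, there is no need to replace commas in product names with another character before writing.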

[Screenshot: out.csv opened in LibreOffice]
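
A likely reason the script in the question repeats one product: inside the loop, container is reassigned to page_soup.findAll(...), which searches the whole page again, so container[0] is always the first match on the page. A minimal sketch of the alternative, searching inside each container instead. It runs against a small inline HTML sample that is made up for illustration (only the class names come from the question), since the live Newegg markup may differ, and extract_rows is a hypothetical helper name:

```python
from bs4 import BeautifulSoup

# Made-up sample markup; only the class names are taken from the question.
SAMPLE_HTML = """
<div class="item-container">
  <a class="item-title" href="#">Card A, 8GB</a>
  <ul><li class="price-ship">Free Shipping</li></ul>
</div>
<div class="item-container">
  <a class="item-title" href="#">Card B</a>
  <ul><li class="price-ship">$6.99 Shipping</li></ul>
</div>
<div class="item-container">
  <a class="item-title" href="#">Card C</a>
</div>
"""


def extract_rows(html):
    """Return one [product_name, shipping] row per .item-container element."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for container in soup.select(".item-container"):
        # Search within this container only, not the whole page.
        title = container.select_one(".item-title")
        shipping = container.select_one(".price-ship")
        rows.append([
            title.get_text(strip=True) if title else "",
            shipping.get_text(strip=True) if shipping else "",
        ])
    return rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

Compared with zip() over two separate selections, this keeps each name paired with its own shipping text even when a container lacks one of the two elements, at the cost of a slightly longer loop.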