Python BeautifulSoup web scraping

Date: 2020-11-03 21:30:39

Tags: python csv web-scraping beautifulsoup

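I hope someone can help me with the following problem.

I want each seller and its price on a single row. This is what I currently get in the CSV: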

9200000083649863,bol.com retourdeals
9200000083649863,"41,75"
9200000083649863,ITidee
9200000083649863,"45,88"
9200000083649863,Bol.com
9200000083649863,"47,99"

What I want is:

9200000083649863,bol.com retourdeals,"41,75"
9200000083649863,ITidee,"45,88"
9200000083649863,Bol.com,"47,99"

Here is the code:

import csv
import requests
from bs4 import BeautifulSoup


def haalprijs_verkoper(ean, url):
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    results = soup.find(id='offers')

    # Append to the CSV; newline='' is what the csv module recommends
    with open('/home/filoor1/webscrap/book1.csv', 'a', newline='') as csvfile:
        csvwriter = csv.writer(csvfile)
        # Seller names (<strong>) and prices (<span>) come back interleaved
        for tag in results.find_all(['strong', 'span']):
            aa = tag.text
            aa = aa.replace("Nieuw", "")
            aa = aa.replace("   ", "")
            aa = aa.replace("\n", "")
            aa = aa.replace("''", "aaaaaa")
            aa = aa.strip(' "')
            if aa != "":
                # Every non-empty tag becomes its own row, so the name
                # and the price end up on separate lines
                csvwriter.writerow([ean, aa])


haalprijs_verkoper(9200000083649863, 'https://www.bol.com/nl/prijsoverzicht/tp-link-tl-sg1005p-switch/9200000083649863/?filter=all&sort=price&sortOrder=asc')

Thanks

1 answer:

Answer 0 (score: 0)

You can use this example to scrape the data and save a correctly formed CSV:

import csv
import requests
from bs4 import BeautifulSoup


url = 'https://www.bol.com/nl/prijsoverzicht/tp-link-tl-sg1005p-switch/9200000083649863/?filter=all&sort=price&sortOrder=asc'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

ean = '9200000083649863'

all_data = []
for s, p in zip(soup.select('p.nosp > strong'),
                soup.select('span.product-prices__currency.product-prices__bol-price')):
    all_data.append([ean, s.get_text(strip=True), p.get_text(strip=True)])

with open('data.csv', 'w', newline='') as f_out:
    writer = csv.writer(f_out)
    writer.writerows(all_data)
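
If you would rather keep the single loop from the question, note that its cleaned strings arrive interleaved (seller, price, seller, price, ...), so they can be grouped in pairs before writing. The following is only a minimal sketch, assuming the values strictly alternate; `cleaned` is a hypothetical stand-in for the non-empty strings the loop collects, and the rows go to an in-memory buffer instead of a file:

```python
import csv
import io

ean = '9200000083649863'

# Hypothetical stand-in for the cleaned, non-empty tag texts from the loop
cleaned = ['bol.com retourdeals', '41,75', 'ITidee', '45,88', 'Bol.com', '47,99']

# Passing the same iterator to zip() twice pulls consecutive items as pairs
it = iter(cleaned)
rows = [[ean, seller, price] for seller, price in zip(it, it)]

# In-memory buffer for the demo; a file opened with newline='' works the same way
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

If the alternation ever breaks (for example, an extra `span` that is not a price), every following pair shifts by one, so this approach is fragile compared with selecting names and prices separately.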

For the URL above, the BeautifulSoup script saves this data.csv:

9200000083649863,bol.com retourdeals,"41,75"
9200000083649863,ITidee,"45,88"
9200000083649863,Bol.com,"47,99"
9200000083649863,4Allshop,"49,70"
9200000083649863,codima,"51,69"
9200000083649863,PlazaSale.nl,"53,40"
9200000083649863,Stock Sellers B.V.,"53,67"
9200000083649863,Art & Craft,"54,27"
9200000083649863,ORM Wholesale,"54,38"
9200000083649863,DutchDo B.V.,"55,92"
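
The quotes around the prices are expected: each price contains a comma, which is also the CSV delimiter, so `csv.writer` quotes the value to keep it in one field. `csv.reader` removes the quotes again when the file is read back. Below is a minimal sketch of reading the rows and converting the decimal comma to a `Decimal`. The helper name `parse_price` is hypothetical, the sample text stands in for data.csv, and it assumes the prices contain no thousands separator:

```python
import csv
import io
from decimal import Decimal

# Stand-in for the contents of data.csv (first three rows of the output above)
sample = (
    '9200000083649863,bol.com retourdeals,"41,75"\n'
    '9200000083649863,ITidee,"45,88"\n'
    '9200000083649863,Bol.com,"47,99"\n'
)


def parse_price(text):
    """Convert a price with a decimal comma, e.g. '41,75', to a Decimal."""
    # Assumption: no thousands separator such as '1.234,56'
    return Decimal(text.replace(',', '.'))


offers = []
for ean, seller, price in csv.reader(io.StringIO(sample)):
    offers.append((ean, seller, parse_price(price)))

# Pick the offer with the lowest price
cheapest = min(offers, key=lambda offer: offer[2])
print(cheapest[1], cheapest[2])
```

With a real file, replace `io.StringIO(sample)` with a file object opened via `open('data.csv', newline='')`.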