I'm new to Python and trying to learn by working on small projects. I'm currently trying to scrape some information from various web pages, but whenever I output the scraped data to a CSV it only seems to contain the data from the last URL.

Ideally I'd like it to write to the CSV rather than append to it, since I only want one CSV containing just the data from the most recent scrape.

I've looked through other questions on StackOverflow similar to this, but I either don't understand them or they don't work for me (probably the former).

Any help would be greatly appreciated.
import csv
import requests
from bs4 import BeautifulSoup
import pandas as pd
URL = ['URL1','URL2']
for URL in URL:
    response = requests.get(URL)
    soup = BeautifulSoup(response.content, 'html.parser')

    nameElement = soup.find('p', attrs={'class':'name'}).a
    nameText = nameElement.text.strip()

    priceElement = soup.find('span', attrs={'class':'price'})
    priceText = priceElement.text.strip()

    columns = [['Name','Price'], [nameText, priceText]]

    with open('index.csv', 'w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerows(columns)
Answer (score: 1)
You have to open the file before the for loop and write each row inside the for loop:
URL = ['URL1','URL2']

with open('index.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)

    # write the header row once, before scraping
    writer.writerow(['Name', 'Price'])

    for URL in URL:
        response = requests.get(URL)
        soup = BeautifulSoup(response.content, 'html.parser')

        nameElement = soup.find('p', attrs={'class':'name'}).a
        nameText = nameElement.text.strip()

        priceElement = soup.find('span', attrs={'class':'price'})
        priceText = priceElement.text.strip()

        # one data row per URL, written to the same open file
        writer.writerow([nameText, priceText])
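Because the file is opened with mode 'w' only once, it is truncated a single time at the start; every writerow() call after that adds to the same open file, so each URL contributes its own row instead of overwriting the previous one.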
Or you have to create the list before the for loop and append() the data to that list:
URL = ['URL1','URL2']

# start with the header row; data rows are appended inside the loop
columns = [['Name', 'Price']]

for URL in URL:
    response = requests.get(URL)
    soup = BeautifulSoup(response.content, 'html.parser')

    nameElement = soup.find('p', attrs={'class':'name'}).a
    nameText = nameElement.text.strip()

    priceElement = soup.find('span', attrs={'class':'price'})
    priceText = priceElement.text.strip()

    columns.append([nameText, priceText])

# write everything in one go, after the loop has finished
with open('index.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerows(columns)
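Since the question already imports pandas, a third option is to collect the rows in a list and let pandas write the file in one call. This is only a sketch that reuses the placeholder URLs and the class selectors assumed above; DataFrame.to_csv() overwrites the target file by default, which is the behaviour asked for.

import requests
from bs4 import BeautifulSoup
import pandas as pd

URL = ['URL1', 'URL2']   # placeholder URLs, as in the question

rows = []
for url in URL:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    # same selectors as above; assumed to match the target pages
    nameText = soup.find('p', attrs={'class': 'name'}).a.text.strip()
    priceText = soup.find('span', attrs={'class': 'price'}).text.strip()

    rows.append({'Name': nameText, 'Price': priceText})

# building the DataFrame from a list of dicts gives the 'Name'/'Price' header,
# and to_csv() rewrites index.csv on every run
pd.DataFrame(rows).to_csv('index.csv', index=False)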