I'm new to Python and trying to learn by working on small projects. I'm currently trying to scrape some information from various web pages, but whenever I output the scraped data to a CSV it only seems to contain the data from the last URL.

Ideally I'd like it to write to the CSV rather than append to it, since I only want one CSV containing just the data from the most recent scrape.

I've looked through other questions on StackOverflow similar to this, but I either don't understand them or they don't work for me (probably the former).

Any help would be greatly appreciated.
import csv
import requests
from bs4 import BeautifulSoup
import pandas as pd
URL = ['URL1','URL2']
for URL in URL:
    response = requests.get(URL)
    soup = BeautifulSoup(response.content, 'html.parser')

    nameElement = soup.find('p', attrs={'class':'name'}).a
    nameText = nameElement.text.strip()

    priceElement = soup.find('span', attrs={'class':'price'})
    priceText = priceElement.text.strip()

    columns = [['Name','Price'], [nameText, priceText]]

    with open('index.csv', 'w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerows(columns)
Answer (score: 1)
You have to open the file before the for loop and write each row inside the for loop:
URL = ['URL1','URL2']

with open('index.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)

    # write the header row once, before scraping
    writer.writerow(['Name', 'Price'])

    for URL in URL:
        response = requests.get(URL)
        soup = BeautifulSoup(response.content, 'html.parser')

        nameElement = soup.find('p', attrs={'class':'name'}).a
        nameText = nameElement.text.strip()

        priceElement = soup.find('span', attrs={'class':'price'})
        priceText = priceElement.text.strip()

        # one data row per URL, written to the same open file
        writer.writerow([nameText, priceText])
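Because the file is opened with mode 'w' only once, it is truncated a single time at the start; every writerow() call after that adds to the same open file, so each URL contributes its own row instead of overwriting the previous one.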
Or you have to create the list before the for loop and append() the data to that list:
URL = ['URL1','URL2']

# start with the header row; data rows are appended inside the loop
columns = [['Name', 'Price']]

for URL in URL:
    response = requests.get(URL)
    soup = BeautifulSoup(response.content, 'html.parser')

    nameElement = soup.find('p', attrs={'class':'name'}).a
    nameText = nameElement.text.strip()

    priceElement = soup.find('span', attrs={'class':'price'})
    priceText = priceElement.text.strip()

    columns.append([nameText, priceText])

# write everything in one go, after the loop has finished
with open('index.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerows(columns)
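Since the question already imports pandas, a third option is to collect the rows in a list and let pandas write the file in one call. This is only a sketch that reuses the placeholder URLs and the class selectors assumed above; DataFrame.to_csv() overwrites the target file by default, which is the behaviour asked for.

import requests
from bs4 import BeautifulSoup
import pandas as pd

URL = ['URL1', 'URL2']   # placeholder URLs, as in the question

rows = []
for url in URL:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    # same selectors as above; assumed to match the target pages
    nameText = soup.find('p', attrs={'class': 'name'}).a.text.strip()
    priceText = soup.find('span', attrs={'class': 'price'}).text.strip()

    rows.append({'Name': nameText, 'Price': priceText})

# building the DataFrame from a list of dicts gives the 'Name'/'Price' header,
# and to_csv() rewrites index.csv on every run
pd.DataFrame(rows).to_csv('index.csv', index=False)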