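I have the following code, which scrapes a website and writes the results to a CSV file. The problem is that, for some reason, the for loop outputs multiple copies of each iteration, when it should write each iteration only once. Can anyone help point out what I'm missing here? Thanks.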
import requests
from bs4 import BeautifulSoup
import csv
url = 'https://online.computicket.com'
home_page = requests.get(url)
home_page.content
soup = BeautifulSoup(home_page.content, 'lxml')
links = soup.find_all('a', {'class':'info'})
next_link = []
for link in links:
    next_link.append(link.get("href"))
for i in range(0, len(next_link),1):
    next_link.append(i)
    print(url + next_link[i])
    new_url = requests.get(url + next_link[i])
    for link in (url + next_link[i]):
        new_url.content
        soup = BeautifulSoup(new_url.content, 'lxml')
        info_name = soup.find('div', {'class' : 'es-cost'})
        heading = soup.find('h1',{'class' : 'full'})
        with open('Don.csv', 'a') as csv_file:
            #csv_file.write(heading.get_text())
            for name in soup.find_all('div', {'class' : 'es-cost'}):
                csv_file.write(heading.get_text())
                csv_file.write(name.get_text())
                print(name.get_text())
Answer 0 (score: 0)
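I think your program prints multiple copies because of the nested for loop. Its link variable is not used anywhere inside the loop body, and because url + next_link[i] is a string, that loop runs once for every character of the URL, so the body executes many times per page. Try removing the nested for statement by replacing this part of the code: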
for i in range(0, len(next_link),1):
    next_link.append(i)
    print(url + next_link[i])
    new_url = requests.get(url + next_link[i])
    for link in (url + next_link[i]):
        new_url.content
        soup = BeautifulSoup(new_url.content, 'lxml')
        info_name = soup.find('div', {'class' : 'es-cost'})
        heading = soup.find('h1',{'class' : 'full'})
        with open('Don.csv', 'a') as csv_file:
            #csv_file.write(heading.get_text())
            for name in soup.find_all('div', {'class' : 'es-cost'}):
                csv_file.write(heading.get_text())
                csv_file.write(name.get_text())
                print(name.get_text())
with this:
for i in range(0, len(next_link),1):
    next_link.append(i)
    print(url + next_link[i])
    new_url = requests.get(url + next_link[i])
    new_url.content
    soup = BeautifulSoup(new_url.content, 'lxml')
    info_name = soup.find('div', {'class' : 'es-cost'})
    heading = soup.find('h1',{'class' : 'full'})
    with open('Don.csv', 'a') as csv_file:
        #csv_file.write(heading.get_text())
        for name in soup.find_all('div', {'class' : 'es-cost'}):
            csv_file.write(heading.get_text())
            csv_file.write(name.get_text())
            print(name.get_text())
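
To see why the nested loop multiplied the output, it may help to check how a for loop treats a string. The sketch below is only an illustration and makes no network request: the path /event/1 and the sample heading and prices are made-up placeholders, and count_iterations and write_rows are hypothetical helper names that do not appear in the original code. It also shows one possible way to use the already imported csv module so that each heading and price pair lands on its own row.

```python
import csv
import io

url = 'https://online.computicket.com'
path = '/event/1'  # hypothetical href, for illustration only


def count_iterations(text):
    """Count how many times a for loop over `text` runs its body."""
    count = 0
    for _ in text:  # a string is iterated one character at a time
        count += 1
    return count


def write_rows(csv_file, heading, prices):
    """Write one CSV row per price, each prefixed with the page heading."""
    writer = csv.writer(csv_file)
    for price in prices:
        writer.writerow([heading, price])


# The nested loop in the question ran once per character of the URL.
print(count_iterations(url + path) == len(url + path))

# Placeholder data standing in for the scraped heading and es-cost divs.
buffer = io.StringIO()
write_rows(buffer, 'Sample Event', ['R100', 'R250'])
print(buffer.getvalue())
```

In the real script, write_rows could be handed the file opened with open('Don.csv', 'a', newline='') and the text taken from the heading and es-cost elements, assuming both are found on the page.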