Question

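I have the following code, which scrapes a website and writes the results to a CSV file. The problem is that, for some reason, the for loop prints multiple copies of each iteration, when it should write each iteration only once. Can anyone help and point out what I am missing here? Thanks.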

import requests
from bs4 import BeautifulSoup
import csv

url = 'https://online.computicket.com'
home_page = requests.get(url)

home_page.content

soup = BeautifulSoup(home_page.content, 'lxml')


links = soup.find_all('a', {'class':'info'})

next_link = []

for link in links:
    next_link.append(link.get("href"))


for i in range(0, len(next_link),1):    
    next_link.append(i)
    print(url + next_link[i])
    new_url = requests.get(url + next_link[i])   

    for link in (url + next_link[i]):
        new_url.content
        soup = BeautifulSoup(new_url.content, 'lxml')

        info_name = soup.find('div', {'class' : 'es-cost'}) 
        heading = soup.find('h1',{'class' : 'full'})

        with open('Don.csv', 'a') as csv_file:

            #csv_file.write(heading.get_text())
            for name in soup.find_all('div', {'class' : 'es-cost'}):
                csv_file.write(heading.get_text())
                csv_file.write(name.get_text())

                print(name.get_text())

Answer 1

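I think your program prints multiple copies because of the nested for loop. That loop iterates over the string url + next_link[i], so its body runs once for every character of the URL, and its link variable is not used anywhere inside the loop. Try removing the nested for statement by replacing this part of the code: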

for i in range(0, len(next_link),1):
    next_link.append(i)
    print(url + next_link[i])
    new_url = requests.get(url + next_link[i])

    for link in (url + next_link[i]):
        new_url.content
        soup = BeautifulSoup(new_url.content, 'lxml')

        info_name = soup.find('div', {'class' : 'es-cost'})
        heading = soup.find('h1',{'class' : 'full'})

        with open('Don.csv', 'a') as csv_file:

            #csv_file.write(heading.get_text())
            for name in soup.find_all('div', {'class' : 'es-cost'}):
                csv_file.write(heading.get_text())
                csv_file.write(name.get_text())

                print(name.get_text())

with this:

for i in range(0, len(next_link),1):
    next_link.append(i)
    print(url + next_link[i])
    new_url = requests.get(url + next_link[i])

    new_url.content
    soup = BeautifulSoup(new_url.content, 'lxml')

    info_name = soup.find('div', {'class' : 'es-cost'})
    heading = soup.find('h1',{'class' : 'full'})

    with open('Don.csv', 'a') as csv_file:

        #csv_file.write(heading.get_text())
        for name in soup.find_all('div', {'class' : 'es-cost'}):
            csv_file.write(heading.get_text())
            csv_file.write(name.get_text())

            print(name.get_text())
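
To illustrate why the nested loop multiplies the output, here is a minimal sketch that makes no network requests. It only counts how many times the loop body would run. The hrefs values and the two counting helpers are made up for illustration and are not part of the original code; only url comes from the question.

```python
url = 'https://online.computicket.com'

# Hypothetical hrefs standing in for the links scraped from the home page.
hrefs = ['/web/event/a', '/web/event/b']


def count_nested_iterations(base, paths):
    """Count body executions when looping over the URL string itself."""
    count = 0
    for path in paths:
        # Iterating over a str yields one character at a time,
        # so the body runs len(base + path) times per link.
        for _char in (base + path):
            count += 1
    return count


def count_flat_iterations(base, paths):
    """Count body executions when each link is handled exactly once."""
    count = 0
    for path in paths:
        full_url = base + path  # the page would be fetched and parsed here
        count += 1
    return count


nested = count_nested_iterations(url, hrefs)
flat = count_flat_iterations(url, hrefs)
print('nested loop body runs:', nested)
print('flat loop body runs:', flat)
```

With the nested loop, every page is parsed and written once per character of its URL, which would explain the repeated rows in Don.csv. Looping directly over the list of links (for href in next_link) also makes the next_link.append(i) line unnecessary.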

For loop iteration does not have the expected effect
