Web Scraping Loop Structure Problem

Time: 2018-06-12 13:05:31

Tags: python web-scraping beautifulsoup python-requests

I am currently writing some code to scrape AutoTrader as a practice project. I cannot get it to print the results I need.

The desired output is:

Car 1
Specs Car 1

Instead, I get:

Car 1
Specs Car 1
Specs Car 2
Specs Car X

Car 2

Where am I going wrong in my loop structure?

from bs4 import BeautifulSoup 
import requests

page_link = ("https://www.autotrader.co.uk/car-search?sort=price-asc&radius=1500&postcode=lu15jq&onesearchad=Used&onesearchad=Nearly%20New&onesearchad=New&make=AUDI&model=A5&price-to=8500&year-from=2008&maximum-mileage=90000&transmission=Automatic&exclude-writeoff-categories=on")
LN = 0
r = requests.get(page_link)
c = r.content
soup = BeautifulSoup(c,"html.parser")

all = soup.find_all("h2",{"class":"listing-title title-wrap"})
all2 = soup.find_all('ul',{"class" :'listing-key-specs '})

The code block above works fine. The block below prints the output.

LN = -1
ListTotal = len(all)
for item in all:
    if LN <= ListTotal:
        LN += 1
        print(item.find("a", {"class": "js-click-handler listing-fpa-link"}).text)
        for carspecs in all2:
            print (carspecs.text)
    else:
        break

Thanks

1 answer:

Answer 0: (score: 2)

Because the inner loop prints every carspecs in all2 on each pass of the outer loop:

all = ...
all2 = ...

for item in all:
    ...
    for carspecs in all2:
        # will print everything in all2 on each iteration of all
        print(carspecs.text)
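
To see the effect without hitting the site, here is a minimal sketch that uses plain lists in place of the scraped tags. The names `titles`, `specs` and `nested_output`, and the list contents, are made up for illustration; only the loop shape mirrors the question.

```python
# Hypothetical stand-ins for the scraped results (not real AutoTrader data).
titles = ["Car 1", "Car 2", "Car 3"]
specs = ["Specs Car 1", "Specs Car 2", "Specs Car 3"]


def nested_output(titles, specs):
    """Collect the lines the nested loop from the question would print."""
    lines = []
    for title in titles:
        lines.append(title)
        # The inner loop restarts from the beginning for every title,
        # so all specs are emitted after each one.
        for spec in specs:
            lines.append(spec)
    return lines


lines = nested_output(titles, specs)
print("\n".join(lines))
print(len(lines))  # 3 titles + 3 * 3 spec lines
```

Every title is followed by the full list of specs, which is the pattern shown in the question's actual output.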

I suspect you want:

for item, specs in zip(all, all2):
    ...
    print(specs.text)
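
One caveat: `zip` stops at the shorter input, so if the page ever yields a different number of titles and spec lists, the unmatched items are dropped silently. Below is a minimal sketch of a guard, assuming you would rather fail loudly in that case. The helper name `pair_listings` and the sample lists are hypothetical.

```python
def pair_listings(titles, specs):
    """Pair each title with the spec block at the same position.

    Raises ValueError when the two lists differ in length, because
    zip() would otherwise silently drop the unmatched items.
    """
    if len(titles) != len(specs):
        raise ValueError(
            f"{len(titles)} titles but {len(specs)} spec blocks"
        )
    return list(zip(titles, specs))


# Hypothetical stand-ins for the scraped results.
pairs = pair_listings(["Car 1", "Car 2"], ["Specs Car 1", "Specs Car 2"])
for title, spec in pairs:
    print(title)
    print(spec)
```

Pairing by position also assumes both `find_all` calls return their results in the same listing order, which the question's page appears to do but which is not guaranteed.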

FYI, I cleaned up your code with better logic and names, got rid of the redundant parts, and made it follow the Python style guide:

import requests
from bs4 import BeautifulSoup

page_link = ("https://www.autotrader.co.uk/car-search?sort=price-asc&"
             "radius=1500&postcode=lu15jq&onesearchad=Used&"
             "onesearchad=Nearly%20New&onesearchad=New&make=AUDI&model=A5"
             "&price-to=8500&year-from=2008&maximum-mileage=90000"
             "&transmission=Automatic&exclude-writeoff-categories=on")

# requests.get returns a response object; its .content holds the raw HTML bytes
response = requests.get(page_link)
soup = BeautifulSoup(response.content, "html.parser")

# don't shadow the built-in `all`
cars = soup.find_all("h2", {"class": "listing-title title-wrap"})
cars_specs = soup.find_all("ul", {"class": "listing-key-specs "})

for car, specs in zip(cars, cars_specs):
    # the `LN` counter never changed what the loop did, so it is removed
    print(car.find("a", {"class": "js-click-handler listing-fpa-link"}).text)
    print(specs.text)
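
If the `LN` counter in the question was meant to number the listings (an assumption, since the question does not say), `enumerate` can supply that index without any manual bookkeeping. A minimal sketch with made-up lists standing in for the scraped tags:

```python
# Hypothetical stand-ins for the scraped titles and specs.
titles = ["Car 1", "Car 2"]
specs = ["Specs Car 1", "Specs Car 2"]

numbered = []
# enumerate(..., start=1) yields (1, pair), (2, pair), ...
for number, (title, spec) in enumerate(zip(titles, specs), start=1):
    numbered.append(f"{number}. {title}: {spec}")

print("\n".join(numbered))
```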