Empty lists with Python Beautiful Soup

Date: 2020-07-03 12:40:06

Tags: python python-3.x pandas beautifulsoup

I am new to web scraping. I am trying to extract information about car listings. However, when I run the code below, I only get empty lists.

import requests
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

from time import sleep
from random import randint

title=[]
kilometres=[]
transmission=[]
engine=[]
price=[]
adtype=[]

url='https://www.carsales.com.au/cars/new-south-wales-state/sydney-metro-region/suv-bodystyle/?offset=0'
headers = {"Accept-Language": "en-AU, en;q=0.5"}
page=requests.get(url,headers=headers)
soup=BeautifulSoup(page.text,'html.parser')

names=soup.find_all(class_='col')
for item in names:
    title.append(item.find('a').text)

distances=soup.find_all('li',{'data-type':'Odometer'})
for item in distances:
    kilometres.append(item.text)

trans=soup.find_all('li',{'data-type':'Transmission'})
for item in trans:
    transmission.append(item.text)

engines=soup.find_all('li',{'data-type':'Engine'})
for item in engines:
    engine.append(item.text)

prices=soup.find_all(class_='price')
for item in prices:
    price.append(item.find('a').text)

adtypes=soup.find_all(class_='seller-type')
for item in adtypes:
    adtype.append(item.text)

What am I doing wrong here? I want to scrape the data at this URL into a pandas DataFrame.

1 answer:

Answer 0 (score: 0)

To get the correct page, set a User-Agent header and set Accept-Language to "en-US,en;q=0.5":

import requests
import pandas as pd
from bs4 import BeautifulSoup

url='https://www.carsales.com.au/cars/new-south-wales-state/sydney-metro-region/suv-bodystyle/?offset=0'
headers = {"Accept-Language": "en-US,en;q=0.5", 'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}
page=requests.get(url,headers=headers)
soup=BeautifulSoup(page.text,'html.parser')

all_data = []

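# Each search result sits in an element with class "listing-item".
for car in soup.select('.listing-item'):
    title = car.select_one('h3 > a').text
    price = car.select_one('.price > a').text
    # The seller type can appear under either class, so match both.
    type_ = car.select_one('.seller-type, .franchise-stock-type').get_text(strip=True)
    # One column per <li data-type="..."> (Odometer, Transmission, Engine, ...).
    specs = {li['data-type']: li.text for li in car.select('li[data-type]')}
    all_data.append(dict(title=title, price=price, type=type_, **specs))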

df = pd.DataFrame(all_data)
print(df)

df.to_csv('data.csv')

Prints:

                                                title       price                type    Odometer Body Style Transmission                  Engine           Build Date
0   2019 Nissan Pathfinder ST-L R52 Series III Aut...   $45,878*      Dealer Used Car    1,400 km        SUV    Automatic        6cyl 3.5L Petrol                  NaN
1   2020 Land Rover Range Rover Evoque D150 S Auto...   $70,000*   Private Seller Car    3,000 km        SUV    Automatic  4cyl 2.0L Turbo Diesel                  NaN
2                 2011 SsangYong Korando S Manual 2WD    $8,750*      Dealer Used Car  164,834 km        SUV       Manual  4cyl 2.0L Turbo Diesel                  NaN
3              2016 BMW X3 xDrive20d F25 LCI Auto 4x4   $31,000*   Private Seller Car   99,654 km        SUV    Automatic  4cyl 2.0L Turbo Diesel                  NaN
4       2019 Mitsubishi Outlander ES ZL Auto 2WD MY20    $29,580          Dealer Demo        2 km        SUV    Automatic        4cyl 2.4L Petrol                  NaN
5    2012 Mazda CX-5 Grand Touring KE Series Auto AWD   $18,000*   Private Seller Car  116,590 km        SUV    Automatic  4cyl 2.2L Turbo Diesel                  NaN
6                     2020 MG HS Excite Auto FWD MY20    $32,848     New Car In Stock         NaN        SUV    Automatic  4cyl 1.5L Turbo Petrol  Build date Jan 2020
7                  2019 BMW X3 xDrive30i G01 Auto 4x4   $67,800*      Dealer Used Car   10,637 km        SUV    Automatic  4cyl 2.0L Turbo Petrol                  NaN
8              2019 BMW X1 xDrive25i F48 LCI Auto AWD    $56,990      Dealer Used Car    7,203 km        SUV    Automatic  4cyl 2.0L Turbo Petrol                  NaN
9          2019 Jeep Cherokee Trailhawk Auto 4x4 MY19    $50,890          Dealer Demo       10 km        SUV    Automatic        6cyl 3.2L Petrol                  NaN
10              2019 Audi Q2 35 TFSI design Auto MY19    $44,850          Dealer Demo    2,135 km        SUV    Automatic  4cyl 1.4L Turbo Petrol                  NaN
11  2020 Land Rover Range Rover Sport SDV8 HSE Aut...  $162,500*   Private Seller Car       48 km        SUV    Automatic  8cyl 4.4L Turbo Diesel                  NaN
12      2015 Porsche Macan S Diesel 95B Auto AWD MY15   $59,800*      Dealer Used Car   71,926 km        SUV    Automatic  6cyl 3.0L Turbo Diesel                  NaN
13   2018 Mazda CX-5 Akera KF Series Auto i-ACTIV AWD   $39,990*      Dealer Used Car   14,855 km        SUV    Automatic        4cyl 2.5L Petrol                  NaN
14  2019 Mazda CX-5 Maxx Sport KF Series Auto i-AC...   $39,950*      Dealer Used Car    9,592 km        SUV    Automatic  4cyl 2.2L Turbo Diesel                  NaN
15            2019 Mitsubishi ASX LS XD Auto 2WD MY20    $29,685          Dealer Demo      447 km        SUV    Automatic        4cyl 2.0L Petrol                  NaN
16                2012 Audi Q5 TFSI Auto quattro MY12   $22,900*   Private Seller Car   69,518 km        SUV    Automatic  4cyl 2.0L Turbo Petrol                  NaN
17              2013 Subaru XV 2.0i G4X Auto AWD MY13   $16,990*      Dealer Used Car   94,245 km        SUV    Automatic        4cyl 2.0L Petrol                  NaN
18  2019 Mitsubishi Pajero Sport Exceed QF Auto 4x...    $58,880          Dealer Demo    1,755 km        SUV    Automatic  4cyl 2.4L Turbo Diesel                  NaN
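
One caveat about the extraction loop: `select_one` returns `None` when nothing matches, so calling `.text` on the result raises `AttributeError` if a listing lacks a title, price, or seller-type element. The `li[data-type]` part already tolerates gaps, which is why missing fields show up as NaN in the table above. A minimal sketch of a more defensive variant follows. The helper names `text_or_none` and `parse_listing` and the sample markup are made up for illustration, and this version strips whitespace from every field, which the answer only does for the seller type.

```python
from bs4 import BeautifulSoup


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


def parse_listing(car):
    """Turn one listing element into a flat dict of strings."""
    row = {
        "title": text_or_none(car, "h3 > a"),
        "price": text_or_none(car, ".price > a"),
        "type": text_or_none(car, ".seller-type, .franchise-stock-type"),
    }
    # One extra key per <li data-type="...">, as in the answer's loop.
    for li in car.select("li[data-type]"):
        row[li["data-type"]] = li.get_text(strip=True)
    return row


# Hypothetical markup, invented for this example; the real page may differ.
# It deliberately has no seller-type element.
SAMPLE_HTML = """
<div class="listing-item">
  <h3><a href="#">2020 MG HS Excite Auto FWD MY20</a></h3>
  <div class="price"><a href="#">$32,848</a></div>
  <ul><li data-type="Transmission">Automatic</li></ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = [parse_listing(car) for car in soup.select(".listing-item")]
print(rows)
```

Passing `rows` to `pd.DataFrame` works the same way as with `all_data`; a `None` value simply becomes a missing cell instead of stopping the scrape.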

The script also saves the same table to data.csv.
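
When every list comes back empty, it can help to check whether the HTML that was actually received contains the expected elements at all, before suspecting the parsing code. The sketch below counts matches per CSS selector. The function name `count_matches` and both HTML snippets are hypothetical stand-ins, since the thread does not show what the site returns to a client without a browser-like User-Agent.

```python
from bs4 import BeautifulSoup


def count_matches(html, selectors):
    """Map each CSS selector to the number of elements it matches in html."""
    soup = BeautifulSoup(html, "html.parser")
    return {selector: len(soup.select(selector)) for selector in selectors}


# Selectors used by the answer's extraction loop.
SELECTORS = [".listing-item", "h3 > a", ".price > a", "li[data-type]"]

# Hypothetical stand-in for a page with no listing markup in it.
BLOCKED_HTML = "<html><body><p>Please enable JavaScript.</p></body></html>"

# Hypothetical stand-in for a page that contains one listing.
LISTING_HTML = """
<div class="listing-item">
  <h3><a href="#">Some car</a></h3>
  <div class="price"><a href="#">$1</a></div>
  <ul><li data-type="Odometer">10 km</li></ul>
</div>
"""

blocked_counts = count_matches(BLOCKED_HTML, SELECTORS)
listing_counts = count_matches(LISTING_HTML, SELECTORS)
print(blocked_counts)   # every count is 0
print(listing_counts)   # every count is 1
```

With a live response, calling `count_matches(page.text, SELECTORS)` and printing `page.status_code` next to it gives a quick indication of whether the request or the selectors need attention.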