Problem when scraping a web page with bs4

Asked: 2020-06-09 11:34:57

Tags: python web-scraping beautifulsoup

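I wrote some code to scrape car information from ebay.com: title, make, model, transmission, year and price. Everything works, but occasionally the span that normally holds 'Transmission' holds 'Options' instead, under the same class name as the transmission, and this sometimes makes the code produce the wrong result.

I only want the Automatic or Manual transmission value. I tried a few 'if' checks to fix this, but they did not work.

My code: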

import requests
from bs4 import BeautifulSoup
import re

url = 'https://www.ebay.com/b/Cars-Trucks/6001?_fsrp=0&_sacat=6001&LH_BIN=1&LH_ItemCondition=3000%7C1000%7C2500&rt=nc&_stpos=95125&Model%2520Year=2020%7C2019%7C2018%7C2017%7C2016%7C2015'
res = requests.get(url)
soup = BeautifulSoup(res.text, 'html.parser')
ebay_cars = soup.find_all('li', class_='s-item')
for car_info in ebay_cars:
    title_div = car_info.find('div', class_='s-item__wrapper clearfix')
    title_sub_div = title_div.find('div', class_='s-item__info clearfix')
    title_p = title_sub_div.find('span', class_='s-item__price')
    title_tag = title_sub_div.find('a', class_='s-item__link')
    title_maker = title_sub_div.find('span', class_='s-item__dynamic s-item__dynamicAttributes1')
    title_model = title_sub_div.find('span', class_='s-item__dynamic s-item__dynamicAttributes2')
    title_trans = title_sub_div.find('span', class_='s-item__dynamic s-item__dynamicAttributes3')



    name_of_car = re.sub(r'\d{4}', '', title_tag.text)
    maker_of_car = re.sub(r'Make: ', '', title_maker.text)
    model_of_car = re.sub(r'Model: ', '', title_model.text)
    try:
        trans_of_car = re.sub(r'Transmission: ', '', title_trans.text)
    except:
        trans_of_car = ''

    year_of_car = re.findall(r'\d{4}', title_tag.text)
    year_of_car = ''.join(str(x) for x in year_of_car)

    price_of_car = title_p.text
    print(trans_of_car)

Output:

Automatic
Manual
Automatic
Automatic
Automatic
Automatic
Automatic
Automatic
Automatic
Automatic
Automatic
Automatic
Automatic
Automatic
Automatic
Automatic
Options: 4-Wheel Drive

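'Options: 4-Wheel Drive' is my problem.

1 Answer:

Answer 0: (score: 0)

I updated your try-except block so that the text is kept only when it starts with 'Transmission: '; anything else, including a missing span, gives an empty string: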

try:
    if title_trans.text.startswith(r'Transmission: '):
        trans_of_car = re.sub(r'Transmission: ', '', title_trans.text)
    else:
        trans_of_car = ''
except AttributeError:
    trans_of_car = ''
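
The same check can also be pulled out into a small helper, which makes it testable without hitting eBay. This is only a sketch: the names `parse_transmission` and `find_transmission` are hypothetical, not part of BeautifulSoup or of the original code. It assumes the attribute text has the form 'Transmission: <value>', as in the output shown in the question.

```python
from typing import Iterable, Optional

# Label assumed from the question's output, e.g. 'Transmission: Automatic'.
PREFIX = "Transmission: "


def parse_transmission(text: Optional[str]) -> str:
    """Return the transmission value, or '' if text is not a Transmission attribute."""
    if text is None:  # the span was not found at all
        return ""
    text = text.strip()
    if text.startswith(PREFIX):
        return text[len(PREFIX):].strip()
    return ""  # e.g. 'Options: 4-Wheel Drive' landed in this slot


def find_transmission(attribute_texts: Iterable[Optional[str]]) -> str:
    """Scan every attribute text and return the first transmission value found."""
    for text in attribute_texts:
        value = parse_transmission(text)
        if value:
            return value
    return ""


if __name__ == "__main__":
    samples = ["Transmission: Automatic", "Options: 4-Wheel Drive", None]
    print([parse_transmission(s) for s in samples])
```

Inside the loop this could be used as `trans_of_car = parse_transmission(title_trans.text if title_trans else None)`. If the transmission can appear under a different attribute number, passing the text of every 's-item__dynamic' span of the listing to `find_transmission` avoids depending on the position, assuming the page still labels the value with 'Transmission: '.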