Problem in my code when getting the product ID, brand name, and images with BeautifulSoup

Time: 2018-06-06 04:19:09

Tags: web-scraping beautifulsoup python-3.6

I am trying to get product details from the sample product URL using the following code:

from time import sleep

import requests
from bs4 import BeautifulSoup


def get_soup(url):
    soup = None
    try:
        response = requests.get(url)
        if response.status_code == 200:
            html = response.content
            soup = BeautifulSoup(html, "html.parser")
    except Exception as exc:
        print("Unable to fetch data due to..", str(exc))
    finally:
        return soup

def get_product_details(url):
    soup = get_soup(url)
    sleep(1)
    try:
        product_shop = soup.find('div', attrs={"class": "buy"})
        if product_shop is not None:
            available_product_shop = soup.findAll('div')[2].find('span').text == "In Stock"
            if available_product_shop is not None:
                prod_details = dict()
                merchant_product_id = soup.find('div', attrs={'class': 'description'}).findAll('span')[3].text
                if merchant_product_id is not None:
                    prod_details['merchant_product_id'] = merchant_product_id
                    check_brand = soup.find('div', attrs={'class': 'description'}).findAll('span')[2].find('a')
                    if check_brand is not None:
                        prod_details['brand'] = check_brand.text
                    prod_details['merchant_image_urls'] = ",".join(list(filter(None, map(
                        lambda x: x['href'].replace(",", "%2C"),
                        soup.find('div', attrs={'class': 'left'}).findAll('a')))))
                    check_price = soup.find('span', attrs={"class": "price-old"})
                    if check_price is not None:
                        prod_details['price'] = check_price.text.replace("SGD $", "")
                    check_sale_price = soup.find('span', attrs={"class": "price-new"})
                    if check_sale_price is not None:
                        prod_details['sale_price'] = check_sale_price.text.replace("SGD $", "")
                    return prod_details
    except Exception as exc:
        print("Error..", str(exc))

The problem with the code above is that I cannot get the brand, and the product ID and image URLs are not fetched correctly either.

Can anyone look at my code and help me get the correct details?

1 Answer:

Answer 0 (score: 2)

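OK, my way of answering your question is to refactor, simplify, and fix the code. There are many improvements in how specific elements are targeted, and the result is cleaner and easier to understand. Feel free to ask me about any details you don't understand. Good luck with your project (: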
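One of those fixes concerns the stock check. In the question's code the result of `== "In Stock"` is stored and then tested with `is not None`; a comparison always yields `True` or `False`, so that test can never fail. The sketch below isolates the pitfall in plain Python. The function names are hypothetical and not part of either version of the scraper, and the `strip()` call is an added assumption about possible whitespace.

```python
def is_in_stock_buggy(stock_text):
    # Mirrors the question's logic: the comparison yields True or False,
    # so the following "is not None" test is always True.
    available = stock_text == "In Stock"
    return available is not None


def is_in_stock(stock_text):
    # Return the comparison result itself (strip() is an added assumption).
    return stock_text.strip() == "In Stock"


buggy_result = is_in_stock_buggy("Out of Stock")
fixed_result = is_in_stock("Out of Stock")
print(buggy_result, fixed_result)  # True False
```

The same reasoning applies to `merchant_product_id is not None`: the `.text` of a tag that was found is a string, so the check does not guard against a wrong element being selected.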

Code:

import pprint
import re

import requests
from bs4 import BeautifulSoup


def get_product_details(url):
    html = requests.get(url).text
    # The 'lxml' parser needs the lxml package; 'html.parser' is the stdlib alternative.
    soup = BeautifulSoup(html, 'lxml')

    # Skip products that are not in stock.
    if soup.select_one('.stock').text != 'In Stock':
        return

    # Find the "Product Code:" caption; the value is the text node right after it.
    product_code_caption = soup.find('span', string=re.compile('Product Code:'))
    product_code = product_code_caption.next_sibling.strip()

    # The brand is the first <a> sibling following the "Brand:" caption.
    brand_container = soup.find('span', string=re.compile('Brand:'))
    brand = brand_container.find_next_sibling('a').string

    # Gallery links carry the full-size image URLs.
    urls = [a['href'] for a in soup.select('.cloud-zoom-gallery')]

    old_price = soup.select_one('.price-old').text.replace('SGD $', '')
    new_price = soup.select_one('.price-new').text.replace('SGD $', '')

    prod_details = {
        'merchant_product_id': product_code,
        'brand': brand,
        'merchant_image_urls': urls,
        'price': old_price,
        'sale_price': new_price
    }

    return prod_details


pprint.pprint(get_product_details('http://www.infantree.net/shop/index.php?route=product/product&path=59_113&product_id=1070'))

Output:

{'brand': 'Britax',
 'merchant_image_urls': ['http://www.infantree.net/shop/image/cache/data/Britax '
                         'Products/Britax-Light-Travel-System_BlackThunder-683x1024-500x500.jpg',
                         'http://www.infantree.net/shop/image/cache/data/Britax '
                         'Products/Formula-One-Flame-Red1024x1024-510x510-500x500.jpg',
                         'http://www.infantree.net/shop/image/cache/data/Britax '
                         'Products/Formula-One-Cosmos-Black1024x1024-768x768-500x500.jpg',
                         'http://www.infantree.net/shop/image/cache/data/Britax '
                         'Products/Black-Thunder-Ocean-Blue1024x1024-768x768-500x500.jpg',
                         'http://www.infantree.net/shop/image/cache/data/Britax '
                         'Products/Black-Thunder-Flame-Red1024x1024-768x768-500x500.jpg',
                         'http://www.infantree.net/shop/image/cache/data/Britax '
                         'Products/Black-Thunder-Cosmos-Black1024x1024-768x768-500x500.jpg',
                         'http://www.infantree.net/shop/image/cache/data/Britax '
                         'Products/Formaula-One-Ocean-Blue1024x1024-510x510-500x500.jpg',
                         'http://www.infantree.net/shop/image/cache/data/Britax '
                         'Products/Olympian-Blue-Cosmos-Black1024x1024-510x510-500x500.jpg',
                         'http://www.infantree.net/shop/image/cache/data/Britax '
                         'Products/Olympian-Blue-Flame-Red1024x1024-768x768-500x500.jpg',
                         'http://www.infantree.net/shop/image/cache/data/Britax '
                         'Products/Olympian-Blue-Ocean-Blue1024x1024-100x100-500x500.jpg'],
 'merchant_product_id': 'BRITAX Light + i-Size Travel System',
 'price': '1,032.00',
 'sale_price': '699.00'}
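
Note that this output differs in shape from what the question's code built: the image URLs come back as a list rather than one comma-joined string, and the prices are strings that still contain thousands separators. If the joined string or numeric prices are needed, a minimal sketch of the conversion could look like the following. The helper names `join_image_urls` and `parse_price` are hypothetical, and the example.com URLs are placeholders rather than real data from the site.

```python
from decimal import Decimal


def join_image_urls(urls):
    """Join URLs with commas, escaping literal commas as the question's code did."""
    return ",".join(url.replace(",", "%2C") for url in urls if url)


def parse_price(text):
    """Turn a price string such as '1,032.00' into a Decimal."""
    return Decimal(text.replace(",", "").strip())


# Sample values shaped like the output above (placeholder URLs).
details = {
    "merchant_image_urls": [
        "http://example.com/image/a,1.jpg",
        "http://example.com/image/b.jpg",
    ],
    "price": "1,032.00",
    "sale_price": "699.00",
}

joined = join_image_urls(details["merchant_image_urls"])
price = parse_price(details["price"])
sale_price = parse_price(details["sale_price"])
print(joined)
print(price, sale_price)
```

`Decimal` is used here instead of `float` so that currency amounts keep their exact two-digit representation.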