Getting HTTPError when trying to parse information from a web page

Date: 2016-04-11 16:17:22

Tags: python request beautifulsoup

I have just started learning Python and ran into this problem. I am trying to parse a price from Amazon and print it to the console.

Here is my code:

import requests, bs4

def getAmazonPrice(productUrl):
    res = requests.get(productUrl)
    res.raise_for_status()

    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    elems = soup.select('#addToCart > a > h5 > div > div.a-column.a-span7.a-text-right.a-span-last > span.a-size-medium.a-color-price.header-price')
    return elems[0].text.strip()


price = getAmazonPrice('http://www.amazon.com/Automate-Boring-Stuff-Python-Programming/dp/1593275994/ref=sr_1_2?ie=UTF8&qid=1460386052&sr=8-2&keywords=python+book')
print('The price is ' + price)

Error message:

  

Traceback (most recent call last):
  File "D:/Code/Python/Basic/webBrowser-Module.py", line 37, in <module>
    price = getAmazonPrice('http://www.amazon.com/Automate-Boring-Stuff-Python-Programming/dp/1593275994/ref=sr_1_2?ie=UTF8&qid=1460386052&sr=8-2&keywords=python+book')
  File "D:/Code/Python/Basic/webBrowser-Module.py", line 30, in getAmazonPrice
    res.raise_for_status()
  File "C:\Python33\lib\requests\models.py", line 844, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 503 Server Error: Service Unavailable for url: http://www.amazon.com/Automate-Boring-Stuff-Python-Programming/dp/1593275994/ref=sr_1_2?ie=UTF8&qid=1460386052&sr=8-2&keywords=python+book

     

Process finished with exit code 1

1 answer:

Answer 0: (score: 3)

Pretending to be a real browser by supplying a User-Agent header solves this:

res = requests.get(productUrl, headers={
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36"
})
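As a rough sketch of how this could be wired into a reusable function, the snippet below wraps the request in a helper that sends the same header and reports failures instead of crashing. The names `BROWSER_HEADERS` and `fetch_page` and the 10-second timeout are assumptions made for illustration, not part of the original answer, and the site may still reject automated requests.

```python
import requests

# User-Agent string copied from the answer's snippet; any realistic browser string should work.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36"
}

def fetch_page(url, timeout=10):
    """Hypothetical helper: return the page HTML, or None if the request fails."""
    try:
        res = requests.get(url, headers=BROWSER_HEADERS, timeout=timeout)
        res.raise_for_status()
    except requests.exceptions.RequestException as err:
        # RequestException covers HTTPError (such as the 503 in the question)
        # as well as timeouts, connection errors and malformed URLs.
        print('Request failed:', err)
        return None
    return res.text
```

The returned text can then be passed to bs4.BeautifulSoup exactly as in the question's getAmazonPrice.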

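You will also need to adjust the CSS selector. For example, .header-price would match all the prices on the page (in this case both the non-Prime and the Prime price).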
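To illustrate why a broad selector returns more than one price, here is a minimal sketch run against a hand-written HTML fragment. The markup and both prices are invented for the example and only loosely follow the class names in the question's selector; Amazon's real page structure differs and changes over time. `extract_prices` is a hypothetical helper name.

```python
import bs4

# Invented markup and prices, for illustration only.
SAMPLE_HTML = """
<div id="addToCart">
  <span class="a-size-medium a-color-price header-price">$17.99</span>
  <span class="a-size-medium a-color-price header-price">$25.46</span>
</div>
"""

def extract_prices(html):
    """Return the stripped text of every element with the class header-price."""
    soup = bs4.BeautifulSoup(html, 'html.parser')
    return [elem.text.strip() for elem in soup.select('.header-price')]

prices = extract_prices(SAMPLE_HTML)
print(prices)     # ['$17.99', '$25.46']
print(prices[0])  # $17.99 (the first match in document order)
```

Because select returns every match in document order, elems[0] only gives the intended price if the selector is narrow enough or the wanted element happens to come first. Checking that the list is non-empty before indexing also avoids an IndexError when nothing matches.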