BeautifulSoup cannot find a div by ID

Asked: 2018-09-18 02:05:35

Tags: python-3.x web-scraping beautifulsoup

I am trying to scrape prices from Amazon using requests and BeautifulSoup4. The relevant snippet from my script is below.

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
}

amazon_url = "https://www.amazon.com/gp/offer-listing/B076KDY7VF/ref=dp_olp_new_mbc?ie=UTF8&condition=new"

r = requests.get(url=amazon_url, headers=headers)
page_text = r.text

soup = BeautifulSoup(page_text, "html.parser")

# Find the price table
table = soup.find(id="olpOfferListColumn")

print(table)

The table is always printed as None. I am not sure what the problem is. Please explain.

1 answer:

Answer 0 (score: 0)

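The problem is that html.parser does not treat the unclosed tags in this page as actual tags. If you run soup.div, you will see that it contains only a single div tag: the only one that has a closing tag in the source. The lxml parser adds the missing closing tags, so with it your code works. Note that lxml is a separate package and has to be installed first (for example with pip install lxml).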

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
}

amazon_url = "https://www.amazon.com/gp/offer-listing/B076KDY7VF/ref=dp_olp_new_mbc?ie=UTF8&condition=new"

r = requests.get(url=amazon_url, headers=headers)
page_text = r.text

# Parse with lxml instead of html.parser
soup = BeautifulSoup(page_text, "lxml")

# Find the price table
table = soup.find(id="olpOfferListColumn")

print(table)
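
To check whether the parser really is what makes the difference for a given page, one option is to run the same ID lookup under each parser that happens to be installed. The following is only a minimal sketch: the helper name compare_parsers and the sample HTML string are illustrative assumptions and not part of the original post. In practice you would pass r.text instead of the sample.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def compare_parsers(html, element_id, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser_name: bool} for each installed parser.

    The value is True when soup.find(id=element_id) returned a tag
    under that parser. Parsers that are not installed are skipped.
    """
    results = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # lxml and html5lib are optional third-party packages
            continue
        results[name] = soup.find(id=element_id) is not None
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for r.text; replace with the downloaded page
    sample = '<html><body><div id="olpOfferListColumn"><p>offer</p></div></body></html>'
    print(compare_parsers(sample, "olpOfferListColumn"))
```

If the lookup succeeds under lxml but not under html.parser for the real page, that supports the explanation above. If it fails under every parser, the element is probably not in the HTML the server returned for this request.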