Question

I am trying to scrape prices from Amazon using requests and BeautifulSoup4. The relevant snippet from my script is below.

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
}

amazon_url = "https://www.amazon.com/gp/offer-listing/B076KDY7VF/ref=dp_olp_new_mbc?ie=UTF8&condition=new"

r = requests.get(url=amazon_url, headers=headers)
page_text = r.text

soup = BeautifulSoup(page_text, "html.parser")

# Finding the Price Table
table = soup.find(id="olpOfferListColumn")

print(table)

The table always prints as None. I'm not sure what the problem is. Please explain.

Answer 1

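The problem is that html.parser cannot recognise or handle unclosed tags as actual tags. If you run soup.div, you will see that it contains only one div tag: the only one that has a closing tag in the page source. If you use the lxml parser instead, it adds the missing closing tags and your code will work.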

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
}

amazon_url = "https://www.amazon.com/gp/offer-listing/B076KDY7VF/ref=dp_olp_new_mbc?ie=UTF8&condition=new"

r = requests.get(url=amazon_url, headers=headers)
page_text = r.text

# lxml is a separate package and must be installed first (pip install lxml)
soup = BeautifulSoup(page_text, "lxml")

# Finding the Price Table
table = soup.find(id="olpOfferListColumn")

print(table)
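
If you want to check this diagnosis on your own page, one rough approach is to count opening and closing tags with the standard library's HTMLParser and see which tags are opened more often than they are closed. This is only a sketch: TagBalanceCounter and unclosed_tags are hypothetical helper names, the sample string is made up, and tags such as p or li may legitimately omit their closing tag in HTML, so a non-empty result is a hint rather than proof.

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never take a closing tag, so they are ignored here.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagBalanceCounter(HTMLParser):
    """Hypothetical helper: tallies start tags minus end tags per tag name."""

    def __init__(self):
        super().__init__()
        self.balance = Counter()

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.balance[tag] += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.balance[tag] -= 1


def unclosed_tags(html_text):
    """Return {tag: count} for tags opened more often than they are closed."""
    counter = TagBalanceCounter()
    counter.feed(html_text)
    counter.close()
    return {tag: n for tag, n in counter.balance.items() if n > 0}


# Made-up sample: the outer div is never closed.
sample = '<div id="outer"><div id="olpOfferListColumn"><p>price</p></div>'
print(unclosed_tags(sample))
```

To apply it to the real page, pass the downloaded text instead of the sample, for example unclosed_tags(page_text). If div shows up in the result, the markup is unbalanced in the way described above.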
