I am trying to scrape prices from Amazon using requests and BeautifulSoup4. The relevant snippet from my script is below:
import requests
from bs4 import BeautifulSoup

headers = {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
amazon_url = "https://www.amazon.com/gp/offer-listing/B076KDY7VF/ref=dp_olp_new_mbc?ie=UTF8&condition=new"
r = requests.get(url=amazon_url, headers=headers)
page_text = r.text
soup = BeautifulSoup(page_text, "html.parser")
# Finding the Price Table
table = soup.find(id="olpOfferListColumn")
print(table)
table is always printed as None. I'm not sure what the problem is. Please explain.
Answer 0 (score: 0)
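The problem is that html.parser does not recognize or handle unclosed tags as actual tags. If you run soup.div, you will see that it contains only one div tag: the only one in the page source that has a closing tag. If you use the lxml parser instead, it adds the missing closing tags and your code will work.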
import requests
from bs4 import BeautifulSoup

headers = {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
amazon_url = "https://www.amazon.com/gp/offer-listing/B076KDY7VF/ref=dp_olp_new_mbc?ie=UTF8&condition=new"
r = requests.get(url=amazon_url, headers=headers)
page_text = r.text
soup = BeautifulSoup(page_text, "lxml")
# Finding the Price Table
table = soup.find(id="olpOfferListColumn")
print(table)
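
If you are not sure which parsers are installed, or want to see which one actually finds the element, something like the following may help. It is only a rough sketch: find_with_parsers is a hypothetical helper name (not part of BeautifulSoup), and the parser order is an assumption. It tries each parser in turn, skips any that are not installed, and returns the first one whose tree contains the requested id.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def find_with_parsers(html, element_id, parsers=("lxml", "html5lib", "html.parser")):
    """Hypothetical helper: return (parser_name, element) for the first
    parser that locates element_id, or (None, None) if none of them do."""
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # This parser library is not installed; try the next one.
            continue
        element = soup.find(id=element_id)
        if element is not None:
            return name, element
    return None, None
```

With the code above you would call find_with_parsers(page_text, "olpOfferListColumn") and print the returned parser name alongside the table. Note that lxml is a separate package (pip install lxml); if it is missing, BeautifulSoup raises FeatureNotFound, which the sketch skips over.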