How to get a specific HTML element of a page (bs4)

Time: 2017-11-24 00:32:49

Tags: python html

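import webbrowser

import bs4
import requests

url = 'https://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords='
keywords = "keyboard"
full_link = url + keywords
res = requests.get(full_link)
# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = bs4.BeautifulSoup(res.text, 'html.parser')
# Open the same search page in the browser for comparison.
webbrowser.open(full_link)

# Look for the product link by its full class string.
a = soup.find('a', {'class': 'a-link-normal s-access-detail-page s-color-twister-title-link a-text-normal'})
print(a)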

Hi, I am trying to get a very specific HTML element that is buried deep inside nested divs, with no luck. This is the HTML:

<a class="a-link-normal s-access-detail-page  s-color-twister-title-link a-text-normal" title="AmazonBasics Wired Keyboard" href="https://rads.stackoverflow.com/amzn/click/com/B005EOWBHC" rel="nofollow noreferrer"><h2 data-attribute="AmazonBasics Wired Keyboard" data-max-rows="0" class="a-size-medium s-inline  s-access-title  a-text-normal">AmazonBasics Wired Keyboard</h2></a>

It is nested very deep. I want to get this element's href, but at the moment my variable a is None.

1 answer:

Answer 0: (score: 1)

You need to use findAll and pass the classes as a list. For example:

a = soup.findAll('a', {'class': ['a-link-normal', 's-access-detail-page', 's-color-twister-title-link', 'a-text-normal']})

But I would also recommend not selecting on such a specific set of classes. The only one you really need is probably s-access-detail-page.
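
As a minimal sketch of that last suggestion, the snippet below matches on the single class s-access-detail-page and reads the href. It parses a static copy of the anchor from the question instead of the live page, so the wrapping divs and the variable names (html, link, href, title, same_link) are assumptions made for illustration. The live Amazon page may return different markup to a script than to a browser, in which case find would still return None.

```python
from bs4 import BeautifulSoup

# Static copy of the anchor from the question, wrapped in two divs for illustration.
html = '''
<div><div>
<a class="a-link-normal s-access-detail-page  s-color-twister-title-link a-text-normal" title="AmazonBasics Wired Keyboard" href="https://rads.stackoverflow.com/amzn/click/com/B005EOWBHC"><h2 class="a-size-medium s-inline  s-access-title  a-text-normal">AmazonBasics Wired Keyboard</h2></a>
</div></div>
'''

soup = BeautifulSoup(html, 'html.parser')

# A single class name matches any tag whose class list contains it,
# however deeply the tag is nested.
link = soup.find('a', class_='s-access-detail-page')

# find() returns None when nothing matches, so guard before reading attributes.
href = link.get('href') if link is not None else None
title = link.get('title') if link is not None else None
print(href)
print(title)

# The same lookup written as a CSS selector.
same_link = soup.select_one('a.s-access-detail-page')
```

Note that passing a list of classes, as in the findAll call above, matches tags that carry any one of the listed classes, so it can return more links than intended. Matching on one distinctive class keeps the query short and makes that behaviour explicit.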