Using BeautifulSoup to get tags and the attributes of those tags

Time: 2021-03-10 19:13:34

Tags: python beautifulsoup python-requests

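I've just started using BeautifulSoup and have run into a problem getting the attributes of tags nested inside other tags. I'm using whitehouse.gov/briefing-room/ for practice. All I want to do right now is get all the links on this page and append them to an empty list. Here is my current code: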

    result = requests.get("https://www.whitehouse.gov/briefing-room/")

    src = result.content
    soup = BeautifulSoup(src, 'lxml')

    urls = []

    for h2_tags in soup.find_all('h2'):
        a_tag = h2_tags.find('a')
        urls.append(a_tag.attr['href']) # This is where I get the NoneType error

This code returns

1 Answer:

Answer 0 (score: 1)

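The problem is that some <h2> tags don't contain an <a> tag, so you have to check for that case. Alternatively, just use a CSS selector to select all <a> tags that are under <h2> tags: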

    import requests
    from bs4 import BeautifulSoup


    result = requests.get("https://www.whitehouse.gov/briefing-room/")

    src = result.content
    soup = BeautifulSoup(src, 'lxml')

    urls = []

    for a_tag in soup.select('h2 a'):    # <-- select <A> tags that are under <H2> tags
        urls.append(a_tag.attrs['href'])

    print(*urls, sep='\n')

Prints:

https://www.whitehouse.gov/briefing-room/statements-releases/2021/03/10/statement-by-nsc-spokesperson-emily-horne-on-national-security-advisor-jake-sullivan-leading-the-first-virtual-meeting-of-the-u-s-israel-strategic-consultative-group/
https://www.whitehouse.gov/briefing-room/press-briefings/2021/03/09/press-briefing-by-press-secretary-jen-psaki-and-deputy-director-of-the-national-economic-council-bharat-ramamurti-march-9-2021/
https://www.whitehouse.gov/briefing-room/statements-releases/2021/03/09/readout-of-the-white-houses-meeting-with-climate-finance-leaders/
https://www.whitehouse.gov/briefing-room/statements-releases/2021/03/09/readout-of-vice-president-kamala-harris-call-with-prime-minister-erna-solberg-of-norway/
https://www.whitehouse.gov/briefing-room/statements-releases/2021/03/09/nomination-sent-to-the-senate-3/
https://www.whitehouse.gov/briefing-room/statements-releases/2021/03/09/president-biden-announces-key-hire-for-the-office-of-management-and-budget/
https://www.whitehouse.gov/briefing-room/speeches-remarks/2021/03/09/remarks-by-president-biden-during-tour-of-w-s-jenks-son/
https://www.whitehouse.gov/briefing-room/statements-releases/2021/03/09/president-joseph-r-biden-jr-approves-louisiana-disaster-declaration/
https://www.whitehouse.gov/briefing-room/statements-releases/2021/03/09/statement-by-president-joe-biden-on-the-house-taking-up-the-pro-act/
https://www.whitehouse.gov/briefing-room/statements-releases/2021/03/09/white-house-announces-additional-staff/
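
The other option mentioned above, checking whether each <h2> actually contains an <a>, could look roughly like the sketch below. It is only an illustration under a few assumptions: `SAMPLE_HTML` and `collect_h2_links` are hypothetical names that are not part of the original code, a small inline HTML string stands in for the downloaded page so the example runs offline, and the built-in `html.parser` is used in place of `lxml`. Note also that the question's code reads `a_tag.attr`, whereas the attribute dictionary is `a_tag.attrs`; on a BeautifulSoup tag, `a_tag.attr` is treated as a search for a child tag named `attr` and yields `None`, which may be contributing to the NoneType error as well.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
SAMPLE_HTML = """
<h2><a href="https://example.com/first">First</a></h2>
<h2>Heading without a link</h2>
<h2><a href="https://example.com/second">Second</a></h2>
"""


def collect_h2_links(html):
    """Return the href of the first <a> in each <h2>, skipping headings without one."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for h2_tag in soup.find_all("h2"):
        a_tag = h2_tag.find("a")      # None when the <h2> has no <a> inside it
        if a_tag is None:
            continue
        href = a_tag.get("href")      # .get() returns None instead of raising KeyError
        if href is not None:
            urls.append(href)
    return urls


urls = collect_h2_links(SAMPLE_HTML)
print(*urls, sep="\n")
```

Compared with `soup.select('h2 a')`, this version takes at most one link per heading and makes the skipped case explicit, which can be handy if headings without links need to be logged or handled differently.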