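I'm scraping the web to build an index of U.S. government departments and agencies. The API for this list was removed in 2016. When I scrape the page for agencies starting with the letter A, my code returns only the name of the first agency in the list. I'm new to Python, and I assume I need to write code that iterates over the class to retrieve the whole list. Thanks for your help.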
import requests
from bs4 import BeautifulSoup
import bs4
url = requests.get('https://www.usa.gov/federal-agencies/a') #download webpage with listing A
soup = BeautifulSoup(url.content, 'html.parser') #create beautifulSoup class to parse the page
fed_list_a = soup.find(class_ = "one_column_bullet") #extract class with information required
print(fed_list_a.prettify())
url_list = fed_list_a.find(class_="url").get_text()
print (url_list)
This returns only the first agency in the A list:
AbilityOne Commission
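Answer 0 (score: 0)
I had to use lxml because html.parser raised errors on this URL. You need to use find_all() instead of find(), which returns only the first match, as shown below: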
import requests
from bs4 import BeautifulSoup
url = requests.get('https://www.usa.gov/federal-agencies/a') #download webpage with listing A
soup = BeautifulSoup(url.content, 'lxml') #create beautifulSoup class to parse the page
fed_list_a = soup.find(class_ = "one_column_bullet") #extract class with information required
# print(fed_list_a.prettify())
url_list = fed_list_a.find_all('a', class_="url")
for link in url_list:  # each item is one <a class="url"> tag
    print(link.get_text())
Output:
AbilityOne Commission
Access Board
Administration for Children and Families (ACF)
Administration for Community Living
Administration for Native Americans
Administration on Aging
Administration on Intellectual and Developmental Disabilities
Administrative Conference of the United States
Administrative Office of the U.S. Courts
Advisory Council on Historic Preservation
African Development Foundation
....
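To see the difference between find() and find_all() without hitting the network, here is a minimal sketch run against a small inline HTML snippet. The markup, the href values, and the names SAMPLE_HTML, container, first_name and all_names are made up for illustration; the real page may be structured differently. It uses html.parser so lxml is not required for this small sample.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup that imitates the structure the answer
# relies on; this is NOT the actual HTML served by usa.gov.
SAMPLE_HTML = """
<ul class="one_column_bullet">
  <li><a class="url" href="/agency-1">AbilityOne Commission</a></li>
  <li><a class="url" href="/agency-2">Access Board</a></li>
  <li><a class="url" href="/agency-3">Administration on Aging</a></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
container = soup.find(class_="one_column_bullet")

# find() stops at the first matching tag, which is why the question's code
# printed a single agency.
first_name = container.find("a", class_="url").get_text(strip=True)

# find_all() returns every matching tag, so we can collect all the names.
all_names = [
    link.get_text(strip=True)
    for link in container.find_all("a", class_="url")
]

print(first_name)
print(all_names)
```

Collecting the names in a list rather than printing them inside the loop makes it easier to reuse them later, for example to write them to a file.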