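I am browsing https://www.nps.gov/index.htm and trying to create a dictionary in which the state names from the drop-down menu are the keys and the values are the links to the corresponding page with information about each state.
However, with my current code, I get something like this: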
<li><a href="/state/wy/index.htm">Wyoming</a></li>
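At my current skill level, I don't know how to extract the state name, since the element has no identifier, class, or anything like that.
So how would I achieve this? Here is my current code: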
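import requests
from bs4 import BeautifulSoup

state_dict = {}
url = 'https://www.nps.gov/index.htm'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
# Locate the drop-down menu, then collect every <li> inside it
drop_down_search = soup.find('ul', class_="dropdown-menu SearchBar-keywordSearch")
state_search = drop_down_search.find_all('li', recursive=True)
for state in state_search:
    print(state)  # prints the whole <li> tag, not just the state name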
Answer 0 (score: 4)
You can use the .text attribute, like this:
import requests
from bs4 import BeautifulSoup

state_dict = {}
url = 'https://www.nps.gov/index.htm'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
drop_down_search = soup.find('ul', class_="dropdown-menu SearchBar-keywordSearch")
state_search = drop_down_search.find_all('li', recursive=True)
for state in state_search:
    print(state.text)
It will print only the text:
Alabama
Alaska
American Samoa
Arizona
Arkansas
...
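Printing the names does not yet give the dictionary the question asks for. A minimal sketch of that last step follows, assuming each <li> wraps a single <a> tag as in the snippet from the question. The helper name build_state_dict and the SAMPLE_HTML fragment are illustrative only (the Alabama entry is an assumed example that follows the same pattern as the Wyoming line), so the block runs without a network request.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the <li> shown in the question. The Alabama
# entry is an assumed example following the same pattern.
SAMPLE_HTML = """
<ul class="dropdown-menu SearchBar-keywordSearch">
  <li><a href="/state/al/index.htm">Alabama</a></li>
  <li><a href="/state/wy/index.htm">Wyoming</a></li>
</ul>
"""


def build_state_dict(html):
    """Map each state name in the drop-down menu to its href (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    menu = soup.find("ul", class_="dropdown-menu SearchBar-keywordSearch")
    state_dict = {}
    if menu is None:
        # The menu was not found, e.g. because the page layout changed
        return state_dict
    for link in menu.find_all("a", href=True):
        # The link text is the state name; the href attribute is the link
        state_dict[link.get_text(strip=True)] = link["href"]
    return state_dict


state_dict = build_state_dict(SAMPLE_HTML)
print(state_dict)
```

To run this against the live page, pass response.text from the code above instead of SAMPLE_HTML.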
Answer 1 (score: 0)
...
for state in state_search:
    for link in state.find_all('a'):
        print("%30s ===> %s" % (link.text, link.get('href')))
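The href values on the page are relative (for example /state/wy/index.htm), so they may need to be resolved against the page URL before they can be requested. A small sketch using urljoin from the standard library follows; the helper name absolutize is hypothetical, and the input dictionary holds only the Wyoming entry shown in the question.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.nps.gov/index.htm"


def absolutize(base_url, links):
    """Return a copy of links with every href resolved against base_url (hypothetical helper)."""
    return {name: urljoin(base_url, href) for name, href in links.items()}


# Relative link taken from the <li> shown in the question
relative_links = {"Wyoming": "/state/wy/index.htm"}
absolute_links = absolutize(BASE_URL, relative_links)
print(absolute_links["Wyoming"])
```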