Extracting list values when scraping

Time: 2020-03-11 01:14:46

Tags: python beautifulsoup

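I am scraping https://www.nps.gov/index.htm and trying to build a dictionary in which the state names from the dropdown menu are the keys and the values are the links to the page with information about each state.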

However, with my current code, what I get looks like this:

<li><a href="/state/wy/index.htm">Wyoming</a></li>

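At my current skill level, I can't figure out how to extract the state name, since the element has no id, class, or anything similar to select on.

So how would I achieve this? Here is my current code: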

state_dict = {}

url = 'https://www.nps.gov/index.htm'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
drop_down_search = soup.find('ul', class_="dropdown-menu SearchBar-keywordSearch")
state_search = drop_down_search.find_all('li', recursive=True)

for state in state_search:
    print(state)

2 answers:

Answer 0 (score: 4)

You can use the .text attribute, like this:

import requests
from bs4 import BeautifulSoup

state_dict = {}

# Download the page and parse it
url = 'https://www.nps.gov/index.htm'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Locate the dropdown <ul>, then collect every <li> inside it
drop_down_search = soup.find('ul', class_="dropdown-menu SearchBar-keywordSearch")
state_search = drop_down_search.find_all('li', recursive=True)

for state in state_search:
    # .text returns only the text inside the tag, without the markup
    print(state.text)

This prints only the text:

Alabama
Alaska
American Samoa
Arizona
Arkansas
...
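
To get the dictionary the question asks for, the same idea can be applied to the `<a>` tag inside each `<li>`: its text is the key and its `href` is the value. The following is a minimal sketch, not a tested scraper for the live site. It assumes the dropdown markup looks like the `<li>` line quoted in the question; the inline `SAMPLE_HTML` and the helper name `build_state_dict` are illustrative and not part of the original post.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the <li> element quoted in the question.
# This is an assumption; the live page may be structured differently.
SAMPLE_HTML = """
<ul class="dropdown-menu SearchBar-keywordSearch">
  <li><a href="/state/al/index.htm">Alabama</a></li>
  <li><a href="/state/wy/index.htm">Wyoming</a></li>
</ul>
"""


def build_state_dict(html):
    """Map each state name to the href of its link (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    menu = soup.find('ul', class_="dropdown-menu SearchBar-keywordSearch")
    state_dict = {}
    if menu is None:
        # Dropdown not found: return an empty dict instead of raising
        return state_dict
    # href=True skips any <a> tag that has no href attribute
    for link in menu.find_all('a', href=True):
        state_dict[link.get_text(strip=True)] = link['href']
    return state_dict


state_dict = build_state_dict(SAMPLE_HTML)
print(state_dict)
```

For the real page, `response.text` from `requests.get(url)` would be passed in place of `SAMPLE_HTML`.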

Answer 1 (score: 0)

...

for state in state_search:
    for link in state.find_all('a'):
        print("%30s ===> %s" % (link.text, link.get('href')))
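
The `href` values printed by this loop are site-relative (for example `/state/wy/index.htm`), so they need to be joined with the page URL before they can be requested. A small sketch using `urljoin` from the standard library; the helper name `absolute_links` and the one-entry sample dict are illustrative assumptions, not part of the original answer.

```python
from urllib.parse import urljoin

# Page URL taken from the question
BASE_URL = 'https://www.nps.gov/index.htm'


def absolute_links(relative_links, base_url=BASE_URL):
    """Return a new dict with every href resolved against base_url (hypothetical helper)."""
    return {name: urljoin(base_url, href) for name, href in relative_links.items()}


# Sample input using the href quoted in the question
links = absolute_links({'Wyoming': '/state/wy/index.htm'})
print(links['Wyoming'])
```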