Question

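I'm trying to select links from a list that has more than 2,000 items. Ultimately, I want to be able to follow the links in the list and open the next page. I can get Beautiful Soup to print the li list I want, but I can't figure out how to follow the links. At the end of the code below, I tried adding this: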

for link in RHAZ:
    print(link.get('href'))

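But I get this error:

AttributeError: 'NavigableString' object has no attribute 'get'

I think this has to do with the HTML still being attached to the output (i.e., when I print the li items, the a, li, and href markup shows up). How can I get the actual links?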

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup


# The website I am starting at
my_url = 'https://mars.nasa.gov/msl/multimedia/raw/'

#calls the urlopen function from the request module of the urllib module
#AKA opens up the connection and grabs the page
uClient = uReq(my_url)

#imports the webpage from html format into python.  
page_html = uClient.read()

#closes the client
uClient.close()

#parses the HTML using bs4
page_soup = soup(page_html, "lxml")

#finds the categories for the types of images on the site, category 1 is 
#RHAZ
containers = page_soup.findAll("div", {"class": "image_list"})

RHAZ = containers[1]  

# prints the li list that has the links I want
for child in RHAZ:
    print(child)

Answer 1

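Iterating over RHAZ yields every child node, not just links: the nested div, ul, li, and a tags, along with plain text nodes (NavigableString), which have no get method. That is why you get the error.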
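To see where the NavigableString comes from, here is a minimal sketch that runs against a small inline snippet rather than the live page. The markup and the href value are made up for illustration, and it uses the built-in "html.parser" instead of "lxml" so it has no extra dependency.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup standing in for one "image_list" container.
html = """<div class="image_list">
<ul>
<li><a href="/example/sol1">Sol 1</a></li>
</ul>
</div>"""

container = BeautifulSoup(html, "html.parser").find("div", class_="image_list")

# Iterating a Tag walks its direct children, and the whitespace between
# tags counts as a child too (a NavigableString, not a Tag).
child_types = [type(child).__name__ for child in container]
print(child_types)

# Keeping only Tag children avoids calling .get() on a text node.
tags_only = [child for child in container if isinstance(child, Tag)]
print([tag.name for tag in tags_only])
```

Filtering with isinstance works, but it still only reaches direct children, so searching for the a tags directly is usually simpler.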

If you want the href from every anchor tag, find all the anchor tags and extract the href from each, like this:

for link in RHAZ.findAll('a'):
    print(link['href'])
    print(link['href'], link.text) # if you need both href and text
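The question also asks how to follow the links once you have them. One possible approach, sketched below, is to turn each href into an absolute URL with urljoin and then open it the same way the first page was opened. The helper names extract_links and open_page and the sample hrefs are hypothetical; the real page may use different link formats.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup


def extract_links(container, base_url):
    """Return absolute URLs for every <a> with an href inside container."""
    return [
        urljoin(base_url, a["href"])
        for a in container.find_all("a", href=True)  # skips anchors without href
    ]


def open_page(url):
    """Fetch url and return the parsed page (this makes a network request)."""
    with urlopen(url) as response:
        return BeautifulSoup(response.read(), "html.parser")


base_url = "https://mars.nasa.gov/msl/multimedia/raw/"

# Hypothetical markup; the hrefs are illustrative, not taken from the site.
sample = BeautifulSoup(
    '<div><ul>'
    '<li><a href="?s=1">Sol 1</a></li>'
    '<li><a href="/msl/mission/">Mission</a></li>'
    '<li><a name="top">no href here</a></li>'
    '</ul></div>',
    "html.parser",
)

links = extract_links(sample.div, base_url)
print(links)
```

With the real page you would pass RHAZ and my_url to extract_links, then call open_page(link) for each result to get the next page as a new soup object.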

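P.S.: Rather than stating the error first and describing your situation afterwards, describe what you are working on and then show the error you are facing. That makes the question clearer, and you'll get a suitable answer more easily.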

Selecting links from a list in BeautifulSoup
