Question

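I'm trying to write a web-scraping algorithm. On my first pass over the site I parse the page with BeautifulSoup and then call find_all on it, which returns a list of "a" tags with a given class. Each tag carries a lot of data, but I only want to build a list of URLs. Here is my code: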

soupcurrent = BeautifulSoup(html_current, 'html.parser')
search_results = soupcurrent.find_all(["a"], class_="XYZ")

After running this, how can I cut the data down further to get a list containing only the URLs? They appear in the form href="...".

I have already tried

     newlist.append(search_results.get('href')

but that did not work. Any other ideas?

Answer 1

Call the __getitem__ method of each tag object and pass it 'href' (that is, index each tag with ['href']):

soupcurrent = BeautifulSoup(html_current, 'lxml')
search_results = soupcurrent.find_all("a", class_="XYZ")
urls = [i['href'] for i in search_results]
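
The attempt in the question likely fails because find_all returns a ResultSet (a list of tags), and .get('href') exists on individual tags, not on the list. Indexing with ['href'] also raises KeyError when a tag has no href attribute. A minimal sketch of a more tolerant variant follows, assuming the built-in 'html.parser' and a made-up sample page standing in for html_current:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the question's html_current.
html_current = """
<a class="XYZ" href="https://example.com/one">One</a>
<a class="XYZ" href="/two">Two</a>
<a class="XYZ">No link here</a>
<a class="other" href="https://example.com/skip">Skip</a>
"""

soupcurrent = BeautifulSoup(html_current, "html.parser")
search_results = soupcurrent.find_all("a", class_="XYZ")

# .get() is called on each Tag, not on the list; it returns None when the
# attribute is missing, so those tags are filtered out instead of raising.
urls = [tag.get("href") for tag in search_results if tag.get("href") is not None]

print(urls)  # ['https://example.com/one', '/two']
```

Passing href=True to find_all is another way to skip anchors without an href before the list is built.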

Edit:

To extract an image's src, locate the img tag and then access "src":

s = '<div class="XYZ" data-imgid="000" id="000"><img alt=" 1" src="XYZ.jpg" title=" 1"/></div>'
the_src = BeautifulSoup(s, 'lxml').find('img')['src']

Output:

'XYZ.jpg'
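
Note that find('img') returns None when no img tag exists, so indexing its result can raise TypeError. When a page holds several images, a sketch along these lines may be more practical. It assumes the built-in 'html.parser' and a made-up snippet in which the second div has an img without a src:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two divs, only the first img has a src attribute.
s = (
    '<div class="XYZ" data-imgid="000" id="000">'
    '<img alt=" 1" src="XYZ.jpg" title=" 1"/></div>'
    '<div class="XYZ" data-imgid="001" id="001"><img alt=" 2"/></div>'
)

soup = BeautifulSoup(s, "html.parser")

# src=True keeps only img tags that actually have a src attribute,
# so indexing with ['src'] cannot raise KeyError here.
sources = [img["src"] for img in soup.find_all("img", src=True)]

print(sources)  # ['XYZ.jpg']
```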
