Question

I want to extract all the links inside the same div class from the following code:

<div class='page-numbers clearfix'><span class='current'>
1</span><a href='https://www.example.com/blog/author/abc/page/2/' class='inactive'>
2</a><a href='https://www.example.com/blog/author/abc/page/3/' class='inactive'>
3</a><a href='https://www.example.com/blog/author/abc/page/4/' class='inactive'>
4</a></div>

I tried:

from bs4 import BeautifulSoup

html = "<div class='page-numbers clearfix'><span class='current'>1</span><a href='https://www.example.com/blog/author/abc/page/2/' class='inactive'>2</a><a href='https://www.example.com/blog/author/abc/page/3/' class='inactive'>3</a><a href='https://www.example.com/blog/author/abc/page/4/' class='inactive'>4</a></div>"

soup = BeautifulSoup(html, "html.parser")
for i in soup.find_all('div', {'class': 'page-numbers clearfix'}):
    link= i.find('a', href=True)
    print(link['href'])

But this does not seem to work. The output I need is:

https://www.example.com/blog/author/abc/page/2/

https://www.example.com/blog/author/abc/page/3/

https://www.example.com/blog/author/abc/page/4/

Answer 1

You need to use find_all for the a tags as well, not only for the div. The code below works as expected.

from bs4 import BeautifulSoup as bs

stra = """
<div class='page-numbers clearfix'><span class='current'>
1</span><a href='https://www.example.com/blog/author/abc/page/2/' class='inactive'>
2</a><a href='https://www.example.com/blog/author/abc/page/3/' class='inactive'>
3</a><a href='https://www.example.com/blog/author/abc/page/4/' class='inactive'>
4</a></div>
"""
soup = bs(stra, 'html.parser')
for i in soup.find_all('div', {'class': 'page-numbers clearfix'}):
    links = i.find_all('a', href=True)
    for link in links:
        print(link['href'])

Output:

https://www.example.com/blog/author/abc/page/2/
https://www.example.com/blog/author/abc/page/3/
https://www.example.com/blog/author/abc/page/4/
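
The question's loop prints only one URL because find returns just the first matching tag, while find_all returns every match. A minimal sketch of that contrast, using a compact copy of the markup from the question (the names first_link and all_links are illustrative, not part of the original code):

```python
from bs4 import BeautifulSoup

html = (
    "<div class='page-numbers clearfix'><span class='current'>1</span>"
    "<a href='https://www.example.com/blog/author/abc/page/2/' class='inactive'>2</a>"
    "<a href='https://www.example.com/blog/author/abc/page/3/' class='inactive'>3</a>"
    "<a href='https://www.example.com/blog/author/abc/page/4/' class='inactive'>4</a></div>"
)

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="page-numbers")

# find() stops at the first <a> that carries an href attribute.
first_link = div.find("a", href=True)["href"]

# find_all() returns every matching <a>, in document order.
all_links = [a["href"] for a in div.find_all("a", href=True)]

print(first_link)
print(all_links)
```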

Answer 2

A possible (slightly shorter) variation on all the other good answers here:

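
A minimal sketch of what one such shorter form could look like, assuming a single nested list comprehension is acceptable. This is an illustration under that assumption, and the name links is a placeholder:

```python
from bs4 import BeautifulSoup

html = (
    "<div class='page-numbers clearfix'><span class='current'>1</span>"
    "<a href='https://www.example.com/blog/author/abc/page/2/' class='inactive'>2</a>"
    "<a href='https://www.example.com/blog/author/abc/page/3/' class='inactive'>3</a>"
    "<a href='https://www.example.com/blog/author/abc/page/4/' class='inactive'>4</a></div>"
)

soup = BeautifulSoup(html, "html.parser")

# One comprehension: every <a> with an href inside every matching div.
links = [
    a["href"]
    for div in soup.find_all("div", class_="page-numbers")
    for a in div.find_all("a", href=True)
]

print("\n".join(links))
```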

Answer 3

This gives you a list of the links:

from bs4 import BeautifulSoup

html_doc = '''<div class='page-numbers clearfix'><span class='current'> 1</span><a href='https://www.example.com/blog/author/abc/page/2/' class='inactive'> 2</a><a href='https://www.example.com/blog/author/abc/page/3/' class='inactive'> 3</a><a href='https://www.example.com/blog/author/abc/page/4/' class='inactive'> 4</a></div>'''

soup = BeautifulSoup(html_doc, "lxml")
# Locate the pagination div, then keep only the anchors marked 'inactive'.
div = soup.find('div', attrs={'class': 'page-numbers clearfix'})
containers = div.find_all('a', attrs={'class': 'inactive'})
links = [c['href'] for c in containers]

links then contains:

['https://www.example.com/blog/author/abc/page/2/', 'https://www.example.com/blog/author/abc/page/3/', 'https://www.example.com/blog/author/abc/page/4/']
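
If a page might not contain the div at all, soup.find returns None and the subsequent find_all call raises AttributeError. A minimal sketch of a guarded version follows; the helper name extract_links and its div_class parameter are hypothetical, and html.parser is used here only to avoid the lxml dependency:

```python
from bs4 import BeautifulSoup


def extract_links(html_doc, div_class="page-numbers"):
    """Return the hrefs of <a> tags inside the first div with the given class."""
    soup = BeautifulSoup(html_doc, "html.parser")
    div = soup.find("div", class_=div_class)
    if div is None:  # the page has no such div
        return []
    return [a["href"] for a in div.find_all("a", href=True)]


html_doc = (
    "<div class='page-numbers clearfix'>"
    "<a href='https://www.example.com/blog/author/abc/page/2/' class='inactive'>2</a>"
    "</div>"
)

found = extract_links(html_doc)
missing = extract_links("<p>no pagination here</p>")
print(found)
print(missing)
```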

Answer 4

Try the following code.

from bs4 import BeautifulSoup

data = '''<div class='page-numbers clearfix'><span class='current'>
1</span><a href='https://www.example.com/blog/author/abc/page/2/' class='inactive'>
2</a><a href='https://www.example.com/blog/author/abc/page/3/' class='inactive'>
3</a><a href='https://www.example.com/blog/author/abc/page/4/' class='inactive'>
4</a></div>'''

soup = BeautifulSoup(data, 'html.parser')

# Find the pagination div first, then every <a> inside it that has an href.
item = soup.find('div', class_="page-numbers clearfix")
for link in item.find_all('a', href=True):
    print(link['href'])

Output:

https://www.example.com/blog/author/abc/page/2/
https://www.example.com/blog/author/abc/page/3/
https://www.example.com/blog/author/abc/page/4/
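
One caveat with class_="page-numbers clearfix": when the value contains a space, Beautiful Soup compares it against the exact string of the class attribute, so markup that lists the same classes in a different order does not match. Searching for a single class avoids this. A small sketch with made-up markup in which the class order is swapped:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same two classes, but in the opposite order.
html = (
    "<div class='clearfix page-numbers'>"
    "<a href='https://www.example.com/blog/author/abc/page/2/'>2</a>"
    "</div>"
)
soup = BeautifulSoup(html, "html.parser")

# Exact-string match on the whole class attribute: fails on reordered classes.
exact = soup.find("div", class_="page-numbers clearfix")

# Single-class match: succeeds regardless of the other classes or their order.
single = soup.find("div", class_="page-numbers")

print(exact)               # None, because the attribute string differs
print(single is not None)
```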

Answer 5

You can use a CSS selector:

from bs4 import BeautifulSoup

data = '''<div class='page-numbers clearfix'><span class='current'>
1</span><a href='https://www.example.com/blog/author/abc/page/2/' class='inactive'>
2</a><a href='https://www.example.com/blog/author/abc/page/3/' class='inactive'>
3</a><a href='https://www.example.com/blog/author/abc/page/4/' class='inactive'>
4</a></div>'''

soup = BeautifulSoup(data, 'lxml')

for a in soup.select('div.page-numbers.clearfix a[href]'):
    print(a['href'])

Prints:

https://www.example.com/blog/author/abc/page/2/
https://www.example.com/blog/author/abc/page/3/
https://www.example.com/blog/author/abc/page/4/
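
The selector itself does not depend on lxml. Below is a sketch of the same query with the built-in html.parser, collecting the results into a list (the name links is illustrative). Because div.page-numbers.clearfix tests each class separately, it also matches when the classes appear in a different order, which the made-up markup here demonstrates:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with the two classes swapped relative to the question.
data = (
    "<div class='clearfix page-numbers'><span class='current'>1</span>"
    "<a href='https://www.example.com/blog/author/abc/page/2/' class='inactive'>2</a>"
    "<a href='https://www.example.com/blog/author/abc/page/3/' class='inactive'>3</a>"
    "</div>"
)

soup = BeautifulSoup(data, "html.parser")

# Each .class part is checked independently, so class order does not matter.
links = [a["href"] for a in soup.select("div.page-numbers.clearfix a[href]")]

print(links)
```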

How do I extract multiple links from the same class in Python?
