How do I get soup.find_all to work in BeautifulSoup?

Asked: 2018-08-07 15:07:56

Tags: python-3.x web-scraping beautifulsoup

I am trying to use BeautifulSoup to scrape information from a page that lists lawyers' names.

#importing libraries
from urllib.request import urlopen 
from bs4 import BeautifulSoup
import requests

Here is an example of how each lawyer's name is nested in the HTML tags:

 </a>
          <div class="person-info search-person-info people-search-person-info">
           <div class="col person-name-position">
            <a href="https://www.foxrothschild.com/richard-s-caputo/">
             Richard S. Caputo
            </a>

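I tried the following script to extract each lawyer's name, using 'a' as the tag and "col person-name-position" as the class. It does not seem to work: it prints an empty list instead.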

page=requests.get("https://www.foxrothschild.com/people/?search%5Bname%5D=&search%5Bkeyword%5D=&search%5Boffice%5D=&search%5Bpeople-position%5D=&search%5Bpeople-bar-admission%5D=&search%5Bpeople-language%5D=&search%5Bpeople-school%5D=Villanova+University+School+of+Law&search%5Bpractice-area%5D=") #insert page here
soup=BeautifulSoup(page.content,'html.parser')
#print(soup.prettify())
find_name=soup.find_all('a',class_='col person-name-position')
print(find_name)

2 answers:

Answer 0: (score: 1)

You need to change the soup.find_all call to search for div, because the class is set on the div, not on the a.

page=requests.get("https://www.foxrothschild.com/people/?search%5Bname%5D=&search%5Bkeyword%5D=&search%5Boffice%5D=&search%5Bpeople-position%5D=&search%5Bpeople-bar-admission%5D=&search%5Bpeople-language%5D=&search%5Bpeople-school%5D=Villanova+University+School+of+Law&search%5Bpractice-area%5D=") #insert page here
soup=BeautifulSoup(page.content,'html.parser')
#print(soup.prettify())
find_name=soup.find_all('div',class_='col person-name-position')
print(find_name)
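
To see why the original call returns an empty list, the two filters can be compared on a small offline fragment instead of the live page. This is only a sketch: the HTML below is adapted from the snippet in the question, with closing </div> tags added as an assumption so the fragment is well-formed, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Fragment adapted from the question; the closing </div> tags are assumed.
html = """
<div class="person-info search-person-info people-search-person-info">
  <div class="col person-name-position">
    <a href="https://www.foxrothschild.com/richard-s-caputo/">
      Richard S. Caputo
    </a>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The <a> tag carries no class attribute, so filtering <a> by class finds nothing.
a_matches = soup.find_all("a", class_="col person-name-position")

# The class is set on the enclosing <div>, so the same filter on <div> matches.
div_matches = soup.find_all("div", class_="col person-name-position")

print(len(a_matches))
print(len(div_matches))
```

Note that find_all returns the matching div tags themselves, so printing the result shows the whole div markup rather than just the name.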

Answer 1: (score: 0)

class="col person-name-position" is an attribute of the div element, so you need to use:

find_name=soup.find_all('div',class_='col person-name-position')
for entry in find_name:
    a_element = entry.find("a")
    #...
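
One possible way to fill in the loop body is to read the link text with get_text(strip=True). The sketch below runs on an offline fragment adapted from the question (the closing </div> tags and the variable names are assumptions, not part of the original page), and also shows a CSS-selector variant with soup.select as an alternative.

```python
from bs4 import BeautifulSoup

# Fragment adapted from the question; the closing </div> tags are assumed.
html = """
<div class="person-info search-person-info people-search-person-info">
  <div class="col person-name-position">
    <a href="https://www.foxrothschild.com/richard-s-caputo/">
      Richard S. Caputo
    </a>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

names = []
for entry in soup.find_all("div", class_="col person-name-position"):
    a_element = entry.find("a")
    if a_element is None:
        # Skip any div that happens to contain no link.
        continue
    # strip=True removes the surrounding newlines and indentation.
    names.append(a_element.get_text(strip=True))

# Alternative: a CSS selector for <a> tags inside a div with both classes.
names_css = [
    a.get_text(strip=True)
    for a in soup.select("div.col.person-name-position a")
]

print(names)
print(names_css)
```

The selector form matches the two classes regardless of their order in the attribute, whereas class_='col person-name-position' compares against the exact attribute string, so the selector may be the more robust choice if the site reorders its classes.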