I am trying to use BeautifulSoup to scrape information from a page that lists lawyers' names.
#importing libraries
from urllib.request import urlopen
from bs4 import BeautifulSoup
import requests
Below is an example of how each lawyer's name is nested in the HTML tags:
</a>
<div class="person-info search-person-info people-search-person-info">
<div class="col person-name-position">
<a href="https://www.foxrothschild.com/richard-s-caputo/">
Richard S. Caputo
</a>
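I tried the following script, using 'a' as the tag and "col person-name-position" as the class, to extract each lawyer's name. It does not seem to work, though: it prints an empty list instead.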
page=requests.get("https://www.foxrothschild.com/people/?search%5Bname%5D=&search%5Bkeyword%5D=&search%5Boffice%5D=&search%5Bpeople-position%5D=&search%5Bpeople-bar-admission%5D=&search%5Bpeople-language%5D=&search%5Bpeople-school%5D=Villanova+University+School+of+Law&search%5Bpractice-area%5D=") #insert page here
soup=BeautifulSoup(page.content,'html.parser')
#print(soup.prettify())
find_name=soup.find_all('a',class_='col person-name-position')
print(find_name)
Answer 0 (score: 1)
You need to change the soup.find_all call to search for div, because the class is set on the div, not on the a.
page=requests.get("https://www.foxrothschild.com/people/?search%5Bname%5D=&search%5Bkeyword%5D=&search%5Boffice%5D=&search%5Bpeople-position%5D=&search%5Bpeople-bar-admission%5D=&search%5Bpeople-language%5D=&search%5Bpeople-school%5D=Villanova+University+School+of+Law&search%5Bpractice-area%5D=")
#insert page here
soup=BeautifulSoup(page.content,'html.parser')
#print(soup.prettify())
find_name=soup.find_all('div',class_='col person-name-position')
print(find_name)
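To see why the tag matters, here is a minimal offline sketch that runs both queries against the snippet from the question. The SAMPLE_HTML string is an assumption: it is the question's markup with the closing tags added so it parses on its own, not the live page. Note that a class_ value containing spaces is compared against the whole class attribute string, so it only matches when the classes appear in exactly that order.

```python
from bs4 import BeautifulSoup

# Assumed sample: the markup from the question, with closing tags added
# so that it can be parsed without fetching the real page.
SAMPLE_HTML = """
<div class="person-info search-person-info people-search-person-info">
  <div class="col person-name-position">
    <a href="https://www.foxrothschild.com/richard-s-caputo/">
      Richard S. Caputo
    </a>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The <a> tag carries no class attribute, so this query finds nothing.
anchors_with_class = soup.find_all("a", class_="col person-name-position")

# The class sits on the enclosing <div>, so this query finds it.
divs_with_class = soup.find_all("div", class_="col person-name-position")

print(len(anchors_with_class))
print(len(divs_with_class))
```

If the order of the classes on the real page might vary, a CSS selector such as soup.select("div.col.person-name-position a") is one alternative that does not depend on class order.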
Answer 1 (score: 0)
class="col person-name-position" is an attribute of the div element, so you need to use:
find_name=soup.find_all('div',class_='col person-name-position')
for entry in find_name:
    # each matching div wraps the <a> element that holds the lawyer's name
    a_element = entry.find("a")
    #...
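One way the loop body could be filled in is sketched below. The helper name extract_lawyers and the SAMPLE_HTML string are hypothetical, introduced only for illustration; the sample is the question's snippet with closing tags added, so the sketch runs without touching the live site.

```python
from bs4 import BeautifulSoup

# Assumed sample: the markup from the question, with closing tags added.
SAMPLE_HTML = """
<div class="person-info search-person-info people-search-person-info">
  <div class="col person-name-position">
    <a href="https://www.foxrothschild.com/richard-s-caputo/">
      Richard S. Caputo
    </a>
  </div>
</div>
"""


def extract_lawyers(html):
    """Return (name, profile_url) pairs found in the given HTML.

    Hypothetical helper, not part of BeautifulSoup or the original answers.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for entry in soup.find_all("div", class_="col person-name-position"):
        a_element = entry.find("a")
        if a_element is None:
            # Skip any matching div that has no link inside it.
            continue
        # get_text(strip=True) drops the newlines and indentation around the name.
        results.append((a_element.get_text(strip=True), a_element.get("href")))
    return results


lawyers = extract_lawyers(SAMPLE_HTML)
for name, url in lawyers:
    print(name, url)
```

Against the real page, the same helper could be called as extract_lawyers(page.content), assuming the server returns the names in the initial HTML rather than loading them with JavaScript.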