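I'm fairly new to web scraping and prototyping against various websites. I'm having trouble scraping what appears to be a JavaScript-loaded table. Any help would be greatly appreciated. Here is my code: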
import requests
from bs4 import BeautifulSoup

# Search page whose results table I am trying to scrape
url = 'https://onlineservice.cvo.org/webs/cvo/register/#/search/toronto/0/1/0/10'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
# Look for every element carrying the CSS class "table"
tables = soup.find_all(class_='table')
print(tables)
Answer 0 (score: 1)
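Try the URL below to get all the information in the blink of an eye. You can find this URL with Chrome DevTools, among the XHR requests in the Network tab. Give it a shot: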
import requests

# JSON endpoint that the search page calls in the background
URL = 'https://onlineservice.cvo.org/rest/public/registrant/search/?query=%20toronto&status=0&type=1&skip=0&take=427'
response = requests.get(URL)
# Each entry under 'result' is one registrant record
for items in response.json()['result']:
    lastname = items['lastName']
    firstname = items['firstName']
    middlename = items['middleName']
    commonname = items['commonName']
    status = items['registrationStatus']['name']
    print(lastname, firstname, middlename, commonname, status)
Partial results:
Ackerman Kent Alan Kent Active
Albarracin Oscar Fernando Oscar Active
Alcock Kathleen Kathleen Active
Ali Karissa Soraiya Karissa Active
Allen John Kyle John K. Active
Alvarez Luisa Cristina Luisa Active
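
If you want to reuse this for other search terms, here is a minimal sketch that builds the same URL from parameters instead of hard-coding it, and keeps the parsing separate from the request. The helper names (build_search_url, extract_rows, fetch_rows) are made up for illustration, and I am assuming from their names that skip and take are paging offsets; the site does not document this as far as I know. Missing fields are turned into empty strings rather than raising a KeyError.

```python
from urllib.parse import urlencode

# Endpoint copied from the answer above. The meaning of each parameter is an
# assumption based on its name (skip/take look like a paging offset and size).
BASE_URL = 'https://onlineservice.cvo.org/rest/public/registrant/search/'


def build_search_url(query, skip=0, take=10, status=0, type_=1):
    """Build the search URL; urlencode percent-escapes the query text."""
    params = {'query': query, 'status': status, 'type': type_,
              'skip': skip, 'take': take}
    return BASE_URL + '?' + urlencode(params)


def extract_rows(payload):
    """Turn the decoded JSON into (last, first, middle, common, status) tuples."""
    rows = []
    for item in payload.get('result', []):
        # registrationStatus is a nested object; tolerate it being absent or null
        status = (item.get('registrationStatus') or {}).get('name') or ''
        rows.append((
            item.get('lastName') or '',
            item.get('firstName') or '',
            item.get('middleName') or '',
            item.get('commonName') or '',
            status,
        ))
    return rows


def fetch_rows(query, skip=0, take=10):
    """Hypothetical wrapper: request one batch of results and parse it."""
    # Imported here so the two helpers above work without requests installed
    import requests
    response = requests.get(build_search_url(query, skip, take), timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return extract_rows(response.json())
```

Usage would look something like `for row in fetch_rows('toronto', take=50): print(*row)`. Keeping extract_rows free of network calls means you can check it against a saved JSON response before hitting the site.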