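I need to scrape the list of "best coding bootcamps" shown on this page: https://www.switchup.org/rankings/best-coding-bootcamps
My assignment says this should be doable with Beautiful Soup (rather than Selenium), but when I try, the resulting HTML does not contain the list of bootcamps. Instead it returns an empty element of class:
My question: do you think this content can be retrieved with Beautiful Soup alone, without resorting to Selenium? If Selenium is needed, what would simple code for that look like?
My code so far is very simple: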
from bs4 import BeautifulSoup
import requests
import time
url = "https://www.switchup.org/rankings/best-coding-bootcamps"
r = requests.get(url)
soup = BeautifulSoup(r.content,'lxml')
time.sleep(5)
print(soup)
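Thank you very much.
Answer 0 (score: 2)
You are right: the HTML served at the URL you posted does not contain the bootcamp list. The data is loaded via AJAX from a different URL.
If you inspect the "Network" tab in Firefox / Chrome, you can find the following URL (the data is in JSON format):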
import requests
from bs4 import BeautifulSoup

# JSON endpoint found in the browser's Network tab
url = 'https://www.switchup.org/chimera/v1/bootcamp-list?mainTemplate=bootcamp-list%2Frankings&path=%2Frankings%2Fbest-coding-bootcamps&isDataTarget=false&featuredSchools=0&logoTag=logo&logoSize=original&numSchools=0&perPage=0&rankType=BootcampRankings&rankYear=2020&recentReview=true&reviewLength=50&numLocations=5&numSubjects=5&numCourses=5&sortOn=name&withReviews=false'
data = requests.get(url).json()

for i, bootcamp in enumerate(data['content']['bootcamps'], 1):
    # Each description is an HTML fragment; parse it to get plain text
    soup = BeautifulSoup(bootcamp['description'], 'html.parser')
    print('{}. {}'.format(i, bootcamp['name']))
    print(soup.get_text(strip=True))
    print('-' * 80)
Prints:
1. Le Wagon
Le Wagon is an intensive international coding bootcamp geared toward career changers and entrepreneurs who want to gain coding skills. Participants complete 450 hours of coding in 9 weeks full-time or 24 weeks part-time, which includes building their own web app. After completing the program, students join an international alumni network of 6,000+ for career support and community.
--------------------------------------------------------------------------------
2. App Academy
App Academy teaches participants everything they need to know about software engineering in just 12 weeks. Their full-time bootcamps have helped over 2,000 graduates find jobs at more than 850 companies. Their deferred tuition plan means participants pay for the program only after they’ve landed their first web development job.
--------------------------------------------------------------------------------
3. Ironhack
Ironhack offers two full-time bootcamps focused on web design, a 26-week program in web development and a nine-week program in user experience and user interface design. Students can access extensive career development services post-graduation including portfolio building and interview practice; scholarships are available for underrepresented populations and veterans.
--------------------------------------------------------------------------------
...and so on.
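
The long query string in the URL above is easier to read and tweak when it is built from a dictionary. The following is a minimal sketch using only the standard library: the parameter names and values are copied verbatim from that URL, while `BASE_URL`, `PARAMS` and `build_url` are illustrative names chosen here. What each parameter means to the server is an assumption left unexplored; only the encoding is shown.

```python
from urllib.parse import urlencode

# Endpoint and query parameters copied from the URL seen in the Network tab.
BASE_URL = "https://www.switchup.org/chimera/v1/bootcamp-list"

PARAMS = {
    "mainTemplate": "bootcamp-list/rankings",
    "path": "/rankings/best-coding-bootcamps",
    "isDataTarget": "false",
    "featuredSchools": 0,
    "logoTag": "logo",
    "logoSize": "original",
    "numSchools": 0,
    "perPage": 0,
    "rankType": "BootcampRankings",
    "rankYear": 2020,
    "recentReview": "true",
    "reviewLength": 50,
    "numLocations": 5,
    "numSubjects": 5,
    "numCourses": 5,
    "sortOn": "name",
    "withReviews": "false",
}


def build_url(base=BASE_URL, params=PARAMS):
    """Return the full request URL with the parameters percent-encoded."""
    # urlencode escapes "/" as %2F, matching the URL captured in the browser.
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_url())
```

The same dictionary can also be passed directly as `requests.get(BASE_URL, params=PARAMS)`, which lets requests perform the encoding.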