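I'm having trouble scraping this web page, top-programming-guru.
I want to retrieve a list of all the YouTube channels listed on the page.
I'm using BeautifulSoup. I looked at the page source and then tried the following code: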
import requests
from bs4 import BeautifulSoup
URL = 'https://noonies.tech/award/top-programming-guru'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'lxml')
results = soup.find_all('div', class_='sc-jhAzac dldLgq')
results
But I always get an empty list.
Any ideas on how to do this correctly?
This is the tag I'm looking for:
<div class="sc-jhAzac dldLgq">
<p>
?
<em>And the winner is...</em>
</p>
<div class="sc-gZMcBi kTYIfA">
<div class="nomination-info">
<h3><i class="fad fa-trophy"></i><a href="https://www.youtube.com/c/programmingwithmosh/videos" target="_blank">Programming with Mosh</a></h3>
Answer 0 (score: 1)
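The data is loaded dynamically, so the div you are looking for is not in the HTML that requests receives. Use Selenium or a similar tool to let the JavaScript run, then scrape the rendered page.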
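Before switching to a browser driver, it can help to confirm that the target markup really is missing from the raw response. The following is only a minimal sketch using the standard library: `classes_in_html` is a hypothetical helper, and the two HTML strings are made-up stand-ins for "what requests returns" and "what the browser shows after JavaScript runs", not the real page content.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collects every CSS class name that appears in a piece of HTML."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def classes_in_html(raw_html):
    """Hypothetical helper: return the set of class names found in raw_html."""
    collector = ClassCollector()
    collector.feed(raw_html)
    return collector.classes


# Made-up sample markup (assumption): an empty application shell,
# similar to what a JavaScript-rendered site may serve initially.
static_html = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'

# Made-up sample markup (assumption): the same page after rendering.
rendered_html = '<div id="root"><div class="sc-jhAzac dldLgq"><p>winner</p></div></div>'

in_static = "sc-jhAzac" in classes_in_html(static_html)
in_rendered = "sc-jhAzac" in classes_in_html(rendered_html)
print(in_static, in_rendered)
```

With the real page, `classes_in_html(page.text)` from the question's `requests.get` call could be checked the same way. If the class is absent there but visible in the browser's inspector, rendering the page with a browser driver, as in the code that follows, is the usual fix.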
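import re
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Run Chrome headless so no browser window opens
chrome_options = Options()
chrome_options.add_argument("--headless")

url = 'https://noonies.tech/award/top-programming-guru'
# Selenium 3 style call; Selenium 4 expects the driver path via a Service object instead
driver = webdriver.Chrome('chromedriver.exe', options=chrome_options)
driver.get(url)
time.sleep(5)  # give the JavaScript time to render the page
html = driver.page_source
driver.quit()

soup = BeautifulSoup(html, "html.parser")
# Every tag whose href contains youtube.com
soup.find_all(href=re.compile(r'youtube\.com'))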
The output is a list of the tags whose href contains youtube.com. You may need to clean the list if it captures youtube.com links you do not want, or refine the search.
[<a href="https://www.youtube.com/c/programmingwithmosh/videos" target="_blank">Programming with Mosh</a>,
<a href="https://www.youtube.com/user/TechGuyWeb" target="_blank">Traversy Media</a>,
<a href="https://www.youtube.com/user/schafer5" target="_blank">Corey Schafer</a>,
<a href="https://m.youtube.com/channel/UC4JX40jDee_tINbkjycV4Sg" target="_blank">Tech With Tim</a>,
<a href="https://www.youtube.com/user/krishnaik06/playlists" target="_blank">Krish Naik</a>,
<a href="https://www.youtube.com/channel/UC8butISFwT-Wl7EV0hUK0BQ" target="_blank">freeCodeCamp.org</a>,
<a href="https://www.youtube.com/c/HiteshChoudharydotcom" target="_blank">Hitesh Choudhary</a>,
<a href="https://m.youtube.com/cleverprogrammer?uid=qrILQNl5Ed9Dz6CGMyvMTQ" target="_blank">Clever Programmer</a>,
<a href="https://www.youtube.com/user/CalebTheVideoMaker2" target="_blank">Caleb Curry</a>,....
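The cleanup step mentioned above could look roughly like the sketch below. It is one possible approach, not part of the original answer: `clean_channel_links` and `YOUTUBE_HOSTS` are hypothetical names, and the sample pairs reuse two links from the output plus one made-up non-YouTube URL. The idea is to compare the parsed hostname rather than search the whole href, so a link that merely contains "youtube.com" in its query string is dropped, and to skip exact repeats.

```python
from urllib.parse import urlsplit

# Hostnames treated as YouTube (assumption: extend this set as needed).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def clean_channel_links(links):
    """Hypothetical helper.

    Takes (title, href) pairs and returns the pairs that point at a
    YouTube host, in their original order, with repeated URLs removed.
    """
    seen = set()
    cleaned = []
    for title, href in links:
        parts = urlsplit(href)
        host = (parts.hostname or "").lower()
        if host not in YOUTUBE_HOSTS:
            continue  # not a YouTube link
        key = (parts.path.rstrip("/"), parts.query)
        if key in seen:
            continue  # same channel URL already collected
        seen.add(key)
        cleaned.append((title.strip(), href))
    return cleaned


# With the soup from the answer, the pairs could be built like this:
# pairs = [(a.get_text(), a["href"]) for a in soup.find_all("a", href=True)]
sample = [
    ("Programming with Mosh", "https://www.youtube.com/c/programmingwithmosh/videos"),
    ("Tech With Tim", "https://m.youtube.com/channel/UC4JX40jDee_tINbkjycV4Sg"),
    ("Programming with Mosh", "https://www.youtube.com/c/programmingwithmosh/videos"),
    ("Not a channel", "https://example.com/watch?u=youtube.com"),  # made-up URL
]

channels = clean_channel_links(sample)
for title, href in channels:
    print(title, href)
```

Deduplicating on path and query rather than on the full URL means the same channel reached through www.youtube.com and m.youtube.com is kept only once; whether that is desirable depends on what the list is used for.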