I am using the following code to scrape YouTube search results:
import requests
from bs4 import BeautifulSoup
url = "https://www.youtube.com/results?search_query=python"
response = requests.get(url)
soup = BeautifulSoup(response.content,'html.parser')
for each in soup.find_all("a", class_="yt-simple-endpoint style-scope ytd-video-renderer"):
    print(each.get('href'))
But it returns nothing. What is wrong with this code?
Answer 0 (score: 0)
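BeautifulSoup is not the right tool for scraping YouTube, because YouTube generates much of its content with JavaScript. requests downloads only the initial HTML and does not run scripts, so the search results never appear in the markup that BeautifulSoup parses.
You can easily test this: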
>>> import requests
>>> from bs4 import BeautifulSoup
>>> url = "https://www.youtube.com/results?search_query=python"
>>> response = requests.get(url)
>>> soup = BeautifulSoup(response.content,'html.parser')
>>> soup.find_all("a")
[<a href="//www.youtube.com/yt/about/en-GB/" slot="guide-links-primary" style="display: none;">About</a>, <a href="//www.youtube.com/yt/press/en-GB/" slot="guide-links-primary" style="display: none;">Press</a>, <a href="//www.youtube.com/yt/copyright/en-GB/" slot="guide-links-primary" style="display: none;">Copyright</a>, <a href="/t/contact_us" slot="guide-links-primary" style="display: none;">Contact us</a>, <a href="//www.youtube.com/yt/creators/en-GB/" slot="guide-links-primary" style="display: none;">Creators</a>, <a href="//www.youtube.com/yt/advertise/en-GB/" slot="guide-links-primary" style="display: none;">Advertise</a>, <a href="//www.youtube.com/yt/dev/en-GB/" slot="guide-links-primary" style="display: none;">Developers</a>, <a href="/t/terms" slot="guide-links-secondary" style="display: none;">Terms</a>, <a href="https://www.google.co.uk/intl/en-GB/policies/privacy/" slot="guide-links-secondary" style="display: none;">Privacy</a>, <a href="//www.youtube.com/yt/policyandsafety/en-GB/" slot="guide-links-secondary" style="display: none;">Policy and Safety</a>, <a href="/new" slot="guide-links-secondary" style="display: none;">Test new features</a>]
(Note that the links you see in your screenshot do not appear in this list.)
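The same effect can be reproduced offline. The following is only a minimal sketch: the sample HTML is made up for illustration (it is not YouTube's real markup), and it uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages. It shows that a static parser sees only the anchors present in the markup, not the ones a script would create at runtime.

```python
from html.parser import HTMLParser

# Hypothetical page: one static footer link, plus a script that would
# add a video link only when a browser executes it.
STATIC_HTML = """
<html><body>
<a href="/t/terms" style="display: none;">Terms</a>
<script>
  // Runs only in a browser; a static parser never executes it.
  var link = document.createElement("a");
  link.href = "/watch?v=abc";
  document.body.appendChild(link);
</script>
</body></html>
"""


class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag found in the static markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def anchor_hrefs(html):
    """Return all anchor hrefs that exist before any JavaScript runs."""
    collector = AnchorCollector()
    collector.feed(html)
    return collector.hrefs
```

Calling anchor_hrefs(STATIC_HTML) returns only the footer link; the /watch link exists solely inside the script and is never seen by the parser.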
You need a different solution for this; Selenium may be a good choice. See this thread for details: Fetch all href link using selenium in python
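As a rough sketch of the Selenium approach, assuming Selenium 4 and a local Chrome installation: the function names build_search_url and fetch_result_links are hypothetical, and the CSS selector is simply built from the class names in the question, so it may no longer match YouTube's current markup. A consent or cookie page may also appear first in some regions, which this sketch does not handle.

```python
from urllib.parse import urlencode

# Selector derived from the class names in the question (an assumption:
# YouTube's markup may have changed since then).
RESULT_SELECTOR = "a.yt-simple-endpoint.style-scope.ytd-video-renderer"


def build_search_url(query):
    """Return the YouTube results URL for a search query."""
    return "https://www.youtube.com/results?" + urlencode({"search_query": query})


def fetch_result_links(query, timeout=10):
    """Open the results page in a real browser and collect the rendered hrefs."""
    # Imported inside the function so build_search_url works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome is installed locally
    try:
        driver.get(build_search_url(query))
        # Wait until JavaScript has rendered at least one matching link;
        # raises TimeoutException if none appears within `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, RESULT_SELECTOR))
        )
        anchors = driver.find_elements(By.CSS_SELECTOR, RESULT_SELECTOR)
        hrefs = [a.get_attribute("href") for a in anchors]
        # Drop empty values and duplicates while keeping page order.
        return list(dict.fromkeys(h for h in hrefs if h))
    finally:
        driver.quit()  # always close the browser, even on errors
```

Calling fetch_result_links("python") and printing each item would be the equivalent of the loop in the question, except that the links are read after the browser has run the page's JavaScript.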