Question

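I'm new to Beautiful Soup 4 and can't get this simple code to grab the contents of a tag when I search for something on YouTube. When I print containers, it just prints "[]", which I assume means an empty list. Any ideas why it isn't finding anything? Does it have to do with not grabbing the right tag on YouTube? In the search results HTML, one result has the following tag: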

<a id="video-title" class="yt-simple-endpoint style-scope ytd-video-renderer" aria-label="Kendrick Lamar - HUMBLE. by KendrickLamarVEVO 5 months ago 3 minutes, 4 seconds 322,571,817 views" href="https://www.youtube.com/watch?v=tvTRZJ-4EyI" title="Kendrick Lamar - HUMBLE.">
                Kendrick Lamar - HUMBLE.
              </a>

Python code:

import bs4

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

search = "damn"
my_url = "https://www.youtube.com/results?search_query=" + search
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

#html parsing
page_soup = soup(page_html, "html.parser")

containers = page_soup.find_all("a",{"id":"video-title"})
print(containers)

#result-count

Answer 1

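If you inspect the page source of that URL, you won't find any id="video-title", which means the page loads its content dynamically. BeautifulSoup only parses the HTML it is given and does not run JavaScript, so it cannot see dynamically loaded content. Try combining it with something else, such as selenium or scrapyjs; this post may help.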
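The point above can be reproduced offline. The sketch below is only an illustration: both HTML strings are made up. STATIC_HTML stands in for what a server might send when the link is built later by a script, and RENDERED_HTML stands in for the DOM a browser shows after the script has run. Neither is YouTube's actual markup, and find_titles is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the raw HTML a server returns: the link only
# exists as text inside a <script>, not as a real <a> element.
STATIC_HTML = """
<html><body>
<div id="results"></div>
<script>
document.getElementById("results").innerHTML =
    '<a id="video-title" href="/watch?v=abc">Some title</a>';
</script>
</body></html>
"""

# Hypothetical stand-in for the DOM after a browser has run the script.
RENDERED_HTML = """
<html><body>
<div id="results"><a id="video-title" href="/watch?v=abc">Some title</a></div>
</body></html>
"""


def find_titles(html):
    """Return every <a id="video-title"> element the parser can see."""
    page_soup = BeautifulSoup(html, "html.parser")
    return page_soup.find_all("a", {"id": "video-title"})


static_hits = find_titles(STATIC_HTML)      # script text is not parsed as tags
rendered_hits = find_titles(RENDERED_HTML)  # the <a> is real markup here

print(len(static_hits))    # 0, the same empty result as in the question
print(len(rendered_hits))  # 1
```

This is why a tool that actually runs the page's JavaScript (selenium, for example) is needed before handing the HTML to BeautifulSoup: the parser itself is working correctly, it just never receives the element.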

Answer 2

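The results on the YouTube page are loaded dynamically, so the ids and class names differ from what you see in the browser. When you parse the page, make sure you look at the HTML as urllib loads it, not the page source shown in the browser. The following code should solve your problem: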

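from bs4 import BeautifulSoup as bs
from urllib.request import urlopen

# Fetch the HTML exactly as urllib receives it (no JavaScript is run)
page = urlopen('https://www.youtube.com/results?search_query=damn').read()
soup = bs(page, 'html.parser')
# Select anchors by a class that appears in the served HTML
results = soup.find_all('a', {'class': 'yt-uix-sessionlink'})
for link in results:
    print(link.get("href"))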

This code prints every URL found on the page, so you will also need to filter the results.
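One possible way to do that filtering is sketched below using only the standard library. It assumes video links have the form /watch?v=<id> on www.youtube.com, as in the href shown in the question. The helper name video_urls and the sample list are made up for illustration. In practice the input would be the href values collected from the find_all results.

```python
from urllib.parse import parse_qs, urljoin, urlparse

BASE_URL = "https://www.youtube.com"


def video_urls(hrefs):
    """Keep only /watch links with a v parameter, as absolute URLs, deduplicated."""
    seen = set()
    urls = []
    for href in hrefs:
        if not href:  # link.get("href") returns None when the attribute is missing
            continue
        parts = urlparse(urljoin(BASE_URL, href))  # resolve relative links
        video_ids = parse_qs(parts.query).get("v")
        if parts.netloc != "www.youtube.com" or parts.path != "/watch" or not video_ids:
            continue
        video_id = video_ids[0]
        if video_id in seen:  # the same video is often linked more than once
            continue
        seen.add(video_id)
        urls.append(f"{BASE_URL}/watch?v={video_id}")
    return urls


# Made-up sample of href values, mixing video links with other page links.
sample = [
    "/watch?v=tvTRZJ-4EyI",
    "/",
    None,
    "/watch?v=tvTRZJ-4EyI",
    "/channel/xyz",
    "https://www.youtube.com/watch?v=abc123&list=PL1",
]
print(video_urls(sample))
# ['https://www.youtube.com/watch?v=tvTRZJ-4EyI', 'https://www.youtube.com/watch?v=abc123']
```

With the answer's loop, the call would look like video_urls(link.get("href") for link in results).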

Why does the BeautifulSoup4 find_all function return nothing?

2 answers: