Question

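I want to search for a specific keyword and then scrape the URLs of all the videos in the results.

I know the code I'm pasting below doesn't do that, but I wanted to show what I've tried so far.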

chrome_path = r"C:\Users\Admin\Documents\chromedriver\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://www.youtube.com/results?sp=CAISAggBUBQ%253D&q=minecraft")

links = driver.find_elements_by_partial_link_text('/watch')
for link in links:
    links = (links.get_attribute("href"))

How can I scrape the links and save them to a file?

Answer 1

Here is your code; it gives you the titles and URLs of the videos, quick and simple :)
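
A minimal sketch of that idea, assuming Selenium 4 and a Chrome driver that Selenium can find on its own. The helper names (`collect_video_links`, `save_links`, `main`), the CSS selector, and the `links.txt` file name are illustrative assumptions, and YouTube's markup changes often, so the selector may need adjusting.

```python
def collect_video_links(driver):
    """Return (title, url) pairs for anchors whose href contains /watch."""
    results = []
    seen = set()
    # "css selector" is the string value behind Selenium's By.CSS_SELECTOR.
    for anchor in driver.find_elements("css selector", "a[href*='/watch']"):
        url = anchor.get_attribute("href")
        title = (anchor.get_attribute("title") or anchor.text or "").strip()
        # Thumbnail and title anchors often point at the same video,
        # so skip untitled anchors and keep one entry per URL.
        if url and title and url not in seen:
            seen.add(url)
            results.append((title, url))
    return results


def save_links(pairs, path):
    """Write one 'title<TAB>url' line per video."""
    with open(path, "w", encoding="utf-8") as handle:
        for title, url in pairs:
            handle.write(f"{title}\t{url}\n")


def main():
    # Imported here so the helpers above work without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumes Selenium can locate a driver
    try:
        driver.get("https://www.youtube.com/results?q=minecraft")
        save_links(collect_video_links(driver), "links.txt")
    finally:
        driver.quit()
```

Calling `main()` opens the search page from the question and writes whatever it finds to `links.txt`. Matching on the `href` attribute rather than on the link text is the main difference from the attempt in the question.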

Answer 2

This script extracts the results from the first page of the YouTube search results, using urllib to fetch and parse that page and print all of the video links (if you are on Python 3.*, install BeautifulSoup first).

BeautifulSoup4
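
A rough sketch of the same fetch-and-parse idea. To stay runnable without extra installs it swaps in the standard library's `html.parser` where the answer uses BeautifulSoup. The names `WatchLinkParser`, `extract_watch_links`, and `fetch_results_page` are made up for illustration. The live results page may build much of its content with JavaScript, so the static HTML may contain few or no `/watch` anchors.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen

BASE_URL = "https://www.youtube.com"


class WatchLinkParser(HTMLParser):
    """Collect the href of every <a> tag that points at a /watch page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith("/watch"):
            url = urljoin(BASE_URL, href)
            # Keep the first occurrence of each link, in page order.
            if url not in self.links:
                self.links.append(url)


def extract_watch_links(html):
    """Return absolute /watch URLs found in an HTML string."""
    parser = WatchLinkParser()
    parser.feed(html)
    return parser.links


def fetch_results_page(query):
    """Download the first results page for a query (network call)."""
    url = f"{BASE_URL}/results?" + urlencode({"search_query": query})
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, `print("\n".join(extract_watch_links(fetch_results_page("minecraft"))))` prints one link per line, and redirecting that output to a file covers the "save to a file" part of the question.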

Answer 3

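Actually, you shouldn't be scraping results from youtube.com/results at all. Before scraping any website, you should check its robots.txt first. To learn more about robots.txt, read this Wikipedia page.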

https://en.wikipedia.org/wiki/Robots_exclusion_standard

Here is YouTube's robots.txt file.

https://www.youtube.com/robots.txt
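
The check can also be done in code with the standard library's `urllib.robotparser`. The sketch below is only an illustration: `is_allowed` is a hypothetical helper, and `SAMPLE_RULES` is a made-up rule set, not a copy of YouTube's actual file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only (an assumption), not any site's real robots.txt.
SAMPLE_RULES = [
    "User-agent: *",
    "Disallow: /results",
]


def is_allowed(robots_lines, url, user_agent="*"):
    """Return True if the given robots.txt lines permit fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)
```

To test against the live file instead, create a `RobotFileParser`, call `set_url("https://www.youtube.com/robots.txt")` and `read()`, then ask `can_fetch` about the URL you plan to request.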

You do have another option: you can use the YouTube search API.

https://developers.google.com/youtube/v3/docs/search/list
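
A small sketch of calling that endpoint with only the standard library, assuming you already have an API key from the Google developer console. The function names are illustrative. Only the first page of results is handled, and the key is passed in by the caller rather than hard-coded.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(query, api_key, max_results=25):
    """Build a search.list request URL restricted to videos."""
    params = {
        "part": "snippet",
        "q": query,
        "type": "video",
        "maxResults": max_results,
        "key": api_key,
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def watch_urls(response):
    """Turn a decoded search.list response into watch-page URLs."""
    urls = []
    for item in response.get("items", []):
        video_id = item.get("id", {}).get("videoId")
        if video_id:
            urls.append("https://www.youtube.com/watch?v=" + video_id)
    return urls


def search_videos(query, api_key):
    """Run one search request and return the video URLs (network call)."""
    with urlopen(build_search_url(query, api_key)) as reply:
        return watch_urls(json.load(reply))
```

With a valid key, `search_videos("minecraft", api_key)` returns a list of URLs that can be written to a file one per line. Further pages would require passing the `nextPageToken` from each response back as `pageToken`, which this sketch leaves out.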

How do I scrape videos from a YouTube search?

3 answers: