I've been trying to build a web scraper that collects video titles and URLs and stores them in a text file. However, I've run into a problem with how YouTube loads its videos: it loads 100 at a time and then needs input (scrolling all the way to the bottom) before the next set/page loads. From all the research I've done, it looks like I'd have to rewrite the whole thing with a different module (e.g. Scrapy). Here's my current script:
import os
import io
from selenium import webdriver
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
#----
print ("Paste the Youtube playlist's page(URL) here.")
url = input()
uClient = uReq(url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
containers = page_soup.find("div", {"id": "content"})
#- Video Count
a = containers.findAll("td", {"class": "pl-video-title"})
b = len(a)
total = b
d = 0
for i in range(total):
    with open("songlist.txt", "a") as f:
        titles = containers.findAll("td", {"class": "pl-video-title"})
        print(int(d), titles[d].text)
        titles_out = (int(d), titles[d].text.encode("utf-8"))
        f.write(repr(titles_out))
        d += 1

links = containers.findAll("a")
for link in links:
    with open("linklist.txt", "a") as f:
        print(link.get("href"), link.text[0:-1])
        links_out = (link.get("href"), link.text[0:-1].encode("utf-8"))
        f.write(repr(links_out))
#----
print ("Press enter to end.")
input()
Right now this script produces the links and titles for the first 100 videos of any playlist, but nothing past that. Before I give up and start over, I'm looking for any other possible solution.
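
Since the script already imports selenium, I've been wondering whether something along these lines could do the "scroll to the bottom" step for me before handing the rendered HTML to BeautifulSoup. This is only an untested sketch: it assumes chromedriver is installed and on my PATH, the scroll JavaScript and the 2-second pause are guesses, and the pl-video-title selector may not match whatever markup YouTube serves to a real browser.

import time
from selenium import webdriver
from bs4 import BeautifulSoup as soup
#----
print("Paste the Youtube playlist's page(URL) here.")
url = input()
driver = webdriver.Chrome()  # assumes chromedriver is installed and on PATH
driver.get(url)
# Keep scrolling to the bottom until the page height stops growing,
# which should mean YouTube has loaded every batch of videos.
last_height = driver.execute_script("return document.documentElement.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")
    time.sleep(2)  # give the next batch time to load; 2 seconds is a guess
    new_height = driver.execute_script("return document.documentElement.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
# Parse the fully rendered page instead of the raw urllib response.
page_soup = soup(driver.page_source, "html.parser")
driver.quit()
containers = page_soup.find("div", {"id": "content"})
titles = containers.findAll("td", {"class": "pl-video-title"})  # selector may need updating
print(len(titles), "videos found")
#----

The idea is to keep the existing BeautifulSoup parsing and only swap the urllib request for a real browser session, so the rest of the script would stay mostly the same. Is that a reasonable direction, or is switching to something like Scrapy really the cleaner fix?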