Beautiful Soup 4 (YouTube playlist scraper) does not collect all videos

Posted: 2018-09-21 17:08:33

Tags: html web-scraping youtube beautifulsoup scrapy

I have been working on a web scraper that collects video titles and URLs and stores them in a text document. However, I've run into a problem with how YouTube loads its videos: it loads 100 videos at a time, and then requires input (scrolling all the way to the bottom) before it loads the next set/page. From all the research I've done, it looks like I would have to completely rewrite the code with another module (e.g. Scrapy). Here is my current script:

from urllib.request import urlopen as uReq

from bs4 import BeautifulSoup as soup


#----
print("Paste the YouTube playlist's page (URL) here.")
url = input()

uClient = uReq(url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
containers = page_soup.find("div", {"id": "content"})


#- Video Count
titles = containers.findAll("td", {"class": "pl-video-title"})
total = len(titles)

# Write every title once, with a newline per entry, instead of
# reopening the file and re-parsing the page on every iteration.
with open("songlist.txt", "a", encoding="utf-8") as f:
    for d, title in enumerate(titles):
        print(d, title.text)
        f.write("%d %s\n" % (d, title.text.strip()))


links = containers.findAll("a")
with open("linklist.txt", "a", encoding="utf-8") as f:
    for link in links:
        print(link.get("href"), link.text[0:-1])
        f.write("%s %s\n" % (link.get("href"), link.text[0:-1]))


#----
print("Press enter to end.")
input()
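For reference, the find/extract logic the script relies on can be exercised offline against a small hand-written HTML fragment. The class name "pl-video-title" and the markup structure below are assumptions copied from the script, not a guaranteed match for live YouTube pages:

```python
# Offline sketch of the script's parsing step: one pass over the container
# collects both the title and the link, instead of two separate loops.
# The "pl-video-title" class and <td>/<a> layout are assumptions mirroring
# the original code.
from bs4 import BeautifulSoup

sample_html = """
<div id="content">
  <table>
    <tr><td class="pl-video-title"><a href="/watch?v=abc123">First Song</a></td></tr>
    <tr><td class="pl-video-title"><a href="/watch?v=def456">Second Song</a></td></tr>
  </table>
</div>
"""

page_soup = BeautifulSoup(sample_html, "html.parser")
container = page_soup.find("div", {"id": "content"})

videos = [
    (cell.get_text(strip=True), cell.find("a").get("href"))
    for cell in container.find_all("td", {"class": "pl-video-title"})
]
print(videos)
# → [('First Song', '/watch?v=abc123'), ('Second Song', '/watch?v=def456')]
```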

Currently, this script produces the links and titles of the first 100 videos of any playlist, but nothing beyond that. Before giving up and starting over, I'm looking for any other solutions.

0 Answers:

No answers yet.