I'm building a web scraper for YouTube playlists, but I think I've hit a bug, or at least something I'm having a hard time understanding.
import os
import io
import pandas as pd
from numpy import arange
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
# ----
print("Paste the Youtube playlist's page(URL) here.")
url = input()
uClient = uReq(url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
containers = page_soup.find("div", {"id": "content"})
# - Video Count
a = containers.findAll("td", {"class": "pl-video-title"})
b =(len(a))
total =(b)
d = 0
# Finds titles
for i in range(total):
    titles = containers.findAll("td", {"class": "pl-video-title"})
    titles_int = (int(d), (titles[d].text))
    print(titles_int)
    d += 1
# Finds links
links = containers.findAll("a")
for link in links:
    print(link.get("href"), link.text[0])
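It used to print each \n as an actual line break, but now it prints them literally, even though there is no encoding attribute anywhere. I don't understand why it does this. All I can work out is that this line triggers it: titles_int = (int(d),(titles[d].text))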
Link used: https://www.youtube.com/playlist?list=PLOzDu-MXXLliO9fBNZOQTBDddoA3FzZUo
Current output:
Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:04:45) [MSC v.1900 32 bit
(Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>>
=== RESTART: C:\Users\Trillz\Desktop\Youtube to Phone\Playlist Scraper3.py ===
Paste the Youtube playlist's page(URL) here.
https://www.youtube.com/playlist?list=PLOzDu-MXXLliO9fBNZOQTBDddoA3FzZUo
(0, "\n\n Kina - u're mine (ft. shiloh)\n \n\nby the bootleg boy\n\n\n\n")
(1, '\n\n Kina - get you the moon (ft. Snow)\n \n\nby the bootleg boy\n\n\n\n')
(2, '\n\n FOR YOU\n \n\nby the bootleg boy\n\n\n\n')
(3, '\n\n Kina - Nobody Cares (ft. Shiloh)\n \n\nby the bootleg boy\n\n\n\n')
(4, '\n\n beowulf - savior\n \n\nby the bootleg boy\n\n\n\n')
(5, '\n\n dybredly - you are always wrong (ft. Shiloh)\n \n\nby the bootleg boy\n\n\n\n')
(6, "\n\n Sarcastic Sounds - I Don't Sleep\n \n\nby the bootleg boy\n\n\n\n")
Answer 0 (score: 0)
Modify your code like this:
# Finds titles
for i in range(total):
    titles = containers.findAll("td", {"class": "pl-video-title"})
    print(int(d), titles[d].text)
    d += 1
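Otherwise you create a tuple object, and printing a tuple shows the repr() of each element, so the newlines in the title appear as escaped \n sequences instead of real line breaks. You can see why in this answer: print tuple beautifully with newline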
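To make the difference concrete, here is a minimal sketch that needs no network access. The string raw_title is a hypothetical sample modelled on one title from the output in the question, and tuple_line and clean_title are illustrative helper names, not part of BeautifulSoup or of the original script. It also shows one possible way to tidy the surrounding whitespace, assuming a single-line title is what you want.

```python
# Hypothetical sample mimicking one scraped title from the question's output.
raw_title = "\n\n      FOR YOU\n    \n\nby the bootleg boy\n\n\n\n"


def tuple_line(index, text):
    """Return what print((index, text)) would show: the tuple's repr."""
    return str((index, text))


def clean_title(text):
    """Collapse every run of whitespace, newlines included, into one space."""
    return " ".join(text.split())


# Printing a tuple: each element is shown via repr(), so \n stays escaped.
print(tuple_line(2, raw_title))

# Printing the values as separate arguments: newlines are real line breaks.
print(2, raw_title)

# Printing a cleaned title: everything fits on one tidy line.
print(2, clean_title(raw_title))
```

In the scraper itself, the same idea would amount to calling clean_title(titles[d].text) before printing, if the extra blank lines are unwanted.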