Python 3 prints \n instead of a line break in a web scraper

Time: 2018-09-20 08:10:22

Tags: python-3.x web-scraping beautifulsoup

I'm building a web scraper for YouTube playlists, and I think I've either hit a bug or run into something I'm having a hard time understanding.

import os
import io
import pandas as pd
from numpy import arange
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq

# ----
print("Paste the Youtube playlist's page(URL) here.")
url = input()

uClient = uReq(url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")

containers = page_soup.find("div", {"id": "content"})

# - Video Count
a = containers.findAll("td", {"class": "pl-video-title"})
b =(len(a))
total =(b)
d = 0

# Finds titles
for i in range(total):
    titles = containers.findAll("td", {"class": "pl-video-title"})
    titles_int = (int(d),(titles[d].text))
    print (titles_int)
    d += 1

# Finds links
links = containers.findAll("a")
for link in links:
    print(link.get("href"), link.text[0])

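It used to print \n as an actual line break, but now it prints the literal \n even though I haven't set any encoding attribute. I don't understand why it does this, since there is no encoding attribute anywhere. All I've been able to work out is that this line, titles_int = (int(d),(titles[d].text)), is what triggers it.

Link used: https://www.youtube.com/playlist?list=PLOzDu-MXXLliO9fBNZOQTBDddoA3FzZUo

Current output: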

Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:04:45) [MSC v.1900 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> 
=== RESTART: C:\Users\Trillz\Desktop\Youtube to Phone\Playlist Scraper3.py ===
Paste the Youtube playlist's page(URL) here.
https://www.youtube.com/playlist?list=PLOzDu-MXXLliO9fBNZOQTBDddoA3FzZUo
(0, "\n\n      Kina - u're mine (ft. shiloh)\n    \n\nby the bootleg boy\n\n\n\n")
(1, '\n\n      Kina - get you the moon (ft. Snow)\n    \n\nby the bootleg boy\n\n\n\n')
(2, '\n\n      FOR YOU\n    \n\nby the bootleg boy\n\n\n\n')
(3, '\n\n      Kina - Nobody Cares (ft. Shiloh)\n    \n\nby the bootleg boy\n\n\n\n')
(4, '\n\n      beowulf - savior\n    \n\nby the bootleg boy\n\n\n\n')
(5, '\n\n      dybredly - you are always wrong (ft. Shiloh)\n    \n\nby the bootleg boy\n\n\n\n')
(6, "\n\n      Sarcastic Sounds - I Don't Sleep\n    \n\nby the bootleg boy\n\n\n\n")

1 answer:

Answer 0 (score: 0)

Modify your code like this:

# Finds titles
for i in range(total):
    titles = containers.findAll("td", {"class": "pl-video-title"})
    print(int(d), titles[d].text)
    d += 1

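Otherwise you are creating a tuple object, and printing a tuple causes exactly this kind of problem: print() shows a tuple using the repr() of each element, and the repr() of a string displays newlines as the escape sequence \n rather than as real line breaks. Passing the values to print() as separate arguments prints the string itself. You can see the reason in this answer: print tuple beautifully with newline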
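
To see the difference without hitting YouTube at all, here is a minimal sketch that reuses one title string copied from the output in the question. The variable names (title, as_tuple, as_args, cleaned) are made up for illustration, and the whitespace cleanup at the end is only an assumption about what the final output should look like; it is not part of the answer above.

```python
# Sample string copied from the question's output (index 2).
title = "\n\n      FOR YOU\n    \n\nby the bootleg boy\n\n\n\n"

# Converting a tuple to text uses repr() on each element,
# so the newlines show up as the two characters backslash + n.
as_tuple = str((2, title))

# Formatting the values individually uses str() on the string,
# so the newlines stay real line breaks.
as_args = f"{2} {title}"

# Optional cleanup (an assumption about the desired output):
# collapse every run of whitespace into a single space.
cleaned = " ".join(title.split())

print(as_tuple)    # what print((2, title)) shows
print(as_args)     # what print(2, title) shows
print(2, cleaned)  # index and title on a single tidy line
```

If the goal is one line per video, the last form is probably the most readable, but whether to collapse the whitespace depends on how the titles will be used later.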