I am trying to scrape the YouTube homepage to get the title of every video. I am running this code:
import requests
from bs4 import BeautifulSoup
url = 'https://www.youtube.com'
html = requests.get(url)
soup = BeautifulSoup(html.content, "html.parser")
print(soup('a'))
and this is the error it returns:
Traceback (most recent call last):
  File "C:\Users\kenda\OneDrive\Desktop\Projects\youtube.py", line 7, in <module>
    print(soup('a'))
  File "C:\Users\kenda\AppData\Local\Programs\Python\Python36-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f384' in position 45442: character maps to <undefined>
[Finished in 4.83s]
How can I fix this? And why does it happen specifically when I scrape YouTube?
Answer 0 (score: 1)
urllib is better and more comfortable to use.
from urllib.request import urlopen
from bs4 import BeautifulSoup
The urlopen function requests the URL and returns the HTML response:
url = 'https://www.youtube.com'
html = urlopen(url)
BeautifulSoup then reads the HTML:
soup = BeautifulSoup(html, 'html.parser')
print(soup.find_all('a'))
If you absolutely want to use requests, the solution is:
import requests
from bs4 import BeautifulSoup
url = 'https://www.youtube.com'
resp = requests.get(url)
html = resp.text
soup = BeautifulSoup(html, 'html.parser')
print(soup.find_all('a'))
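A note on the traceback itself: it is raised inside cp1252.py while print() encodes its output, and the character it cannot encode is '\U0001f384', an emoji that cp1252 has no mapping for. That suggests the failure comes from the console's encoding rather than from the download, so the same error may still appear with either snippet above whenever the page contains such a character. Below is a minimal sketch of one possible workaround, assuming you are fine with unencodable characters being shown as '?'. The helper name console_safe and the sample string are hypothetical, not part of any library.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the target encoding cannot represent replaced by '?'."""
    # Assumption: use the console's own encoding, falling back to UTF-8 if it is unknown.
    encoding = encoding or sys.stdout.encoding or "utf-8"
    # errors="replace" substitutes '?' instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors="replace").decode(encoding)


# Hypothetical sample containing the same emoji as in the traceback.
sample = "Holiday videos \U0001f384"
print(console_safe(sample, "cp1252"))
```

With the scraper, this would be used as print(console_safe(str(soup.find_all('a')))). Another option that may work without changing the code is setting the PYTHONIOENCODING environment variable to utf-8 before running the script, provided the terminal can display UTF-8.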