I'm a beginner. I'd like to know how to scrape YouTube comments with BeautifulSoup. I'm stuck. Can anyone help me with the code?
Here is what I wrote:
import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.youtube.com/watch?v=kffacxfA7G4")
req = r.content
soup = BeautifulSoup(req, 'html.parser')
print(soup.prettify())
all = soup.find_all('div', {'id': 'contents'})
I'm stuck here and get no output. Inspecting the web page shows that the comments sit in an element with id="contents".
Answer 0 (score: 0)
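The comments on that site are generated dynamically, so you can't get them with the requests and BeautifulSoup libraries alone. To get the content you need a browser automation tool such as selenium. As a starting point, you can try the script below. It runs headlessly and fetches the comments. By the way, the site also uses lazy loading, so you need to tweak the for loop to get more content.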
import time
from contextlib import closing

from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

chrome_options = Options()
chrome_options.add_argument("--headless")

# options= replaces the deprecated chrome_options= keyword.
with closing(Chrome(options=chrome_options)) as driver:
    wait = WebDriverWait(driver, 10)
    driver.get("https://www.youtube.com/watch?v=kffacxfA7G4")

    # Each pass scrolls to the bottom to trigger lazy loading; raise the range to load more.
    for item in range(3):
        wait.until(EC.visibility_of_element_located((By.TAG_NAME, "body"))).send_keys(Keys.END)
        time.sleep(3)

    # Once the scrolling is done, print every comment that has been loaded.
    for comment in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#comment #content-text"))):
        print(comment.text)
Partial output:
15 April 2018 ?¿?
April 2018??
8 years people
Nice songs Justin Bieber https://youtu.be/OvfAc7JGoc4
2018 hit like...♥️♥️♥️♥️
8 years complete
Can likes beat dislikes??
View 1, 8 billion great song
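Since the page loads comments lazily, a fixed number of scroll passes either stops too early or wastes time. One possible alternative, offered only as a sketch, is to keep scrolling until the comment count stops growing. The helper name scroll_until_stable and its parameters (max_rounds, pause, patience) are made up for this illustration and are not part of selenium. It takes two callables, so it does not depend on a browser itself.

```python
import time
from typing import Callable


def scroll_until_stable(
    scroll: Callable[[], None],
    count_items: Callable[[], int],
    max_rounds: int = 20,
    pause: float = 3.0,
    patience: int = 2,
) -> int:
    """Scroll until the item count stops growing (hypothetical helper).

    scroll      -- triggers one scroll to the bottom of the page
    count_items -- returns how many items are currently loaded
    patience    -- how many scrolls in a row may add nothing before giving up
    Returns the largest item count observed.
    """
    seen = count_items()
    idle = 0
    for _ in range(max_rounds):
        scroll()
        time.sleep(pause)  # give the lazy loader time to fetch the next batch
        current = count_items()
        if current > seen:
            seen = current
            idle = 0
        else:
            idle += 1
            if idle >= patience:
                break  # assume nothing more will load
    return seen


# Possible wiring to the selenium script above (untested assumption):
# body = driver.find_element(By.TAG_NAME, "body")
# total = scroll_until_stable(
#     scroll=lambda: body.send_keys(Keys.END),
#     count_items=lambda: len(
#         driver.find_elements(By.CSS_SELECTOR, "#comment #content-text")
#     ),
# )
```

The patience parameter is there because a slow connection can make a single pass look empty. Even so, a long enough stall will still end the loop early, so treat the returned count as a lower bound.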