Question

I'm a beginner. I'd like to know how to scrape YouTube comments with BeautifulSoup. I'm stuck; can anyone help me with the code?

Here is what I wrote:

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.youtube.com/watch?v=kffacxfA7G4")
req = r.content  # raw HTML bytes as sent by the server
soup = BeautifulSoup(req, 'html.parser')
print(soup.prettify())
contents = soup.find_all('div', {'id': 'contents'})  # look for the comment containers

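I'm stuck here and don't get any output. Inspecting the web page in the browser shows that the comments are inside an element with id="contents".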

Answer 1

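The comments on this site are generated dynamically by JavaScript after the page loads, so you can't get them with the requests and BeautifulSoup libraries alone. To get that content you need a browser automation tool such as Selenium. Try the following as a starting point. The script below runs headless and fetches the comments that have been loaded. Note that the site also uses lazy loading, so you need to increase the number of iterations of the for loop to get more content.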

import time
from selenium.webdriver import Chrome
from contextlib import closing
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys

chrome_options = Options()  
chrome_options.add_argument("--headless")

with closing(Chrome(options=chrome_options)) as driver:  # "options" replaces the removed "chrome_options" keyword
    wait = WebDriverWait(driver,10)
    driver.get("https://www.youtube.com/watch?v=kffacxfA7G4")

    for item in range(3):  # increase the range to scroll more times and load more comments
        wait.until(EC.visibility_of_element_located((By.TAG_NAME, "body"))).send_keys(Keys.END)
        time.sleep(3)

    for comment in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#comment #content-text"))):
        print(comment.text)

Partial output:

15 April 2018 ?¿?
April 2018??
8 years people 
Nice songs Justin Bieber https://youtu.be/OvfAc7JGoc4
2018 hit like...♥️♥️♥️♥️
8 years complete 
Can likes beat dislikes??
View 1, 8 billion great song
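If you want to keep BeautifulSoup from the question, and avoid guessing how many times to scroll, the two tools can be combined as in this minimal sketch. BeautifulSoup only parses the HTML it is given: `requests` returns the page before any JavaScript runs, whereas Selenium's `driver.page_source` returns the page as the browser currently has it, comments included. The sketch assumes that (a) `document.documentElement.scrollHeight` keeps growing while YouTube appends comments, so an unchanged height means nothing more was loaded, and (b) the `#comment #content-text` selector from the script still matches YouTube's markup; neither is guaranteed. The helper names `scroll_until_stable`, `extract_comments` and `fetch_comments` are hypothetical.

```python
import time
from typing import Callable, List

from bs4 import BeautifulSoup


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll: Callable[[], None],
    pause: float = 3.0,
    max_rounds: int = 20,
) -> int:
    """Scroll until the page height stops growing; return the rounds used."""
    last_height = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll()
        time.sleep(pause)  # give the lazy loader time to append more comments
        new_height = get_height()
        if new_height == last_height:
            return rounds  # nothing new appeared: assume the end was reached
        last_height = new_height
    return max_rounds  # safety cap so the loop cannot run forever


def extract_comments(html: str) -> List[str]:
    """Return the text of every comment body found in an HTML document."""
    soup = BeautifulSoup(html, "html.parser")
    # Same CSS selector as the Selenium script (assumed to match YouTube's markup).
    return [node.get_text(strip=True) for node in soup.select("#comment #content-text")]


def fetch_comments(url: str) -> List[str]:
    # Selenium is imported here so the two helpers above work without it.
    from selenium.webdriver import Chrome
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    driver = Chrome(options=options)
    try:
        driver.get(url)
        time.sleep(3)  # assumed: enough time for the initial page to render
        scroll_until_stable(
            get_height=lambda: driver.execute_script(
                "return document.documentElement.scrollHeight"),
            scroll=lambda: driver.execute_script(
                "window.scrollTo(0, document.documentElement.scrollHeight);"),
        )
        # page_source is the DOM after JavaScript ran, so BeautifulSoup can see the comments.
        return extract_comments(driver.page_source)
    finally:
        driver.quit()  # always shut down the browser, even on errors


if __name__ == "__main__":
    for text in fetch_comments("https://www.youtube.com/watch?v=kffacxfA7G4"):
        print(text)
```

Because the browser actions are passed in as plain callables, the stopping rule can be checked without a browser. A slow connection can make one round look stable too early, so raising `pause` is the safer adjustment.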
