How to scrape YouTube comments using BeautifulSoup

Time: 2018-04-18 19:49:46

Tags: python python-3.x web-scraping beautifulsoup

I'm a beginner. I'd like to know how to scrape YouTube comments using BeautifulSoup. I'm stuck. Can anyone help me with the code?

This is what I have written:

import requests    
from bs4 import BeautifulSoup

r = requests.get("https://www.youtube.com/watch?v=kffacxfA7G4")
req = r.content
soup = BeautifulSoup(req,'html.parser')    
print(soup.prettify())    
all = soup.find_all('div',{'id' : 'contents'})

I'm stuck here and not getting any output. Inspecting the web page shows that the comments sit under id = contents.

1 Answer:

Answer 0: (score: 0)

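The comments on that site are generated dynamically, so you can't fetch them with the requests and BeautifulSoup libraries alone: the HTML that requests receives does not contain them yet. To get the content you need a browser automation tool such as selenium. As a starter, you can try the following. The script runs headlessly and fetches the comments that have been loaded. By the way, the site also lazy-loads its comments, so you need to tweak the for loop to get more content.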

import time
from selenium.webdriver import Chrome
from contextlib import closing
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys

chrome_options = Options()  
chrome_options.add_argument("--headless")

with closing(Chrome(options=chrome_options)) as driver:  # newer Selenium releases take options=, not chrome_options=
    wait = WebDriverWait(driver,10)
    driver.get("https://www.youtube.com/watch?v=kffacxfA7G4")

    for item in range(3):  # increase the range to scroll further and load more comments
        wait.until(EC.visibility_of_element_located((By.TAG_NAME, "body"))).send_keys(Keys.END)
        time.sleep(3)

    for comment in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#comment #content-text"))):
        print(comment.text)

Partial output:

15 April 2018 ?¿?
April 2018??
8 years people 
Nice songs Justin Bieber https://youtu.be/OvfAc7JGoc4
2018 hit like...♥️♥️♥️♥️
8 years complete 
Can likes beat dislikes??
View 1, 8 billion great song
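
The fixed range(3) loop in the answer only scrolls three times. One possible variation, offered as a sketch and not as part of the original answer, is to keep scrolling until no new comments appear. The helper name scroll_until_stable and its parameters (count_items, scroll_once, max_rounds, pause, patience) are hypothetical and introduced only for illustration. It takes plain callables, so it does not depend on Selenium itself.

```python
import time
from typing import Callable


def scroll_until_stable(
    count_items: Callable[[], int],
    scroll_once: Callable[[], None],
    max_rounds: int = 20,
    pause: float = 3.0,
    patience: int = 2,
) -> int:
    """Scroll until `patience` rounds in a row load nothing new.

    Stops after `max_rounds` scrolls at the latest and returns the
    number of items counted when scrolling stopped.
    """
    seen = count_items()
    idle = 0  # consecutive rounds without new items
    for _ in range(max_rounds):
        scroll_once()
        time.sleep(pause)  # give the lazy loader time to fetch the next batch
        current = count_items()
        if current > seen:
            seen, idle = current, 0
        else:
            idle += 1
            if idle >= patience:
                break
    return seen


# Hypothetical wiring with the driver from the script above (assumes the
# same "#comment #content-text" selector still matches the page):
#
# body = driver.find_element(By.TAG_NAME, "body")
# total = scroll_until_stable(
#     lambda: len(driver.find_elements(By.CSS_SELECTOR, "#comment #content-text")),
#     lambda: body.send_keys(Keys.END),
# )
```

The patience parameter is there because the first scroll may not load any comments yet. Stopping on the very first round without growth would end the loop too early.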