How can I scrape dynamic content with Selenium in Colab?

Time: 2020-03-09 09:17:58

Tags: python selenium google-colaboratory

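I am trying to use Selenium in Google Colab to scrape the comments of a YouTube video, but the script raises an exception. The same script works when I run it on my local machine. The YouTube watch page is dynamic, and I found that the page source returned on Colab does not contain the full content. However, I don't know how to fix this.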

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import time

options = Options()
prefs = {"profile.managed_default_content_settings.images": 2}
options.add_experimental_option("prefs", prefs)
options.add_argument("--headless")
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--window-size=2560,1440")
options.add_argument("--start-maximized")
options.add_argument('--disable-gpu')
driver = webdriver.Chrome(options=options)
driver.get('https://www.youtube.com/watch?v=yIYKR4sgzI8')

time.sleep(10)
driver.execute_script('window.scrollTo(1, 500);')
time.sleep(10)

# views_div = driver.find_element_by_xpath('//*[@id="info-contents"]')
# views = views_div.find_element_by_xpath('//*[@class="view-count style-scope yt-view-count-renderer"]')

comment_div=driver.find_element_by_xpath('//*[@id="contents"]')
comments=comment_div.find_elements_by_xpath('//*[@id="content-text"]')

# titles_div = driver.find_element_by_xpath('//*[@class="detail-content"]')
# titles = titles_div.find_element_by_xpath('//*[@class="title"]')


# print(driver.page_source)
for comment in comments:
    print(comment.text)

driver.close()

Then I get this exception.

NoSuchElementException                    Traceback (most recent call last)
<ipython-input-20-1833cad42017> in <module>()
     26 # views = views_div.find_element_by_xpath('//*[@class="view-count style-scope yt-view-count-renderer"]')
     27 
---> 28 comment_div=driver.find_element_by_xpath('//*[@id="contents"]')
     29 comments=comment_div.find_elements_by_xpath('//*[@id="content-text"]')
     30 

3 frames
/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/errorhandler.py in check_response(self, response)
    240                 alert_text = value['alert'].get('text')
    241             raise exception_class(message, screen, stacktrace, alert_text)
--> 242         raise exception_class(message, screen, stacktrace)
    243 
    244     def _value_or_default(self, obj, key, default):

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="contents"]"}
  (Session info: headless chrome=80.0.3987.87)
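
The traceback shows that `//*[@id="contents"]` was not in the DOM when the lookup ran, which fits the observation that the page source on Colab is incomplete. A minimal sketch of one way to probe this, not a confirmed fix: replace the fixed `time.sleep` calls with an explicit wait that scrolls and polls for the comment nodes, and print some diagnostics if they never appear. The helper names `chrome_arguments` and `scrape_comments`, the timeout values, and the scroll step are hypothetical; the sketch assumes Selenium 3.8 or later and that Chrome and a matching chromedriver are on the PATH.

```python
def chrome_arguments(width=2560, height=1440):
    """Return headless-Chrome flags for a container environment such as Colab."""
    return [
        "--headless",
        "--no-sandbox",
        "--disable-dev-shm-usage",
        f"--window-size={width},{height}",  # Chrome expects "width,height"
    ]


def scrape_comments(url, wait_per_scroll=5, max_scrolls=10):
    """Scroll the page until comment nodes appear and return their text.

    Returns an empty list (after printing diagnostics) if none show up.
    """
    # Imported inside the function so chrome_arguments() works without Selenium.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    for arg in chrome_arguments():
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Same selector as in the question, searched from the document root.
        locator = (By.XPATH, '//*[@id="content-text"]')
        for _ in range(max_scrolls):
            # Comments load lazily, so scroll a little further each round.
            driver.execute_script("window.scrollBy(0, 1000);")
            try:
                elements = WebDriverWait(driver, wait_per_scroll).until(
                    EC.presence_of_all_elements_located(locator)
                )
                return [element.text for element in elements]
            except TimeoutException:
                continue  # nothing yet, scroll again

        # Nothing appeared: show what the browser actually received.
        print("URL:", driver.current_url)
        print("Title:", driver.title)
        print("Source length:", len(driver.page_source))
        return []
    finally:
        driver.quit()  # always release the browser process
```

Calling `scrape_comments('https://www.youtube.com/watch?v=yIYKR4sgzI8')` returns a list of comment strings, or an empty list plus the diagnostics if nothing loads. If the printed URL or title differs from what a local run shows, Colab is probably being served something other than the normal watch page, and waiting longer would not help.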

0 answers:

No answers