Can Selenium automation be used together with BS4?

Time: 2020-04-09 06:11:35

Tags: python python-3.x selenium web-scraping beautifulsoup

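I use Selenium for both automation and scraping. I have now found that it is too slow on some websites. With BeautifulSoup I can scrape them faster, but then I cannot do the automation.

Is there any way to automate a website (button click events and so on) and also scrape it with BeautifulSoup?

Could you give me an example of button/search automation with bs4 + Selenium?

Any help would be appreciated...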

3 answers:

Answer 0 (score: 1)

Example

from bs4 import BeautifulSoup as Soup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://stackoverflow.com/questions/tagged/beautifulsoup+selenium")
page = Soup(driver.page_source, features='html.parser')
questions = page.select("#questions h3 a[href]")

for question in questions:
    print(question.text.strip())

Or just

import requests
from bs4 import BeautifulSoup as Soup


url = 'https://stackoverflow.com/questions/tagged/beautifulsoup+selenium'
response = requests.get(url=url)
page = Soup(response.text, features='html.parser')
questions = page.select("#questions h3 a[href]")

for question in questions:
    print(question.text.strip())

Remember to read https://stackoverflow.com/robots.txt
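
One way to act on the robots.txt reminder is to check a URL against the rules before fetching it. The following is a minimal sketch using the standard library's urllib.robotparser. The helper name is_allowed, the user agent "my-scraper", and the sample rules are made up for illustration; they are not Stack Overflow's actual rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, used only to demonstrate the check offline.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS, "my-scraper", "https://example.com/questions/tagged/selenium"))
print(is_allowed(SAMPLE_ROBOTS, "my-scraper", "https://example.com/private/page"))
```

Against a live site, the same parser can load the real file with parser.set_url("https://stackoverflow.com/robots.txt") followed by parser.read() instead of parse().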

Answer 1 (score: 0)

Absolutely. You can let Selenium do all the rendering and pass the page source to BeautifulSoup, like this:

from bs4 import BeautifulSoup as bs
# driver is an existing Selenium WebDriver that has already loaded the page
soup = bs(driver.page_source, 'html.parser')
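
For the button/search part of the question, here is a minimal sketch of the same idea: Selenium types into a search box and waits for results, then BeautifulSoup parses the rendered HTML. The `#questions h3 a[href]` selector comes from answer 0. The search box selector `input[name='q']` and the helper names search_and_get_html and extract_titles are assumptions for illustration and may not match the live site.

```python
from bs4 import BeautifulSoup


def extract_titles(html, selector="#questions h3 a[href]"):
    """Parse HTML with BeautifulSoup and return the stripped link texts."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get_text(strip=True) for a in soup.select(selector)]


def search_and_get_html(url, query, box_selector="input[name='q']", timeout=10):
    """Submit a search with Selenium and return the rendered page source."""
    # Selenium is imported here so extract_titles() can be used without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        box = driver.find_element(By.CSS_SELECTOR, box_selector)  # assumed selector
        box.send_keys(query, Keys.ENTER)  # type the query and submit it
        # Wait for at least one result link before reading the page source.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "#questions h3 a[href]"))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser


# Example use (opens a real Chrome window, so it is left commented out):
# html = search_and_get_html("https://stackoverflow.com/questions", "beautifulsoup selenium")
# print(extract_titles(html))
```

Keeping the parsing in its own function means the slow browser step runs once per page, and all element extraction happens in BeautifulSoup rather than through repeated WebDriver calls.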

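Answer 2 (score: 0)

How about this: it gives you the live DOM with the JavaScript already loaded, and it saves you search time. The idea is to grab the whole body; if you also want the head, change the "body" tag name. The result will be exactly what Selenium sees. I hope you like it.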

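from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
options = Options()
options.add_argument('--headless')
options.add_argument('--disable-gpu')
dri = webdriver.Chrome(options=options)
# The original snippet loads no page; this URL is borrowed from answer 0 as an example
dri.get("https://stackoverflow.com/questions/tagged/beautifulsoup+selenium")
html = dri.find_element(By.TAG_NAME, "body").get_attribute('innerHTML')
soup = BeautifulSoup(html, features="lxml")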
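
Note that features="lxml" only works when the lxml package is installed; otherwise BeautifulSoup raises FeatureNotFound. A small sketch of a fallback to the built-in parser follows. The helper name make_soup is hypothetical.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html):
    """Build a soup with lxml if available, else with the built-in html.parser."""
    try:
        return BeautifulSoup(html, features="lxml")
    except FeatureNotFound:
        # lxml is not installed, so fall back to the standard-library parser.
        return BeautifulSoup(html, features="html.parser")


soup = make_soup("<p>hello</p>")
print(soup.p.get_text())
```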