Python Crawler: Handling a "Load More" Button

Date: 2017-06-05 19:37:06

Tags: python web-crawler

I am learning to write a web crawler in Python, and I would like to know how to handle the "Load More" button on the following page:

https://www.photo.net/search/#//Sort-View-Count/All-Categories/All-Time/Page-1

(I am trying to grab all the images.)

My current code uses BeautifulSoup:

from urllib.request import build_opener, HTTPCookieProcessor
from http.cookiejar import CookieJar

from bs4 import BeautifulSoup

url = 'https://www.photo.net/search/#//Sort-View-Count/All-Categories/All-Time/Page-1'

# Keep cookies across requests made through this opener.
cj = CookieJar()
opener = build_opener(HTTPCookieProcessor(cj))

try:
    p = opener.open(url)
    soup = BeautifulSoup(p, 'html.parser')
except Exception as e:
    print(str(e))

1 Answer:

Answer 0 (score: 0)

Well, I have a solution.

You should try using the Selenium module for Python.
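One likely reason the urllib approach falls short: everything after the # in the question's URL is a fragment, and fragments are not sent to the server. The short sketch below uses the standard library's urldefrag to show which part of that URL opener.open actually requests. That the site then fills in the results with client-side scripts is an assumption based on the URL's shape, not something confirmed here.

```python
from urllib.parse import urldefrag

url = 'https://www.photo.net/search/#//Sort-View-Count/All-Categories/All-Time/Page-1'

# urldefrag splits a URL into the part that is requested and the fragment.
base, fragment = urldefrag(url)

print(base)      # the address the server actually receives
print(fragment)  # stays in the browser, available only to client-side scripts
```

If the sort order and page number live only in the fragment, a plain HTTP client cannot pass them along, which is why a real browser driven by Selenium is a reasonable fit.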

1) Download the Chrome driver (ChromeDriver)

2) Install Selenium via pip

Here is an example of how to use it:

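from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

url = 'https://www.photo.net/search/#//Sort-View-Count/All-Categories/All-Time/Page-1'

# Selenium 3 style; in Selenium 4 the driver path is passed via a Service object instead.
browser = webdriver.Chrome('Path to chrome driver')
browser.get(url)
while True:
    try:
        # Wait up to 10 seconds for the "Load More" link to become clickable.
        button = WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, 'Load More')))
    except TimeoutException:
        # The link no longer shows up, so there is nothing more to load.
        break
    button.click()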
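Once the button has been clicked as many times as needed, the rendered markup is available as browser.page_source, and the image addresses can be pulled out of it. The following is a minimal sketch of that step, not a tested scraper for photo.net. The names ImageCollector and extract_image_urls are hypothetical, the sample markup is made up for illustration, and the standard library's HTMLParser is used in place of BeautifulSoup so the snippet runs without extra packages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects the absolute URL of every <img> tag that has a src attribute."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            src = dict(attrs).get('src')
            if src:
                # Resolve relative paths against the page address.
                self.urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Hypothetical helper: return all image URLs found in an HTML string."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    return collector.urls


# Made-up markup for illustration; in the real script pass browser.page_source.
sample_html = (
    '<div><img src="/photos/1.jpg">'
    '<img src="https://example.com/2.jpg">'
    '<img alt="no src"></div>'
)
image_urls = extract_image_urls(sample_html, 'https://www.photo.net/search/')
print(image_urls)
```

With BeautifulSoup installed, iterating over the img tags of a parsed page_source should serve the same purpose. Whether the site keeps the full-size image address in src or in some other attribute would need to be checked against the live page.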