How do I scrape data only up to a specific page?

Asked: 2019-07-25 10:14:12

Tags: python web-scraping selenium-chromedriver

I don't know how many pages the site has, so I need to scrape data only up to a specific page. I have written code, but it keeps scraping until the last page of the site. I want to limit the scraping to just the pages I need.

Website = https://worldwide.espacenet.com/classification#

I need to reach this page of the site: https://worldwide.espacenet.com/classification#!/CPC=A99Z99/00
from selenium import webdriver
import time
import pandas as pd
from pymongo import MongoClient

# MongoDB collection that receives the scraped rows
client = MongoClient()
db = client.espace1
espace1 = db.espacedata1

driver = webdriver.Chrome(executable_path=r'C:\Program Files (x86)\Google\Chrome\Application\chromedriver_win32 (2)\chromedriver.exe')
url = 'https://worldwide.espacenet.com/classification'
driver.get(url)
time.sleep(10)

# Move past the first page before entering the main loop.
span_nav_next = driver.find_element_by_class_name('cpcbrowser-nav-next')
span_nav_next.click()
time.sleep(10)

while True:
    # Drill into the classification tree and collect the text of each entry.
    div_class_tree = driver.find_element_by_class_name('class-tree')
    div_ul = div_class_tree.find_element_by_tag_name('ul')
    div_li = div_ul.find_element_by_tag_name('li')
    div_li_ul = div_li.find_element_by_tag_name('ul')
    div_li_ul_lis = div_li_ul.find_elements_by_tag_name('li')

    data = [li.text for li in div_li_ul_lis]
    records = pd.DataFrame(data).to_dict()
    print(records)

    # Store this page's entries, then advance to the next page.
    espace1.insert_many([{'records': data}])
    span_nav_next = driver.find_element_by_class_name('cpcbrowser-nav-next')
    span_nav_next.click()
    time.sleep(10)
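To stop after a chosen number of pages instead of looping until the site runs out, the `while True` loop can be replaced with a bounded loop. Below is a minimal sketch of that pattern, separated from Selenium so it is easy to follow: `fetch_page` is a hypothetical stand-in for the driver calls that read one page, and `max_pages` is the limit you would choose.

```python
def scrape_pages(fetch_page, max_pages):
    """Collect rows from at most max_pages pages.

    fetch_page(n) should return the list of rows on page n,
    or an empty list when there are no more pages.
    """
    data = []
    for page in range(max_pages):
        rows = fetch_page(page)
        if not rows:        # site ended before the limit was reached
            break
        data.extend(rows)
    return data

# Stand-in for the real driver-based fetch: a fake site with 5 pages.
fake_site = [["row %d-%d" % (p, i) for i in range(2)] for p in range(5)]
fetch = lambda p: fake_site[p] if p < len(fake_site) else []

print(scrape_pages(fetch, max_pages=3))  # only the first 3 pages are read
```

In the real script, `fetch_page` would wrap the `find_element_by_*` calls plus the click on `cpcbrowser-nav-next`, and the returned rows would be inserted into MongoDB as before.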

0 Answers:

There are no answers yet.