How to extract data from all "next" pages using Selenium WebDriver with Scrapy in Python

Time: 2015-08-18 11:42:59

Tags: python selenium webdriver scrapy

import scrapy
from scrapy.http import TextResponse
from selenium import webdriver

class Spider1(scrapy.Spider):
    name = "len"
    allowed_domains = ["support.lenovo.com"]
    start_urls = ["https://support.lenovo.com/in/hi/contactus1/findaprovider/service-provider-list?countrycode=in"]

    def parse(self, response):
        # Load the page in a real browser so the JavaScript-rendered
        # content and pagination controls are available.
        driver = webdriver.Firefox()
        driver.get(self.start_urls[0])
        html = driver.page_source.encode('utf-8')
        # Wrap the rendered HTML in a TextResponse so Scrapy selectors work on it.
        response = TextResponse(url=driver.current_url, body=html, encoding='utf-8')
        url = driver.current_url

        # Click the "next page" link, then process the new page.
        elem = driver.find_element_by_class_name("page-next")
        elem.click()
        self.fun(url)

    def fun(self, url):  # function to extract data on each page
        pass

I am trying to extract the details from all 50 pages. My code extracts one page of data using Scrapy, but I want to extract all 50 pages. I know this can be done using Selenium. Can anyone explain the logic or show me an example so I can understand how to extract the data from all the pages?

Here is the link: https://support.lenovo.com/in/hi/contactus1/findaprovider/service-provider-list?countrycode=in

1 Answer:

Answer 0 (score: 1)
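The answer body was not preserved in this scrape. As a minimal sketch of the usual approach, you can loop on the page: extract the current page's data, then look for the "next" element and click it, stopping when it no longer exists. The sketch below uses `find_elements_by_class_name` (the Selenium 2.x-era API this post uses; Selenium 4 replaces it with `driver.find_elements(By.CLASS_NAME, ...)`), and the plural `find_elements` is chosen deliberately because it returns an empty list instead of raising on the last page. The `FakeDriver` class is a stand-in so the loop can be demonstrated without a browser; a real `webdriver.Firefox()` exposes the same `page_source` and `find_elements_by_class_name` members used here.

```python
def scrape_all_pages(driver, extract):
    """Run extract() on every page, clicking "next" until it disappears."""
    results = []
    while True:
        results.extend(extract(driver.page_source))
        # find_elements (plural) returns [] instead of raising when the
        # "next" button is absent, i.e. on the last page.
        next_links = driver.find_elements_by_class_name("page-next")
        if not next_links:
            break
        next_links[0].click()
    return results


class FakeDriver:
    """Stand-in for webdriver.Firefox() used only to demonstrate the loop."""
    def __init__(self, pages):
        self.pages = pages
        self.index = 0

    @property
    def page_source(self):
        return self.pages[self.index]

    def find_elements_by_class_name(self, name):
        # Reuse self as the clickable "next" element while pages remain.
        return [self] if self.index + 1 < len(self.pages) else []

    def click(self):
        self.index += 1


driver = FakeDriver(["page 1 html", "page 2 html", "page 3 html"])
print(scrape_all_pages(driver, lambda html: [html]))
# → ['page 1 html', 'page 2 html', 'page 3 html']
```

With a real driver, `extract` would parse `page_source` with Scrapy selectors (e.g. via `TextResponse`) instead of returning the raw HTML, and you may need an explicit wait after each click so the next page finishes loading before it is scraped.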