Problem navigating between pages with Scrapy-Selenium

Time: 2020-06-23 20:30:58

Tags: selenium selenium-webdriver web-scraping scrapy

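I am trying to scrape the Amazon website using Scrapy and Selenium, but I am running into a problem navigating between pages. What I want to do is go to the deals page, collect all the deals, and for each deal get its URL, navigate to it, and extract the information, then do the same for the others. At the end of the process, I click Next.

In practice, the bot reaches the first page and each product page, but it does not find the products or the Next button. Any ideas?

Here is my code: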

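import scrapy
from scrapy_selenium import SeleniumRequest

# Product is presumably the project's own scrapy.Item subclass; its import is not shown here.


class AmazonScraper(scrapy.Spider):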
    name = 'amazon_scraper'

    def start_requests(self):
        yield SeleniumRequest(url="https://www.amazon.fr/gp/goldbox", wait_time=3, callback=self.parse)

    def parse(self, response):
        product = Product()
        driver = response.meta['driver']
        deal_card_views = response.xpath('//div[starts-with(@id, "100_dealView_")]')

        for card_view in deal_card_views:
            if card_view.xpath('.//span[contains(@id, "shipSoldInfo")]/text()'):
                product['url'] = card_view.xpath('normalize-space(.//a[contains(@id, "dealTitle")]/@href)').get()
                yield SeleniumRequest(url=product['url'], wait_time=3, callback=self.parse_product,
                                      meta={'product': product})
        next_page = driver.find_element_by_partial_link_text('Suivant')

        if next_page:
            next_page.click()
            next_page_url = driver.current_url
            print(next_page_url)
            yield SeleniumRequest(url=next_page_url, wait_time=3,
                                  callback=self.parse)
        else:
            driver.close()

    def parse_product(self, response):
        product = response.meta['product']
        product['category'] = response.xpath(
            'normalize-space(//*[@id="wayfinding-breadcrumbs_feature_div"]/ul/li[1]/span/a/text())').get()
        product['name'] = response.xpath('normalize-space(//*[@id="productTitle"]/text())').get()
        product['time_left'] = response.xpath(
            'normalize-space(//*[starts-with(@id, "deal_expiry_timer")]/text())').get()
        product['old_price'] = response.xpath(
            'normalize-space(//*[@id="price"]/table/tbody/tr[1]/td[2]/span[1]/text())').get()
        product['price'] = response.xpath(
            'normalize-space(//*[@id="dealsAccordionRow"]/div/div[1]/a/h5/div[1]/span[2]/text())').get()
        product['saving'] = response.xpath(
            'normalize-space(//*[@id="dealprice_savings"]/td[2]/text())').get()
        yield product
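
One thing that may be worth checking in the loop above: `product` is created once, before the `for` loop, so every request's `meta` carries the same object, and by the time `parse_product` runs, `product['url']` may already hold a later card's URL. The `href` may also be relative, in which case it would need to be joined with the page URL before being requested (Scrapy's `response.urljoin` does this). Below is a minimal, stdlib-only sketch of both ideas. Plain dicts stand in for `Product` and `SeleniumRequest`, and `build_deal_requests` and the sample hrefs are hypothetical names used only for illustration:

```python
from urllib.parse import urljoin


def build_deal_requests(page_url, hrefs):
    """Return one request description per deal link.

    Plain dicts stand in for Product and SeleniumRequest (assumption:
    this only illustrates the data flow, not the scrapy-selenium API).
    """
    requests = []
    for href in hrefs:
        # A fresh item per deal, so queued requests do not share state.
        product = {"url": urljoin(page_url, href.strip())}
        requests.append({"url": product["url"], "meta": {"product": product}})
    return requests


if __name__ == "__main__":
    page = "https://www.amazon.fr/gp/goldbox"
    # Hypothetical hrefs: one relative, one already absolute.
    sample = ["/dp/EXAMPLE1", "https://www.amazon.fr/dp/EXAMPLE2"]
    for req in build_deal_requests(page, sample):
        print(req["url"])
```

In the spider itself, the equivalent change would be to call `Product()` inside the `for` loop rather than before it, and to pass the extracted `href` through `response.urljoin` before building the `SeleniumRequest`. Whether this explains the missing products depends on what the pages actually return, so treat it as something to rule out rather than a confirmed fix.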

0 answers:

No answers yet