I want to click a website link using Scrapy (Python)

Time: 2018-09-26 12:33:58

Tags: python selenium selenium-webdriver web-scraping scrapy

class FilteredListSerializer(serializers.ListSerializer):

    filter_kwargs = {}

    def to_representation(self, data):
        if not self.filter_kwargs or not isinstance(self.filter_kwargs, dict):
            raise TypeError(_('Invalid Attribute Type: `filter_kwargs` must be of type `dict`.'))
        data = data.filter(**self.filter_kwargs)
        return super().to_representation(data)
  

I want to click the link

 class IsActiveListSerializer(FilteredListSerializer):
     filter_kwargs = {'is_active': 1}
  

I want to click the link

import scrapy
from selenium import webdriver


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://ozhat-turkiye.com/en/brands/a',
    ]

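1 Answer:

Answer 0 (score: 0)

The script below should keep clicking the next-page link until none is left, collecting the required items along the way. You cannot use response.follow() here because there is no link to follow; the only way to advance is to click it.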

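import time
import scrapy
from selenium import webdriver
from selenium.webdriver.common.by import By

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://ozhat-turkiye.com/en/brands/a',
    ]

    def __init__(self, *args, **kwargs):
        # let scrapy.Spider finish its own setup before adding the browser
        super().__init__(*args, **kwargs)
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)
        while True:
            time.sleep(5)  # hardcoded wait for the page to render
            # Selenium 4 removed find_element(s)_by_*; use By locators instead
            for title in self.driver.find_elements(By.CSS_SELECTOR, 'div.tabledivinlineblock a.tablelink50'):
                yield {'title': title.text, 'response': response.url}

            try:
                # click the last link in the pager to load the next page
                self.driver.find_element(By.CSS_SELECTOR, 'span#maincontent_DataPager a:last-child').click()
            except Exception:
                # no pager link could be clicked: stop paginating
                break

    def closed(self, reason):
        # Scrapy calls this when the spider finishes; shut the browser down
        self.driver.quit()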

I used a hardcoded wait in the script, which is not recommended at all. You should replace it with an Explicit Wait.