I have just started integrating Django with Scrapy.
Upon receiving a variable (the website URL) on the Django side, I want to pass it to the Scrapy part so that the site can be crawled.
This is the code snippet I wrote on the backend:
def post(self, request, format=None):
    ...
    serializer = self.serializer_class(data=data)
    if serializer.is_valid():
        site = serializer.create(data)
        domain = urlparse(site.url).netloc
        site_id = site.id
        unique_id = str(uuid4())  # ties the Django record to the scrapyd job
        settings = {
            'unique_id': unique_id,
            'USER_AGENT': 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
        }
        # schedule the 'icrawler' spider on scrapyd; url and domain are
        # forwarded to the spider as spider arguments
        task_id = scrapyd.schedule('default', 'icrawler', settings=settings,
                                   url=site.url, domain=domain)
        task = {
            'task_id': task_id,
            'unique_id': unique_id,
            'status': 'started'
        }
        resp = {
            'task': task,
            'data': serializer.data,
            'status': status.HTTP_201_CREATED
        }
        return Response(resp)
    else:
        return Response(serializer.errors, status=status.HTTP_400_BAD_REQUEST)
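The view needs a fresh job identifier and a hostname before it can call scrapyd.schedule. A minimal stdlib-only sketch of how those values could be derived (the helper name build_schedule_kwargs is hypothetical, not part of my project):

```python
from urllib.parse import urlparse
from uuid import uuid4

def build_schedule_kwargs(url):
    """Derive the arguments for scrapyd.schedule from a raw website URL."""
    unique_id = str(uuid4())          # random id linking the DB record to the crawl job
    domain = urlparse(url).netloc     # bare hostname, e.g. 'google.com'
    return {
        'settings': {'unique_id': unique_id},
        'url': url,
        'domain': domain,
    }
```

Everything except settings here ends up as keyword arguments to schedule, which scrapyd hands to the spider as spider arguments.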
I have created a spider called icrawler.py in the Django project:
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class IcrawlerSpider(CrawlSpider):
    name = 'icrawler'
    allowed_domains = ['https://google.com']
    start_urls = ['http://https://google.com/']
    rules = (
        Rule(LinkExtractor(allow=r'Items/'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        i = {}
        return i
As you can see, the spider hardcodes allowed_domains = ['https://google.com']
and start_urls = ['http://https://google.com/'].
I want to replace these hardcoded values with the variable passed from Django, and start the crawler as soon as the variable is received on the Django side.
I am not sure how I can implement this.
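One pattern I have seen for this is to override the spider's __init__ and read the values from spider arguments, since scrapyd forwards extra schedule() parameters (url=..., domain=...) to the spider as keyword arguments. A minimal sketch of that idea, with the Scrapy base class left out so the example is self-contained; in the real file the class would inherit from CrawlSpider:

```python
from urllib.parse import urlparse

class IcrawlerSpider:  # in the real project: class IcrawlerSpider(CrawlSpider)
    name = 'icrawler'

    def __init__(self, *args, url=None, domain=None, **kwargs):
        # scrapyd passes the extra schedule() parameters here as kwargs
        if url is None:
            raise ValueError('url is required, e.g. scrapyd.schedule(..., url=...)')
        self.start_urls = [url]
        # allowed_domains expects bare hostnames, not full URLs with a scheme
        self.allowed_domains = [domain or urlparse(url).netloc]
        super().__init__(*args, **kwargs)
```

With this, the class-level allowed_domains and start_urls can be removed, and each scheduled job crawls whatever URL Django passed in.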