Scrapy Request callback function is not called

Time: 2018-10-31 00:54:32

Tags: linux python-2.7 web-scraping web-crawler

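I am learning Python Scrapy and am struggling to figure out why a request's callback function is not being executed. The workflow scrapes a website. When an item page is found, the program checks whether that page still has an active login session. If it does not, a function is called to log in. The problem is that the session expires over time, so my spider needs to log in again after a while. Any help or guidance would be appreciated.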

The function I want to have called is the following:

    def retrylogin_parse(self, response):
        # FUNCTION NOT CALLED
        self.logger.debug("Re-Login attempted for url " + self.login_url)
        return [FormRequest.from_response(
            response,
            formid='login-form',
            formdata={'login[username]': self.username,
                      'login[password]': self.password},
            clickdata={"type": "submit"},
            callback=self.after_relogin)]

I am trying to determine why the following line of code does not call the retrylogin_parse function:

yield Request(self.login_url, dont_filter=True, callback=self.retrylogin_parse)

Here is the code:

import ....

class MySpider(CrawlSpider):
    name = "bot-help"
    allowed_domains = ['www.somewebsite.com']
    start_urls =["https://www.somewebsite.com/category/subcategory.html"]
    reloginCurrentUrl = ""  

    username = 'username'
    password = '1234'
    login_msg = "Welcome"
    login_url = "https://www.somewebsite.com/login/"

    rules = (
        Rule(LinkExtractor(allow=('html')), callback='item_page'),
    )

    def item_page(self, response):
        image_item = Item()
        self.logger.info("item_page Called")
        str1 = response.xpath("//p[@class='welcome-msg']/text()").extract_first()

        self.logger.info("Testing if Response is still logged in")
        self.logger.debug("Message: " + str1)

        if (str1.find(self.login_msg)==-1):
            self.logger.error("Session Lost! Must Login")
            self.reloginCurrentUrl = response.url
            image_item['manu_product_url'] = response.url
            self.logger.debug("reloginCurrentUrl: " + self.reloginCurrentUrl)
            #HERE IS WHERE I WANT TO RE-LOGIN
            x = self.start_relogin(response)    
            self.logger.debug("Relogin Request completed")
            return
        else: 
            self.logger.info("Login Session is alive")


        self.logger.info("worked") 
        #SCRAPE DATA.....

        yield image_item

    def __init__(self, **kwargs):
        CrawlSpider.__init__(self, **kwargs)

    def start_relogin(self, response):
        self.logger.debug("start_relogin function called")
        x = response.url
        self.logger.debug("Login Url: " + self.login_url)
        yield Request(self.login_url, dont_filter=True, callback=self.retrylogin_parse)

    def retrylogin_parse(self, response):
        # FUNCTION NOT CALLED
        self.logger.debug("Re-Login attempted for url " + self.login_url)
        return [FormRequest.from_response(response, formid= 'login-form', formdata={'login[username]': self.username, 'login[password]': self.password}, clickdata = { "type": "submit" }, callback=self.after_relogin)]  

    def after_relogin(self, response):
        self.logger.info("Post Re-Login Attempted")
        str1 = response.xpath("//p[@class='welcome-msg']/text()").extract_first()

        if (str1.find(self.login_msg) == -1):
            self.logger.info("Re-Login failed")
            return
        else:
            self.logger.info("Re-Login successful will continue to parse")
            return [Request(url=self.reloginCurrentUrl)]

Here is the debug output:

DEBUG: Crawled (200) <GET https://www.somewebsite.com/robots.txt> (referer: None)
2018-10-30 23:49:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.somewebsite.com/category/subcategory.html> (referer: None)
2018-10-30 23:49:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.somewebsite.com/category/abc-46467.html> (referer: https://www.somewebsite.com/category/subcategory.html)
2018-10-30 23:49:35 [foagroupbot-help] INFO: item_page Called
2018-10-30 23:49:35 [foagroupbot-help] INFO: Testing if Response is still logged in
2018-10-30 23:49:35 [foagroupbot-help] DEBUG: Message: Login Please!
2018-10-30 23:49:35 [foagroupbot-help] ERROR: Session Lost! Must Login
2018-10-30 23:49:35 [foagroupbot-help] DEBUG: reloginCurrentUrl: https://www.somewebsite.com/category/abc-46467.html
2018-10-30 23:49:35 [foagroupbot-help] DEBUG: Relogin Request completed
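
One possible explanation, consistent with the log above: `start_relogin` contains a `yield`, so it is a generator function. Calling it only creates a generator object; its body does not run until something iterates over it. That would explain why "start_relogin function called" never shows up in the debug output even though "Relogin Request completed" does. Since `item_page` stores the generator in `x` and then returns, the `Request` for the login page would never reach Scrapy, and `retrylogin_parse` would never be scheduled. The sketch below reproduces that pattern with plain-Python stand-ins. All names in it are hypothetical and it does not use Scrapy at all.

```python
# Plain-Python stand-ins (hypothetical names); no Scrapy required.
calls = []


def start_relogin():
    # Generator function: nothing below runs until the generator is iterated.
    calls.append("start_relogin body ran")
    yield "login-request"


def item_page_broken():
    # Mirrors the pattern in item_page: the generator object is created,
    # stored in x, and then dropped without ever being iterated.
    x = start_relogin()
    calls.append("relogin 'completed'")
    return
    yield  # unreachable; only makes this function a generator, like item_page


def item_page_fixed():
    # Re-yield whatever the helper produces so the caller receives it.
    for request in start_relogin():
        yield request


broken_output = list(item_page_broken())
broken_calls = list(calls)

del calls[:]  # reset the record between the two runs
fixed_output = list(item_page_fixed())
fixed_calls = list(calls)

print(broken_output, broken_calls)
print(fixed_output, fixed_calls)
```

If this is indeed the cause, the equivalent change in the spider would presumably be to replace `x = self.start_relogin(response)` with a loop that yields each request produced by `self.start_relogin(response)`, so that the engine receives the `Request` and can invoke its callback.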

0 answers:

No answers