Scrapy FormRequest login not working

Asked: 2017-04-17 19:02:16

Tags: web-scraping scrapy

I am trying to log in with Scrapy, but I keep getting a lot of "Redirecting (302)" messages. This happens both with my real login credentials and with fake ones. I have also tried another site, still with no luck.

import scrapy
from scrapy.http import FormRequest, Request

class LoginSpider(scrapy.Spider):
    name = 'SOlogin'
    allowed_domains = ['stackoverflow.com']

    login_url = 'https://stackoverflow.com/users/login?ssrc=head&returnurl=http%3a%2f%2fstackoverflow.com%2f'
    test_url = 'http://stackoverflow.com/questions/ask'

    def start_requests(self):
        yield Request(url=self.login_url, callback=self.parse_login)

    def parse_login(self, response):
        return FormRequest.from_response(response, formdata={"email": "XXXXX", "password": "XXXXX"}, callback=self.start_crawl)

    def start_crawl(self, response):
        yield Request(self.test_url, callback=self.parse_item)

    def parse_item(self, response):
        print("Test URL " + response.url)

I also tried adding

meta = {'dont_redirect': True, 'handle_httpstatus_list':[302]} 

to both the initial Request and the FormRequest.

Here is the output of the above code:


2017-04-17 21:48:17 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: stackoverflow)
2017-04-17 21:48:17 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'stackoverflow', 'NEWSPIDER_MODULE': 'stackoverflow.spiders', 'SPIDER_MODULES': ['stackoverflow.spiders'], 'USER_AGENT': 'Mozilla/5.0'}
2017-04-17 21:48:17 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2017-04-17 21:48:17 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-04-17 21:48:17 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-04-17 21:48:17 [scrapy.middleware] INFO: Enabled item pipelines: []
2017-04-17 21:48:17 [scrapy.core.engine] INFO: Spider opened
2017-04-17 21:48:17 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-04-17 21:48:17 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-04-17 21:48:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://stackoverflow.com/users/login?ssrc=head&returnurl=http%3a%2f%2fstackoverflow.com%2f> (referer: None)
2017-04-17 21:48:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://stackoverflow.com/search?q=&email=XXXXX&password=XXXXX> (referer: https://stackoverflow.com/users/login?ssrc=head&returnurl=http%3a%2f%2fstackoverflow.com%2f)
2017-04-17 21:48:19 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask> from <GET http://stackoverflow.com/questions/ask>
2017-04-17 21:48:19 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask> from <GET http://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask>
2017-04-17 21:48:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask> (referer: https://stackoverflow.com/search?q=&email=XXXXX&password=XXXXX)
Test URL https://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask
2017-04-17 21:48:19 [scrapy.core.engine] INFO: Closing spider (finished)
2017-04-17 21:48:19 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 1772,
 'downloader/request_count': 5,
 'downloader/request_method_count/GET': 5,
 'downloader/response_bytes': 34543,
 'downloader/response_count': 5,
 'downloader/response_status_count/200': 3,
 'downloader/response_status_count/302': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2017, 4, 17, 18, 48, 19, 470354),
 'log_count/DEBUG': 6,
 'log_count/INFO': 7,
 'request_depth_max': 2,
 'response_received_count': 3,
 'scheduler/dequeued': 5,
 'scheduler/dequeued/memory': 5,
 'scheduler/enqueued': 5,
 'scheduler/enqueued/memory': 5,
 'start_time': datetime.datetime(2017, 4, 17, 18, 48, 17, 386516)}
2017-04-17 21:48:19 [scrapy.core.engine] INFO: Spider closed (finished)
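Note that the final "Test URL" printed above is itself the login page: the request to /questions/ask was bounced back, which means the session was never authenticated. A minimal, hedged sketch of how the spider could detect this in parse_item (the is_logged_in helper is illustrative, not part of the original spider):

```python
def is_logged_in(response_url: str) -> bool:
    # If Scrapy was redirected back to the login page, the
    # session was never authenticated and the login failed.
    return "/users/login" not in response_url

# The "Test URL" from the log above is the login page, so the check fails:
print(is_logged_in("https://stackoverflow.com/users/login?ssrc=anon_ask"))  # False
print(is_logged_in("http://stackoverflow.com/questions/ask"))               # True
```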

1 Answer:

Answer 0 (Score: 0)

By default, Scrapy fills your email and password into the first form that has a clickable input — here, the search form on the login page. You need to identify the login form explicitly, via formname or formid, e.g.:

FormRequest.from_response(
    response,
    formid="login-form",
    formdata={"email": "XXXXX", "password": "XXXXX"},
    callback=self.start_crawl,
)

See docs.
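The form-selection pitfall can be illustrated without Scrapy at all. A stdlib-only sketch (the HTML below is a hypothetical simplification of the login page, not its real markup): taking the first form on the page submits to the search endpoint, while selecting by id finds the login form.

```python
from html.parser import HTMLParser

class FormFinder(HTMLParser):
    """Collect (id, action) for every <form> tag on a page."""
    def __init__(self):
        super().__init__()
        self.forms = []  # list of (form_id, action) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            a = dict(attrs)
            self.forms.append((a.get("id"), a.get("action")))

# Hypothetical simplification: a search form appears before the login
# form, mirroring why from_response picks the wrong one by default.
page = """
<form action="/search"><input name="q"></form>
<form id="login-form" action="/users/login">
  <input name="email"><input name="password">
</form>
"""

finder = FormFinder()
finder.feed(page)

first_form = finder.forms[0]  # what the default (first form) selects
login_form = next(f for f in finder.forms if f[0] == "login-form")

print(first_form)  # (None, '/search')  -> credentials go to the search endpoint
print(login_form)  # ('login-form', '/users/login')
```

This is exactly the symptom visible in the log: the credentials were submitted as query parameters of /search?q=&email=...&password=....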