Logging in to Quora with Scrapy

Time: 2016-04-24 12:19:48

Tags: cookies login web scrapy quora

I am trying to log in to Quora with Scrapy, but without success: the server responds with a 400 or 500 status code, depending on the formdata I send.

I found the form data using Chrome's developer tools:

General
Request URL:https://www.quora.com/webnode2/server_call_POST?__instart__
Request Method:POST
Status Code:200
Remote Address:103.243.14.60:443

Form Data
json:{"args":[],"kwargs":{"email":"1liusai253@163.com","password":"XXXX","passwordless":1}}
formkey:750febacf08976a47c82f3e10af83305
postkey:dab46d0df2014d1568ead6b2fbad7297
window_id:dep3300-2420196009402604566
referring_controller:index
referring_action:index
_lm_transaction_id:0.2598935768985011
_lm_window_id:dep3300-2420196009402604566
__vcon_json:["Vn03YsuKFZvHV9"]
__vcon_method:do_login
__e2e_action_id:ee1qmp1iit
js_init:{}
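
Note that in the capture above the json field is a single JSON-encoded string, not a nested structure, and every form field is sent as flat text. A minimal sketch of building such fields with the standard library follows; the helper name build_login_fields and the example credentials are hypothetical, and only a subset of the captured fields is shown.

```python
import json
from urllib.parse import urlencode

def build_login_fields(email, password):
    """Return form fields as flat strings (hypothetical helper)."""
    payload = {"args": [], "kwargs": {"email": email, "password": password}}
    return {
        # The "json" field is one JSON-encoded string, not a nested dict.
        "json": json.dumps(payload, separators=(",", ":")),
        "referring_controller": "index",
        "referring_action": "index",
        "__vcon_method": "do_login",
    }

# Placeholder credentials, for illustration only.
fields = build_login_fields("user@example.com", "secret")

# URL-encode the fields the way an application/x-www-form-urlencoded body is sent.
body = urlencode(fields)
```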

Below is my code, which follows the usual Scrapy flow. I think the problem is in the formdata. Can anyone help?

import scrapy
import re

class QuestionsSpider(scrapy.Spider):
    name = 'questions'
    domain = 'https://www.quora.com'
    headers = {
            "Accept": "application/json, text/javascript, */*; q=0.01",
            "Accept-Language": "zh-Hans-CN,zh-Hans;q=0.8,en-US;q=0.5,en;q=0.3",
            "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/49.0.2623.108 Chrome/49.0.2623.108 Safari/537.36",
            "Accept-Encoding": "gzip, deflate",
            "Host": "www.quora.com",
            "Connection": "Keep-Alive",
            "content-type":"application/x-www-form-urlencoded"
        }

    def __init__(self, login_url = None):
        self.login_url = 'https://www.quora.com/webnode2/server_call_POST?__instart__' # Here is the login URL of Quora

    def start_requests(self):
        body = response.body
        formkey_patt = re.compile(r'.*?"formkey".*?"(.*?)".*?',re.S)
        formkey = re.findall(formkey_patt, body)[0]
        postkey_patt = re.compile('.*?"postkey".*?"(.*?)".*?',re.S)
        postkey = re.findall(postkey_patt, body)[0]
        window_id_patt = re.compile('.*?window_id.*?"(.*?)".*?',re.S)
        window_id = re.findall(window_id_patt, body)[0]

        referring_controller = 'index'
        referring_action = 'index'
        __vcon_method = 'do_login'

        yield scrapy.Request(
            url = self.domain,
            headers = self.headers,
            meta = {'cookiejar':1},
            callback = self.start_login
            )

    def start_login(self,response):
        yield scrapy.FormRequest.from_response(
                response,
                url = self.login_url,
                meta = {'cookiejar':response.meta['cookiejar']},
                headers = self.headers,
                formdata = {"json":{"args":[],"kwargs":{"email":"xxxx","password":"xxx"}},
                "formkey":formkey,
                "postkey":postkey,
                "window_id":window_id,
                "referring_controller":referring_controller,
                "referring_action":referring_action,
                "__vcon_method":__vcon_method,
                "__e2e_action_id":"ee1qmp1iit"
                },
                callback = self.after_login
            )

    def after_login(self, response):
        print(response.body)

1 Answer:

Answer 0 (score: 0)

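You are not actually setting or sending formkey, postkey, window_id, and so on. Those values have to be read from the response to the first request, so they can only be extracted inside a callback that receives that response. That said, you need to use FormRequest.from_response().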
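
As a rough illustration of reading the tokens from the response, here is a minimal sketch using only the re module. It assumes the page embeds the tokens as "name": "value" pairs; SAMPLE_BODY is a hypothetical stand-in for the real page body (whose markup may differ), and extract_token / extract_tokens are hypothetical helper names.

```python
import re

# Hypothetical sample of the page body; the real Quora markup may differ.
SAMPLE_BODY = (
    '{"formkey": "750febacf08976a47c82f3e10af83305", '
    '"postkey": "dab46d0df2014d1568ead6b2fbad7297", '
    '"window_id": "dep3300-2420196009402604566"}'
)

# Matches "name": "value" and captures the value.
TOKEN_PATTERN = r'"{name}"\s*:\s*"([^"]+)"'

def extract_token(body, name):
    """Return the value for one token name, or None if it is absent."""
    match = re.search(TOKEN_PATTERN.format(name=re.escape(name)), body)
    return match.group(1) if match else None

def extract_tokens(body):
    """Collect the three per-session tokens into a dict of strings."""
    names = ("formkey", "postkey", "window_id")
    return {name: extract_token(body, name) for name in names}

tokens = extract_tokens(SAMPLE_BODY)
```

In a spider, extract_tokens would be called on the response body inside the callback for the first request, and the resulting dict merged into the formdata of the login request. FormRequest.from_response() pre-fills fields only from a form element in the HTML, so tokens that live in inline script data would still need to be extracted this way.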