HTTPS POST to a JavaScript-driven form with Scrapy

Time: 2015-12-13 19:37:09

Tags: javascript python html scrapy

I am using Scrapy and I want to POST parameters to the following HTTPS login form:

<form id="loginForm" name="loginForm" action="/ax/login/loginNN.html" onsubmit="loginWidget.logIn(); loginWidget=null;" onkeypress="checkForReturn(this, event)" method="post">
    <input type="hidden" name="referer" value="/ax/web/nn/index.html" />
    <input type="hidden" name="encryption" value="1" />

    <div class="fakePasswordContainer">
                <input class="js-clear_field" tabindex="-1" name="fake_password" type="password">
    </div>

    <div class="input-group input-group-custom">
        <span class="input-group-addon"><span class="icon icon_user"></span></span>
        <input type="text" class="form-control lowercase" placeholder="Username" id="input1" name="8d144c359a21a05e83e9a1b56ec6a8e7" type="text" autocomplete="off" >
    </div>

    <div class="input-group input-group-custom">
        <span class="input-group-addon"><span class="icon icon_key"></span></span>
        <input id="fakeholder" type="text" class="form-control" placeholder="Password" autocomplete="off" style="display:none;">
        <input id="pContent" type="password" class="form-control" placeholder="Password" autocomplete="off">
        <input id="pContHidden" name="420d27b4073e303b678e19767daa0f38" type="hidden" autocomplete="off" />
    </div>

    <div class="row multiple-button-container">

            <div class="col-sm-7 align-left">
                <p class="btn-inline"><a href="NewPsw.html?a3=NNSE&a4=sv&usePhrase=0">Password forgotten?</a></p>
            </div>

        <div class="col-sm-5">
            <button id="login_btn" type="button" class="btn btn-primary btn-custom btn-block cta" onclick="if (loginWidget != null) return loginWidget.logIn()">Log in</button>
        </div>
    </div>
</form>

I have not been able to get this working. Here is my spider's parse method:

from scrapy.http import FormRequest
from scrapy.selector import Selector

# Method of the spider class
def parse(self, response):
    sel = Selector(response)
    # The <div> elements directly under the login form hold the inputs we need.
    login_parameters = sel.xpath("//div/div/div/div/div/div/form[@id='loginForm']/div")
    user_param = ""
    pass_param = ""
    for parameter in login_parameters:
        # The username and password field names are hashes, so read them from the page.
        param1 = parameter.xpath('input[@id="input1"]/@name').extract()
        if param1:
            user_param = param1[0]
        param2 = parameter.xpath('input[@id="pContHidden"]/@name').extract()
        if param2:
            pass_param = param2[0]
    form_data = {
        u'referer': u'/ax/login/startSE.html?cmpi=start-login',
        u'encryption': u'1',
        u'fake_password': u'',  # left empty here
        user_param: u'123456',
        pass_param: u'Abcdefg',
    }
    url = u'https://www.ordnet.se/ax/login/loginNN.html'
    print(form_data)
    yield FormRequest(url, callback=self.parse2, formdata=form_data)

Am I missing any parameters in the POST, or am I doing something else wrong? Any help would be greatly appreciated.

1 Answer:

Answer 0: (score: 0)

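You would be better off doing this with Selenium. To block forged requests and bots, sites usually include hidden elements holding tokens and CSRF strings that you cannot easily fake. With Selenium you are in the driver's seat: you can control script execution and trigger events yourself.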
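
If you want to stay with plain HTTP requests, one thing worth checking is whether every named input of the form ends up in the POST. The sketch below is only an illustration using the standard library's html.parser on a trimmed copy of the form from the question. The names FormFieldCollector, build_form_data and SAMPLE_HTML are made up for this example and are not part of Scrapy. It collects each named input with its default value and looks up the hashed field names by element id.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect the named <input> fields of one form (hypothetical helper)."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.in_form = False
        self.fields = {}      # input name -> default value
        self.name_by_id = {}  # element id -> input name

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            self.in_form = True
        elif tag == "input" and self.in_form and attrs.get("name"):
            # Inputs without a value attribute default to an empty string.
            self.fields[attrs["name"]] = attrs.get("value") or ""
            if attrs.get("id"):
                self.name_by_id[attrs["id"]] = attrs["name"]

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_form_data(html, username, password_value):
    """Return the form defaults with the two hashed fields filled in."""
    collector = FormFieldCollector("loginForm")
    collector.feed(html)
    data = dict(collector.fields)
    data[collector.name_by_id["input1"]] = username
    data[collector.name_by_id["pContHidden"]] = password_value
    return data


# Trimmed copy of the form from the question, used only as sample input.
SAMPLE_HTML = """
<form id="loginForm" action="/ax/login/loginNN.html" method="post">
  <input type="hidden" name="referer" value="/ax/web/nn/index.html" />
  <input type="hidden" name="encryption" value="1" />
  <input name="fake_password" type="password">
  <input type="text" id="input1" name="8d144c359a21a05e83e9a1b56ec6a8e7">
  <input id="pContent" type="password">
  <input id="pContHidden" name="420d27b4073e303b678e19767daa0f38" type="hidden" />
</form>
"""

form_data = build_form_data(SAMPLE_HTML, "123456", "Abcdefg")
print(form_data)
```

In a spider, Scrapy's FormRequest.from_response does similar pre-filling of form fields from the response. Note also that the visible password box (pContent) has no name, while the hidden pContHidden field is what gets submitted alongside encryption=1. My guess is that loginWidget.logIn() writes a transformed password into that hidden field before submitting, in which case sending the plain password would be rejected. That is an assumption based on the markup, not something I have verified against the site's JavaScript.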