Trying to fill in a form with Scrapy FormRequest, unexpected result

Asked: 2015-09-01 12:09:21

Tags: python-2.7 scrapy scrapy-spider

I am trying to fill in the form at www.wetseal.com/Stores, which lets you pick the state whose stores should be displayed.

<form action="http://www.wetseal.com/Stores?dwcont=C73689620" method="post" id="dwfrm_storelocator_state">
                    <fieldset>



                        <div class="form-row required ">            
                            <label for="dwfrm_storelocator_address_states_stateUSCA">               
                                <span>State</span>              
                                <span class="required-indicator">*</span>

                            </label>
                            <select id="dwfrm_storelocator_address_states_stateUSCA" class="input-select required" name="dwfrm_storelocator_address_states_stateUSCA">              
                                <option value="">Select...</option>


                                    <option value="AK">Alaska</option>


                                    <option value="AZ">Arizona</option>


                                    <option value="AR">Arkansas</option>


                                    <option value="CA">California</option>


                                    <option value="CO">Colorado</option>


                                    <option value="CT">Connecticut</option>


                                    <option value="DE">Delaware</option>


                                    <option value="FL">Florida</option>


                                    <option value="GA">Georgia</option>


                                    <option value="HI">Hawaii</option>


                                    <option value="ID">Idaho</option>


                                    <option value="IL">Illinois</option>


                                    <option value="IN">Indiana</option>


                                    <option value="KS">Kansas</option>


                                    <option value="KY">Kentucky</option>


                                    <option value="MD">Maryland</option>


                                    <option value="MA">Massachusetts</option>


                                    <option value="MI">Michigan</option>


                                    <option value="MN">Minnesota</option>


                                    <option value="MS">Mississippi</option>


                                    <option value="MO">Missouri</option>


                                    <option value="NE">Nebraska</option>


                                    <option value="NV">Nevada</option>


                                    <option value="NH">New Hampshire</option>


                                    <option value="NJ">New Jersey</option>


                                    <option value="NM">New Mexico</option>


                                    <option value="NY">New York</option>


                                    <option value="NC">North Carolina</option>


                                    <option value="ND">North Dakota</option>


                                    <option value="OH">Ohio</option>


                                    <option value="OK">Oklahoma</option>


                                    <option value="OR">Oregon</option>


                                    <option value="PA">Pennsylvania</option>


                                    <option value="PR">Puerto Rico</option>


                                    <option value="RI">Rhode Island</option>


                                    <option value="SC">South Carolina</option>


                                    <option value="SD">South Dakota</option>


                                    <option value="TN">Tennessee</option>


                                    <option value="TX">Texas</option>


                                    <option value="VA">Virginia</option>


                                    <option value="WA">Washington</option>


                                    <option value="WV">West Virginia</option>


                                    <option value="WI">Wisconsin</option>

                            </select>
                        </div>                          
                        <button type="submit" name="dwfrm_storelocator_findbystate" value="Search">
                            Search
                        </button>
                    </fieldset>
                </form>

Using Chrome, I can see the request being made and its form parameters:

[screenshot: the request and its form parameters in Chrome]

Following the documentation, I wrote a very simple spider that sends a FormRequest to that URL to fill in the form (in this case I am testing with stores in Arizona, AZ):

from scrapy import FormRequest, Spider


class WetSealStoreSpider(Spider):
    name = "wetseal_store_spider"
    allowed_domains = ["wetseal.com"]
    start_urls = [
        "http://www.wetseal.com/Stores"
    ]

    def parse(self, response):
        # Fill in the state form on the store locator page and submit it
        yield FormRequest.from_response(
            response,
            formname='dwfrm_storelocator_state',
            formdata={'dwfrm_storelocator_address_states_stateUSCA': 'AZ',
                      'dwfrm_storelocator_findbystate': 'Search'},
            callback=self.parse1)

    def parse1(self, response):
        # Dump the status and body of whatever the form submission returned
        print(response.status)
        print(response.body)

When execution reaches the FormRequest and I inspect the response, everything looks fine:

[screenshot: the response as seen when the FormRequest is created]

But in the callback method, this is what I see in the response:

[screenshot: the response as seen in the callback]

It ends up looking like a GET request, and the URL is completely wrong:

'http://www.wetseal.com/Search?q=&dwfrm_storelocator_findbystate=Search&dwfrm_storelocator_address_states_stateUSCA=AZ'
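
Pulling that URL apart makes the mismatch easier to see. The following is a quick standard-library sketch (Python 3 syntax, separate from the spider itself); the only input is the URL quoted above, and the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlsplit

# The URL the callback received, copied from the question.
bad_url = (
    "http://www.wetseal.com/Search?q=&dwfrm_storelocator_findbystate=Search"
    "&dwfrm_storelocator_address_states_stateUSCA=AZ"
)

parts = urlsplit(bad_url)
# keep_blank_values=True so the empty "q" parameter is not silently dropped.
params = parse_qs(parts.query, keep_blank_values=True)

print(parts.path)      # the request went to /Search, not /Stores
print(sorted(params))  # includes "q", which the spider never supplied
```

The path is /Search rather than /Stores, and the query string carries an empty q field that is not in the spider's formdata, which suggests the data was merged into a different form on the page.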

Any idea what I am doing wrong?

Thanks!

2 answers:

Answer 0 (score: 1)

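You are passing formname, but that form has an id attribute and no name attribute. from_response cannot find a form with that name, so it falls back to the first form on the page (formnumber defaults to 0), which here appears to be the site search form. That would explain the GET request to /Search?q=...

Try formxpath='id("dwfrm_storelocator_state")' instead.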

Answer 1 (score: 0)

Try this:

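def parse(self, response):
    # Collect every non-empty state code from the <select>
    states = response.xpath(
        ".//select[@id='dwfrm_storelocator_address_states_stateUSCA']"
        "//option[@value!='']/@value").extract()
    # extract_first() stands in for the answer's undefined get_text_from_node helper
    url = response.xpath("//form[@id='dwfrm_storelocator_state']/@action").extract_first()
    for state in states:
        form_data = {'dwfrm_storelocator_address_states_stateUSCA': state,
                     'dwfrm_storelocator_findbystate': 'Search'}
        # POST directly to the form's action URL; no form lookup is involved
        yield FormRequest(url,
                          formdata=form_data,
                          callback=self.your_Callback)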