Question

I have to simulate a form submission in Scrapy to generate a page.

Here is the form (I trimmed part of it):

<form id="" accept-charset="utf-8" method="POST" action="#">
<fieldset>
<div class="select-style">
<select id="study-select" name="">
<option>Choose an area of study</option>
<option data-tag="a1">Anthropology</option>
<option data-tag="a2">Architecture</option>
<option data-tag="b1">Biology</option>
<option data-tag="b2">Botany</option>
...
</select>
</div>
</fieldset>
</form>

I wrote the following code in Scrapy. My form XPath is correct; I'm testing the code in the Scrapy shell.

resfrom = scrapy.FormRequest.from_response(
    response,
    formxpath='//div[@id="field_switcher"]//form',
    formdata={'study-select': 'Biology'},
    clickdata={'type': 'submit'},
    method='POST',
)

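But this doesn't work; I just can't get it to "POST". Inspecting resfrom.body afterwards only gives 'study-select=Biology'. How do I "POST" data in Scrapy to a field that is identified only by its id (the select's name attribute is empty)? I've tried many options, but nothing seems to work. Do you see a problem in my code?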
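A likely reason the body holds nothing else: a POST body is the URL-encoded list of named form fields, and a field whose name is empty is left out of that list (browsers skip it under the HTML form-submission rules). So the only pair left is the one passed through formdata, with 'study-select' (the element's id) used as if it were a field name. The sketch below reproduces that body with urllib.parse.urlencode alone; it illustrates the encoding, not Scrapy's own implementation, and encode_form_body is a hypothetical helper.

```python
from urllib.parse import urlencode


def encode_form_body(form_fields):
    """Hypothetical helper: URL-encode named fields the way a form POST body is built."""
    return urlencode(form_fields)


# The <select> has id="study-select" but name="", so it contributes no pair of its own.
# The only entry is the one supplied explicitly, keyed by the id string.
body = encode_form_body({"study-select": "Biology"})
print(body)  # study-select=Biology
```

Because the key is just a string, the server receives 'study-select=Biology' whether or not any real field has that name.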

Answer 1

In your case there is no form to submit: the data is already present in the HTML.
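The form in the question posts to action="#" and its select has an empty name, which suggests the choice is handled in the page itself rather than by a server round trip; the option labels and their data-tag values are plain markup. A minimal stdlib sketch that reads them from the trimmed form shown in the question (OptionCollector is a hypothetical helper name). In the Scrapy shell, a selector such as response.xpath('//select[@id="study-select"]/option[@data-tag]') should reach the same elements.

```python
from html.parser import HTMLParser

# Trimmed select markup from the question (the "..." options are omitted).
FORM_HTML = """
<select id="study-select" name="">
<option>Choose an area of study</option>
<option data-tag="a1">Anthropology</option>
<option data-tag="a2">Architecture</option>
<option data-tag="b1">Biology</option>
<option data-tag="b2">Botany</option>
</select>
"""


class OptionCollector(HTMLParser):
    """Collects (data-tag, label) pairs from <option> elements."""

    def __init__(self):
        super().__init__()
        self.options = []
        self._in_option = False
        self._current_tag = None

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._in_option = True
            self._current_tag = dict(attrs).get("data-tag")

    def handle_endtag(self, tag):
        if tag == "option":
            self._in_option = False

    def handle_data(self, data):
        # Skip the placeholder option, which has no data-tag attribute.
        if self._in_option and self._current_tag and data.strip():
            self.options.append((self._current_tag, data.strip()))


parser = OptionCollector()
parser.feed(FORM_HTML)
print(parser.options)
```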

Here is sample code that groups the store locations by country:

$ scrapy shell http://www.apple.com/retail/storelist/
>>> from pprint import pprint
>>>
>>> data = {}
>>> for country in response.css(".section-country-stores .listing"):
...     country_id = country.xpath("@id").extract_first().replace("stores", "")
...     data[country_id] = [" ".join(map(unicode.strip, place.xpath(".//li//text()").extract())) for place in country.css("ul")]
... 
>>> pprint(data)
{u'ae': [u'Abu Dhabi, Yas Mall Yas Mall Yas Island Abu Dhabi 800 04441824',
         u'Dubai, Mall of the Emirates Mall of the Emirates Al Barsha 1 Dubai 800 04441819'],
 u'au': [u'Canberra Canberra Centre Canberra ACT 2601 (02) 6224 9500',
         u'Bondi 213 Oxford Street Bondi Junction NSW 2022 (02) 9019 2400',
         ...
         ],
 ...
}
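The session above is Python 2: unicode.strip does not exist on Python 3, which current Scrapy releases require, so on Python 3 that line needs map(str.strip, ...) instead. A sketch of the same two transformations (country key from the element id, one joined string per ul), run on hypothetical pre-extracted text nodes rather than a live response; join_text_nodes, country_key and the sample values are illustrative only, not real page data.

```python
def join_text_nodes(text_nodes):
    """Strip each text node and join with single spaces (Python 3: str.strip)."""
    return " ".join(map(str.strip, text_nodes))


def country_key(listing_id):
    """Turn an element id of the form '<code>stores' into '<code>'."""
    return listing_id.replace("stores", "")


# Hypothetical stand-in for country.xpath("@id") plus, per <ul>,
# place.xpath(".//li//text()").extract(); NOT data from the real page.
listings = {
    "xxstores": [["  Store A ", "\n Street 1 "], ["Store B", " Street 2"]],
}

data = {
    country_key(listing_id): [join_text_nodes(ul) for ul in uls]
    for listing_id, uls in listings.items()
}
print(data)
```

Inside the shell loop itself, only the map(unicode.strip, ...) call has to change.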

Submitting a form with an HTML select doesn't work