Question

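I have been struggling with this problem for the past two days. I need to scrape all of the "cadres", or categories, from this website. Unfortunately, the site only exposes this data through a "Select Cadre" drop-down menu, and that menu has no "All categories" option. To work around this, I am using Scrapy's FormRequest.from_response method, but the spider returns a blank file with no data. Any help is appreciated. Here is the code: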


Answer 1

When I ran your code, I saw this in the log:

2017-08-19 15:52:20 [scrapy.spidermiddlewares.offsite] DEBUG: Filtered offsite request to 'civillist.ias.nic.in': <POST http://civillist.ias.nic.in/UpdateCL/DraftCL.asp>

Just change allowed_domains to:

allowed_domains = ['civillist.ias.nic.in']

and it works.
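To see why the POST was dropped, here is a rough sketch of the kind of host check that the offsite filter performs. It is a simplified approximation written with the standard library only, not Scrapy's actual code; the helper names `host_regex` and `is_onsite` are made up for illustration, and the "bad" allowed_domains value is a hypothetical example of a misconfiguration, since the original spider code is not shown here.

```python
import re
from urllib.parse import urlparse


def host_regex(allowed_domains):
    """Build a pattern that accepts each listed domain and its subdomains."""
    escaped = "|".join(re.escape(d) for d in allowed_domains if d)
    return re.compile(rf"^(.*\.)?({escaped})$")


def is_onsite(url, allowed_domains):
    """Return True if the URL's host is covered by allowed_domains."""
    host = urlparse(url).hostname or ""
    return bool(host_regex(allowed_domains).search(host))


# URL taken from the log line in this answer.
form_url = "http://civillist.ias.nic.in/UpdateCL/DraftCL.asp"

# Hypothetical misconfiguration: a full URL instead of a bare host name.
print(is_onsite(form_url, ["http://civillist.ias.nic.in/"]))  # False

# The fix from this answer: only the host name.
print(is_onsite(form_url, ["civillist.ias.nic.in"]))  # True
```

The point is that entries in allowed_domains are compared against the request's host name, so anything that is not a bare domain (a scheme, a path, a typo) will never match, and every request to that site is filtered as offsite.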

Automatically scraping data for each drop-down menu option using Scrapy's FormRequest.from_response method
