Question

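This is the page I want to scrape: http://www.nalpdirectory.com/Page.cfm?PageID=34. I want to simulate submitting the form #resultDisplayOptionsForm with #customDisplayNum set to All, which should give me a page that lists every item.

Here is my code snippet: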

def parse(self, response):
    yield scrapy.FormRequest.from_response(
        response,
        formid='resultDisplayOptionsForm',
        formdata={'displayNum': '100000'},  # I tried 10, 20, 30 etc. none works
        dont_click=True,
        # clickdata={'id': 'customizeDisplaySubmitBtn'},
        callback=self.after_showAll
    )

def after_showAll(self, response):
    from scrapy.shell import inspect_response
    inspect_response(response, self)

When I inspect the response, it always shows a failure page. Any suggestions are welcome. Thanks!

Answer 1

The problem here is that you are missing the actual POST request that fetches the data.

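If you look closely, the form's request URL is this site, while the "response" you want comes from this site, so you can confirm that something is missing.

What you are missing is a third request to the final page. In Scrapy code it would look something like this:

from scrapy import FormRequest

def parse(self, response):
    # Step 1: submit the display-options form found on the page
    yield FormRequest.from_response(
        response,
        formid='resultDisplayOptionsForm',
        formdata={'displayNum': '100000000'},  # I tried 10, 20, 30 etc. none works
        dont_click=True,
        # clickdata={'id': 'customizeDisplaySubmitBtn'},
        callback=self.after_showAll
    )

def after_showAll(self, response):
    # Step 2: POST to the final page, which returns the actual results
    yield FormRequest(
        url='http://www.nalpdirectory.com/Page.cfm?PageID=34',
        formdata={
            'currPage': '1',
            'checkedFormID': '',
        },
        callback=self.parse_real,
    )

def parse_real(self, response):
    # Step 3: inspect the response that contains the listing
    from scrapy.shell import inspect_response
    inspect_response(response, self)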
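To confirm that the extra request matches what the browser sends, it can help to build the expected POST body by hand and compare it with the one shown in the browser's network tab. The following is only a minimal sketch using the standard library: the field names `currPage` and `checkedFormID` are taken from the answer above, the helper name `build_post_body` is hypothetical, and it is an assumption that the site expects an ordinary `application/x-www-form-urlencoded` body.

```python
from urllib.parse import urlencode, parse_qsl


def build_post_body(fields):
    """Encode form fields as an application/x-www-form-urlencoded string.

    Hypothetical helper for illustration; it is not part of Scrapy.
    """
    return urlencode(fields)


# Field names taken from the answer; empty values are kept as "name=".
body = build_post_body({'currPage': '1', 'checkedFormID': ''})
print(body)

# Decode the body again to check that no field was dropped.
decoded = dict(parse_qsl(body, keep_blank_values=True))
print(decoded)
```

If the body shown in the browser contains fields that are absent here, those fields most likely need to be added to `formdata` as well.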
