Question

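I need some help getting Scrapy to download a file from an ASP.NET site. Normally, clicking the link in a browser starts the download, but Scrapy can't click links, so this is what I'm trying instead: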

def retrieve(self, response):
        print('Response URL: {}'.format(response.url))

        pattern = re.compile('(dg[^\']*)')

        for file in response.xpath('//table[@id="dgFile"]/tbody/tr/td[2]/a'):
            file_url = file.xpath('@href').extract_first()

            target = re.search(pattern, file_url).group(1)
            viewstate = response.xpath('//*[@id="__VIEWSTATE"]/@value').extract_first()
            viewstategenerator = response.xpath('//*[@id="__VIEWSTATEGENERATOR"]').extract_first()
            eventvalidation = response.xpath('//*[@id="__EVENTVALIDATION"]').extract_first()

            data = {
                '_EVENTTARGET': target,
                '_VIEWSTATE': viewstate,
                '_VIEWSTATEGEERATOR': viewstategenerator,
                '_EVENTVALIDATION': eventvalidation
            }

            yield FormRequest.from_response(
                response,
                formdata=data,
                callback=self.end(response)
            )

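I'm trying to post this data back to the page so that I receive the zip file as the response, but it isn't working the way I hoped. Instead, I just get the same page back as the response.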
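
One possible reason the same page comes back is that the POST does not match what the browser sends on a link click. In the snippet above, the field names start with a single underscore (ASP.NET WebForms uses `__EVENTTARGET`, `__VIEWSTATE`, and so on), two of the XPath expressions select the whole element rather than its `@value`, and `callback=self.end(response)` calls the method immediately instead of passing it. The following is a minimal sketch of the postback, not a tested fix for this site. It assumes the link hrefs have the usual `javascript:__doPostBack('target','argument')` shape, and `parse_postback` and `postback_request` are hypothetical helper names.

```python
import re

# Assumed href shape: javascript:__doPostBack('target','argument')
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)'\s*,\s*'([^']*)'\)")


def parse_postback(href):
    """Return (event_target, event_argument) from a __doPostBack href, or None."""
    match = POSTBACK_RE.search(href or "")
    return match.groups() if match else None


def postback_request(response, href, callback):
    """Build the POST that a click on an ASP.NET postback link would send."""
    # Imported here so parse_postback stays usable without Scrapy installed.
    from scrapy import FormRequest

    parsed = parse_postback(href)
    if parsed is None:
        return None
    target, argument = parsed
    return FormRequest.from_response(
        response,
        # from_response copies the form's hidden inputs (__VIEWSTATE,
        # __VIEWSTATEGENERATOR, __EVENTVALIDATION); only the event fields
        # are overridden here.
        formdata={"__EVENTTARGET": target, "__EVENTARGUMENT": argument},
        dont_click=True,    # do not also submit a button's name/value
        callback=callback,  # pass the method itself, not the result of calling it
    )
```

Inside `retrieve`, this could be called as `postback_request(response, file_url, self.end)` and the result yielded when it is not `None`. Whether the server then answers with the zip body still depends on the site. The `tbody` step in the XPath may also need to be dropped, since browsers insert `<tbody>` even when the raw HTML does not contain it.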

Is it even possible to download this file with Scrapy in this situation? Does anyone have any pointers?

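I have also tried Selenium + PhantomJS, but I got stuck trying to transfer the session from Scrapy to Selenium. I'm willing to use Selenium for this one step, but I need to use Scrapy for this project.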

Downloading a zip file from an ASP.NET site with Scrapy

0 answers: