Question

I want to download Supreme Court cases. Here is the code I am trying:

import requests
page = requests.get('http://judis.nic.in/supremecourt/Chrseq.aspx').text

This is what I get back in page:

u'<html><p><hr></hr></p><b><center>The Problem may be due to 500 Server Error/404 Page Not Found.Please contact your system administrator.</center></b><p><hr></hr></p></html><!--0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234-->\r\n'

Is the site not scrapable, or do I need to use a different method?

I checked this answer: How to scrape aspx pages with python, but its solution uses Selenium. Is it possible to do this with Python and BeautifulSoup?

Answer 1

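The reason is that you are requesting a URL that the server probably no longer serves. I was able to get data from all the pages. I checked the response from the scrapy shell: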

scrapy shell "http://judis.nic.in/supremecourt/chejudis.asp"

and using XPath you can retrieve whatever data you want from that same page.
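If you would rather stay in plain Python than use Scrapy, something along these lines might work. It is only a rough sketch: it uses the standard library (urllib.request and html.parser) in place of XPath, the names LinkCollector, extract_links, and fetch are made up for illustration, and it assumes the data you want is reachable through the page's <a> elements, which this thread does not confirm.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# URL taken from the scrapy shell command above; it may not be served forever.
URL = "http://judis.nic.in/supremecourt/chejudis.asp"


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects (href, text) for every <a> that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, link text) tuples found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url=URL, timeout=30):
    """Download a page; the encoding is unknown, so undecodable bytes are replaced."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling extract_links(fetch()) would then list the links on the page. The same extract_links function also accepts the .text of a requests response, so it fits the code in the question.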

Answer 2

I can't open the site in my browser either; the browser shows the same response. That is probably why you are getting this response.
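Since the error comes back as an ordinary HTML body, it may help to test for it before parsing. A small sketch follows. The marker string is copied from the response in the question; is_error_page is a hypothetical helper, and the thread does not say which HTTP status code the server sends with that page, so both the status code and the body are checked.

```python
# Text copied from the error body shown in the question.
ERROR_MARKER = "500 Server Error/404 Page Not Found"


def is_error_page(status_code, body):
    """True if the status is an HTTP error or the body is the server's error placeholder."""
    return status_code >= 400 or ERROR_MARKER in body


# Possible use with requests (not executed here):
#   resp = requests.get('http://judis.nic.in/supremecourt/Chrseq.aspx')
#   if is_error_page(resp.status_code, resp.text):
#       print("URL not served; try another page")
```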
