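I'm trying to download the NSE end-of-day (EoD) bhavcopy files for the past 15 years by web scraping with urllib.request.
urllib.request is behaving strangely: the download works in one case, but in another it fails with error 403 Access Denied.
I'm sending browser-like HTTP headers to mask the script, but it still fails in one case.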
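One way to check which headers will actually go out is to inspect the Request object before calling urlopen (the code below has a commented-out print(request.headers) for this). Here is a minimal sketch. The header values are illustrative placeholders, not a confirmed list of what NSE requires. Note that urllib.request.Request normalizes every header name with str.capitalize(), so 'User-Agent' is stored as 'User-agent'.

```python
import urllib.request

# Illustrative placeholder headers (assumption: not NSE's exact requirements).
headers = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "*/*",
    "Referer": "https://www.nseindia.com/products/content/equities/equities/archieve_eq.htm",
}
url = "https://www.nseindia.com/content/historical/EQUITIES/2016/JAN/cm01JAN2016bhav.csv.zip"

# Building a Request does not touch the network.
req = urllib.request.Request(url, headers=headers)

# header_items() lists the headers as urllib stored them;
# names were normalized with str.capitalize(), e.g. "User-Agent" -> "User-agent".
for name, value in req.header_items():
    print(f"{name}: {value}")

# get_header() expects the capitalized form of the name.
print(req.get_header("User-agent"))
```

Headers such as Host are only added later, when the request is opened, so they do not appear in this listing.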
Here is the code:
import urllib.request
import urllib.error  # HTTPError lives here; import it explicitly


def downloadCMCSV(year="2001", mon="JAN", dd="01"):
    # baseurl = "https://www.nseindia.com"
    headers = {'Host': 'www.nseindia.com:443',
               'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
               'Accept-Encoding': 'gzip, deflate, sdch, br',
               'Accept-Language': 'en-US,en;q=0.8',
               # 'Cookie': 'NSE-TEST-1=1809850378.20480.0000; pointer=1; sym1=ONGC; pointerfo=1; underlying1=ONGC; instrument1=FUTSTK; optiontype1=-; expiry1=27OCT2016; strikeprice1=-',
               'Cookie': 'NSE-TEST-1=1809850378.20480.0000; pointer=1; sym1=ONGC; pointerfo=1; underlying1=ONGC; instrument1=FUTSTK; optiontype1=-; expiry1=27OCT2016; strikeprice1=-; JSESSIONID=B4CA0543FF4C33FD9EA9D18B95238DB4',
               'Referer': 'Referer: https://www.nseindia.com/products/content/equities/equities/archieve_eq.htm',
               'Upgrade-Insecure-Requests': '1',
               'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36'}
    filename = "cm%s%s%sbhav.csv" % (dd, mon, year)
    urlcm = "https://www.nseindia.com/content/historical/EQUITIES/%s/%s/%s.zip" % (year, mon, filename)
    print(urlcm)
    request = urllib.request.Request(urlcm, headers=headers)
    # print(dir(request))
    # print(request.headers)
    try:
        response = urllib.request.urlopen(request)
    except urllib.error.HTTPError as e:
        if e.code == 404:
            print("Bhavcopy not available for", year, mon, dd)
            return
        print(e.code)
        print(e.read())
        return
    if response.code == 200:
        print("The response is good", response.length)


if __name__ == "__main__":
    # getAll()
    downloadCMCSV('2001', 'JAN', '01')
    downloadCMCSV('2016', 'JAN', '01')
The output is as follows:
https://www.nseindia.com/content/historical/EQUITIES/2001/JAN/cm01JAN2001bhav.csv.zip
403
b'<HTML><HEAD>\n<TITLE>Access Denied</TITLE>\n</HEAD><BODY>\n<H1>Access Denied</H1>\n \nYou don\'t have permission to access "http://www.nseindia.com/content/historical/EQUITIES/2001/JAN/cm01JAN2001bhav.csv.zip" on this server.<P>\nReference #18.33210f17.1476700779.13b4f615\n</BODY>\n</HTML>\n'
https://www.nseindia.com/content/historical/EQUITIES/2016/JAN/cm01JAN2016bhav.csv.zip
The response is good 58943
Can you help me figure out what I'm doing wrong?
Answer (score: 0)
Pass a User-Agent, 'Accept': '*/*' and a Referer header:
from urllib import request

url = "https://www.nseindia.com/content/historical/EQUITIES/2001/JAN/cm01JAN2001bhav.csv.zip"
r = request.Request(url, headers={'User-Agent': 'mybot', 'Accept': '*/*',
                                  "Referer": "https://www.nseindia.com/products/content/equities/equities/archieve_eq.htm"})
print(request.urlopen(r))
You don't need cookies or any other settings.
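Combining the answer's three headers with the URL pattern from the question, the 15-year range could be walked roughly as follows. This is a minimal sketch under several assumptions that the answer does not confirm: the archive still uses this URL layout, weekends can be skipped, missing trading dates (holidays) return 404, each zip holds a single UTF-8 CSV, and the helper names (bhav_url, weekdays, build_request, extract_csv, download_range) are hypothetical. Only download_range touches the network.

```python
import datetime
import io
import urllib.error
import urllib.request
import zipfile

# Fixed English month abbreviations (calendar.month_abbr is locale-dependent).
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")

# Headers from the answer; "mybot" is the answer's example User-Agent.
HEADERS = {
    "User-Agent": "mybot",
    "Accept": "*/*",
    "Referer": "https://www.nseindia.com/products/content/equities/equities/archieve_eq.htm",
}


def bhav_url(day):
    """Build the bhavcopy URL for a date, following the question's URL pattern."""
    mon = MONTHS[day.month - 1]
    filename = "cm%02d%s%dbhav.csv" % (day.day, mon, day.year)
    return "https://www.nseindia.com/content/historical/EQUITIES/%d/%s/%s.zip" % (
        day.year, mon, filename)


def weekdays(start, end):
    """Yield Monday-Friday dates from start to end inclusive (holidays not filtered)."""
    day = start
    while day <= end:
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            yield day
        day += datetime.timedelta(days=1)


def build_request(day):
    """Hypothetical helper: a Request carrying the answer's headers."""
    return urllib.request.Request(bhav_url(day), headers=HEADERS)


def extract_csv(zip_bytes):
    """Return the text of the first member of a zip archive (assumes UTF-8)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.read(zf.namelist()[0]).decode("utf-8")


def download_range(start, end):
    """Fetch each weekday's bhavcopy; returns {date: csv_text}. Needs network access."""
    results = {}
    for day in weekdays(start, end):
        try:
            with urllib.request.urlopen(build_request(day), timeout=30) as resp:
                results[day] = extract_csv(resp.read())
        except urllib.error.HTTPError as e:
            if e.code == 404:
                print("Bhavcopy not available for", day)  # e.g. an exchange holiday
            else:
                print("HTTP", e.code, "for", day)
    return results
```

A call such as download_range(datetime.date(2001, 1, 1), datetime.date(2016, 12, 31)) would request every weekday in that span one after another. Connection failures (URLError other than HTTPError) are deliberately left to propagate, so a network outage stops the loop instead of being logged as a missing file.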