python urlopen 403 even though I can access the URL via a browser

Asked: 2016-07-03 17:10:25

Tags: python urlopen

Hi everyone. The following URL, https://www.nseindia.com/products/dynaContent/equities/indices/historicalindices.jsp?toDate=30-06-2016&fromDate=29-06-2016&indexType=NIFTY%2050, works perfectly fine when accessed through a browser, but my Python code keeps throwing a 403 and I'm stuck. The error message says "Access Denied", and interestingly the explanation reads: "You don't have permission to access http://www.nseindia.com/products/dynaContent/equities/indices/historicalindices.jsp on this server."

Any pointers would be much appreciated!

Code pasted below (the INDEX file contains only two lines: "NIFTY 50" and "NIFTY MIDCAP 50"):

    from urllib import urlencode
    import urllib2
    from bs4 import BeautifulSoup
    import csv
    import time
    import datetime

    arr = [1,3,5,10]
    url = "http://www.nseindia.com/products/dynaContent/equities/indices/historicalindices.jsp"
    fo = open("options/multiyearreturn/INDEX_DATA.txt", "wb")

    def is_number(s):
        try:
            float(s)
            return True
        except ValueError:
            return False

    with open('options/multiyearreturn/INDEX', 'rb') as csvfile:
        spamreader = csv.reader(csvfile, delimiter=',')
        for row in spamreader:
            print row
            # can't take today since it loads with a day's lag
            ToDt = datetime.datetime.now() - datetime.timedelta(days=1)
            if datetime.datetime.now().weekday() == 5:
                ToDt = ToDt - datetime.timedelta(days=1)
            elif datetime.datetime.now().weekday() == 6:
                ToDt = ToDt - datetime.timedelta(days=2)

            for x in range(len(arr)):
                frmDt1 = ToDt - datetime.timedelta(days=1)
                if frmDt1.weekday() == 5:
                     frmDt1 = frmDt1 -  datetime.timedelta(days=1)
                elif frmDt1.weekday() == 6:
                     frmDt1 = frmDt1 -  datetime.timedelta(days=2)

                values = {'indexType' : row[0], 'fromDate' : frmDt1.strftime("%d-%m-%Y"), 'toDate' : ToDt.strftime("%d-%m-%Y"), "User-Agent" : "Magic Browser" }
                data = urlencode(values).replace('+','%20')
                req = urllib2.Request(url, data)
                print data
                try:
                    response = urllib2.urlopen(req)
                except urllib2.HTTPError, e:
                    print e.fp.read()
                the_page = response.read()
                soup = BeautifulSoup( the_page )
                MFDTLS = soup.findAll('td', {'class': 'number'})

1 Answer:

Answer 0 (score: 1):

I just added these extra lines for the request headers:

    hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'}

The "Accept" header did the trick for me. Thanks for the help!
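
For completeness, here is a minimal sketch of how such a headers dict can be wired into the request. The answer only shows the `hdr` dict, so passing it via the `headers` argument of `urllib2.Request` (and the sample `values` payload) is my assumption based on the question's code, not something shown in the original answer:

    # Sketch only: send the browser-like headers as HTTP headers, so that
    # User-Agent/Accept do not end up inside the POST body.
    from urllib import urlencode
    import urllib2

    url = "http://www.nseindia.com/products/dynaContent/equities/indices/historicalindices.jsp"
    values = {'indexType': 'NIFTY 50', 'fromDate': '29-06-2016', 'toDate': '30-06-2016'}
    hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'}

    data = urlencode(values).replace('+', '%20')
    # urllib2.Request accepts a headers dict as its third argument
    req = urllib2.Request(url, data, headers=hdr)
    response = urllib2.urlopen(req)
    the_page = response.read()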