Script that downloads stock price data from Yahoo Finance randomly gets 404s

Time: 2015-02-09 10:23:29

Tags: python csv download http-status-code-404 yahoo-finance

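The following script reads company ticker symbols from a .txt file and downloads the corresponding financial data as .csv files. The data is pulled from Yahoo Finance and saved to a local directory.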

import urllib.request
import requests
import time

#Define the URL to download the .csv from.
url_begin = "http://real-chart.finance.yahoo.com/table.csv?s="
url_end = "&a=00&b=1&c=1950&d=11&e=31&f=2050&g=d&ignore=.csv"


#Function that reads all available ticker symbols from ticker_daten.txt. This file should be in the same directory as the program.
def readTickers(file):
    read_ticker = []
    ins = open( file, "r" )
    for line in ins:
        if line.endswith('\n'):
            line=line[:-1]
        read_ticker.append(line)
    ins.close()
    return read_ticker

#File location for tickersymbols to download
tickers = readTickers("C:/Users/Win7ADM/Desktop/Jonas/stock-price-leecher/ticker_daten.txt")

#Loop through list of ticker symbols and download .csv's .
for i in tickers:

    #Forge downloadable link.
    link_created = url_begin + i + url_end

    #Make sure that the link actually leads to a file.
    try:
        r = requests.head(link_created)
        if r.status_code==404:
            print(str(r.status_code)+": No page found!")
            time.sleep(0.5)
        else:
            print(link_created)

            #Finally download the file, if it does exist.
            urllib.request.urlretrieve(link_created, "C:/Users/Win7ADM/Desktop/Jonas/stock-price-leecher/data/"+i+".csv")
            time.sleep(0.5)
    except requests.ConnectionError:
        #A Connection error occurred.
        print ("ConnectionError: 404 No page found!")
    except requests.HTTPError:
        #An HTTP error occurred.
        print ("HTTPError!")
    except requests.Timeout:
        #Connection timed out.
        print ("Timeout!")
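A side note on the ticker-reading and link-building steps: `readTickers` strips only a trailing newline, so an empty line or stray spaces in the ticker file would end up in the URL unchanged. Whether ticker_daten.txt actually contains such lines is not known from the post. The following is only a sketch of a more defensive variant; the names `read_tickers`, `build_link`, `URL_BEGIN` and `URL_END` are hypothetical and not part of the original script.

```python
from urllib.parse import quote

# Same URL pieces as in the script above, renamed as constants (hypothetical names).
URL_BEGIN = "http://real-chart.finance.yahoo.com/table.csv?s="
URL_END = "&a=00&b=1&c=1950&d=11&e=31&f=2050&g=d&ignore=.csv"


def read_tickers(path):
    """Return the non-empty, whitespace-stripped lines of the file at path."""
    # The with-block closes the file even if reading fails part-way.
    with open(path, "r") as handle:
        return [line.strip() for line in handle if line.strip()]


def build_link(ticker):
    """Build the download URL, percent-encoding characters such as '^'."""
    return URL_BEGIN + quote(ticker, safe="") + URL_END
```

Plain symbols such as 0055.HK pass through `quote` unchanged, so the links for the tickers shown in this post stay the same.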

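Problem: the script crashes at random after it has downloaded anywhere between 20 and 1750 .csv files. The crash produces the following output.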

Process started >>>
http://real-chart.finance.yahoo.com/table.csv?s=0055.HK&a=00&b=1&c=1950&d=11&e=31&f=2050&g=d&ignore=.csv
http://real-chart.finance.yahoo.com/table.csv?s=0056.HK&a=00&b=1&c=1950&d=11&e=31&f=2050&g=d&ignore=.csv
http://real-chart.finance.yahoo.com/table.csv?s=0057.HK&a=00&b=1&c=1950&d=11&e=31&f=2050&g=d&ignore=.csv
http://real-chart.finance.yahoo.com/table.csv?s=0058.HK&a=00&b=1&c=1950&d=11&e=31&f=2050&g=d&ignore=.csv
Traceback (most recent call last):
  File "Stock-Price Leecher.py", line 40, in <module>
    urllib.request.urlretrieve(link_created, "C:/Users/Win7ADM/Desktop/Jonas/stock-price-leecher/data/"+i+".csv")
  File "c:\Python34\lib\urllib\request.py", line 178, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "c:\Python34\lib\urllib\request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "c:\Python34\lib\urllib\request.py", line 461, in open
    response = meth(req, response)
  File "c:\Python34\lib\urllib\request.py", line 571, in http_response
    'http', request, response, code, msg, hdrs)
  File "c:\Python34\lib\urllib\request.py", line 499, in error
    return self._call_chain(*args)
  File "c:\Python34\lib\urllib\request.py", line 433, in _call_chain
    result = func(*args)
  File "c:\Python34\lib\urllib\request.py", line 579, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
<<< Process finished. (Exit code 1)
================ READY ================
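One thing the traceback does show: the exception is `urllib.error.HTTPError`, raised inside `urllib.request.urlretrieve`. The `except` clauses in the script only name `requests.ConnectionError`, `requests.HTTPError` and `requests.Timeout`, which are unrelated classes, so the error is never caught and the script exits. The `requests.head` check and the later `urlretrieve` call are also two separate requests, so a 404 on the second one is possible even when the first succeeded. This explains why the 404 ends the script, not why the server sends it. Below is a minimal sketch, assuming a single GET per attempt and a few retries are acceptable; the function name `download_csv` and the parameters `attempts`, `delay` and `opener` are hypothetical, and retrying is a suggestion, not something the original script does.

```python
import time
import urllib.error
import urllib.request


def download_csv(url, dest_path, attempts=3, delay=0.5,
                 opener=urllib.request.urlopen):
    """Fetch url with one GET per attempt and write the body to dest_path.

    Returns True on success and False if every attempt failed.
    `opener` is injectable only so the function can be tried offline.
    """
    for attempt in range(1, attempts + 1):
        try:
            with opener(url) as response:
                body = response.read()
        except urllib.error.HTTPError as err:
            # This is the exception class shown in the traceback.
            print("HTTP {} on attempt {} for {}".format(err.code, attempt, url))
        except urllib.error.URLError as err:
            # Covers DNS failures, refused connections and similar.
            print("URLError on attempt {}: {}".format(attempt, err.reason))
        else:
            # Only write the file once the whole body has been read.
            with open(dest_path, "wb") as out:
                out.write(body)
            return True
        if attempt < attempts:
            time.sleep(delay)
    return False
```

In the loop over `tickers`, a call such as `download_csv(link_created, target_path)` would replace both the `requests.head` check and `urlretrieve`; a `False` return means the ticker was skipped and the loop simply continues with the next one.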

Does anyone know why this happens?

0 answers:

No answers yet.