Fetching data into a CSV file using pandas

Time: 2017-10-10 16:30:05

Tags: python-3.x pandas dataframe

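I am trying to fetch historical stock data for the Nifty 50 companies from a website and save it as CSV. I need to update this data every day. Is there a way to append the current day's data to the existing CSV files without downloading the full history again each time? My code is: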

import os
import csv
import urllib.request as urllib
import datetime as dt
import pandas as pd
import pandas_datareader.data as web
import nsepy as nse

def saveNiftySymbols():
    url = "https://www.nseindia.com/content/indices/ind_nifty50list.csv"
    # pretend to be a Chrome 47 browser on a Windows 10 machine
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}
    req = urllib.Request(url, headers = headers)
    # open the url
    x = urllib.urlopen(req)
    sourceCode = x.read().decode('utf-8') 

    cr = csv.DictReader(sourceCode.splitlines())
    l = [row['Symbol'] for row in cr]
    return l

def fetchDataFromNse(l):
    if not os.path.exists('stock_dfs'):
        os.makedirs('stock_dfs')

    start = dt.datetime(2000, 1, 1)
    end = dt.datetime.today()

    for symbol in l:
        if not os.path.exists('stock_dfs/{}.csv'.format(symbol)):
            df=nse.get_history(symbol,start, end)
            df.to_csv('stock_dfs/{}.csv'.format(symbol))
        else:
            print('Already have {}'.format(symbol))

fetchDataFromNse(saveNiftySymbols())

2 answers:

Answer 0 (score: 1)

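  1. Run this after the market closes, because NSE is notorious for adding dates that have no data yet.
  2. This only works if you have already stored data for the NSE symbols. It does not account for changes in the index constituents, so whenever NSE changes the constituents you have to do a full download once.
  3. Try this: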

    def saveNiftySymbols():
        url = "https://www.nseindia.com/content/indices/ind_nifty50list.csv"
        # pretend to be a Chrome 47 browser on a Windows 10 machine
        headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}
        # `urllib` is the alias from the question: import urllib.request as urllib
        req = urllib.Request(url, headers=headers)
        url_req = urllib.urlopen(req)
        # let pandas parse the CSV; more reliable than splitting lines by hand
        table = pd.read_csv(url_req)
        return table.Symbol

    def fetchDataFromNse(symbols):
        if not os.path.exists('stock_dfs'):
            os.makedirs('stock_dfs')

        # the history is already stored, so only today's rows are requested
        end = dt.date.today()
        # symbols is a pandas Series; convert it to a list if you prefer
        for symbol in symbols:
            df = nse.get_history(symbol, end, end)
            # append to the existing file without repeating the header row
            df.to_csv('stock_dfs/{}.csv'.format(symbol), mode='a', header=False)

    fetchDataFromNse(saveNiftySymbols())
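
One way to make the daily append safe to re-run is to merge the new rows into the stored file and drop repeated dates, instead of writing blindly in append mode. The following is only a minimal sketch, assuming each CSV has a `Date` column as its index (as the files written by `df.to_csv` above would); `merge_into_csv` is a hypothetical helper name, and `new_rows` stands in for whatever `nse.get_history` returns.

```python
import os

import pandas as pd


def merge_into_csv(path, new_rows):
    """Merge new_rows into the CSV at path, keeping one row per date.

    Hypothetical helper: assumes the CSV's index column is named 'Date'.
    """
    # Drop rows that carry no data at all (e.g. a date listed before prices exist).
    new_rows = new_rows.dropna(how="all").copy()
    # Normalise the index so stored and new dates compare equal.
    new_rows.index = pd.to_datetime(new_rows.index)
    if os.path.exists(path):
        old = pd.read_csv(path, index_col="Date", parse_dates=True)
        combined = pd.concat([old, new_rows])
    else:
        combined = new_rows
    # If a date appears twice, keep the most recently fetched row.
    combined = combined[~combined.index.duplicated(keep="last")].sort_index()
    combined.to_csv(path, index_label="Date")
    return combined
```

With this approach, running the update twice on the same day leaves the file unchanged, at the cost of rewriting the whole CSV on each update.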
    

Answer 1 (score: 0)

Yes. Add this code block before the two functions, and call it when fetching data from NSE:

import glob

def get_last_date():
    # read the last stored date from the first CSV found;
    # this assumes all files are updated together
    all_files = glob.glob('stock_dfs/*.csv')
    with open(all_files[0], 'r') as first_csv:
        reader = csv.DictReader(first_csv)
        last_date_str = list(reader)[-1]['Date']
    fmt = '%Y-%m-%d'
    return dt.datetime.strptime(last_date_str, fmt).date()

def fetchDataFromNse(symbols):
    if not os.path.exists('stock_dfs'):
        os.makedirs('stock_dfs')

    # start the day after the last stored date so that row is not appended twice
    new_start = get_last_date() + dt.timedelta(days=1)
    end = dt.date.today()
    if new_start > end:
        print('Already up to date')
        return
    for symbol in symbols:
        df = nse.get_history(symbol, new_start, end)
        # append to the existing file without repeating the header row
        df.to_csv('stock_dfs/{}.csv'.format(symbol), mode='a', header=False)

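Keep in mind that this works reliably only if the constituents stay the same and all the files are updated together. The second condition is already satisfied, but you will have to handle the first one yourself.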
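
The constituent caveat can be sidestepped by reading the last stored date from each symbol's own file and falling back to a full download when a symbol has no file yet. This is a minimal sketch, assuming each CSV has a `Date` column in `%Y-%m-%d` format as in `get_last_date` above; `start_date_for` and `FULL_HISTORY_START` are hypothetical names introduced for illustration.

```python
import csv
import datetime as dt
import os

# Assumed earliest date for a first-time download (the start date used above).
FULL_HISTORY_START = dt.date(2000, 3, 31)


def start_date_for(symbol, folder="stock_dfs"):
    """Return the first date that still needs fetching for one symbol."""
    path = os.path.join(folder, "{}.csv".format(symbol))
    if not os.path.exists(path):
        # New constituent: nothing stored yet, so fetch the full history.
        return FULL_HISTORY_START
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        # The file exists but holds only a header (or nothing at all).
        return FULL_HISTORY_START
    last = dt.datetime.strptime(rows[-1]["Date"], "%Y-%m-%d").date()
    # Resume the day after the last stored row to avoid a duplicate.
    return last + dt.timedelta(days=1)
```

Inside the loop, `nse.get_history(symbol, start_date_for(symbol), end)` would then take the place of the shared `new_start`; a symbol whose start date is later than `end` is already up to date and can be skipped.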