How do I correctly import a csv file into a DataFrame?

Asked: 2018-05-19 15:04:08

Tags: python python-3.x pandas csv

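Goal: I want to download data from Yahoo Finance into a pandas.DataFrame once a day and save it to a csv file. For the rest of the day I then want to use this csv as the source and (re)load it into a pandas.DataFrame.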

Problem: With the data freshly downloaded from Yahoo, I can access values with msft[msft.index[0]], but when I run ms[ms.index[0]] on the (re)loaded DataFrame, it throws a KeyError: Timestamp('2006-01-04 00:00:00').

Question: I have searched the internet for hours without getting past this. How do I access the data after (re)importing it?

Code snippet
(Note: Yahoo Finance has changed its API, so I had to find a workaround to get this running again. I include it anyway to give you a runnable example.)

import os
from datetime import datetime
import pandas as pd
from pandas_datareader import data #as pd_data_scraper
import fix_yahoo_finance as yf
yf.pdr_override()


# Define the instruments to download
tickers = ['MSFT']

# We would like all available data from 'YYYY-MM-DD' until 'YYYY-MM-DD'.
start_date = '2006-01-03'
end_date = '2018-05-14'
data_source = 'yahoo'
group_by = 'ticker'

# Use pandas_datareader's data.get_data_yahoo (patched by yf.pdr_override()) to load the desired data
panel_data = data.get_data_yahoo(tickers, group_by=group_by)

# Getting just the closing prices ('Close', not 'Adj Close'). This will return a Pandas DataFrame
# The index in this DataFrame is the major index of the panel_data.
close = panel_data['Close']

# Getting all weekdays between start_date and end_date
all_weekdays = pd.date_range(start=start_date, end=end_date, freq='B')

# How do we align the existing prices in close with our new set of dates?
# All we need to do is reindex close using all_weekdays as the new index
close = close.reindex(all_weekdays)

# Reindexing will insert missing values (NaN) for the dates that were not present
# in the original set. To cope with this, we can fill the missing values by
# replacing them with the latest available price for each instrument.
msft = close.fillna(method='ffill')

# save scraped data
# os.makedirs does not expand '~', so resolve the home directory explicitly
file_path = os.path.expanduser('~/data/scraped_data')
current = datetime.now().strftime('%Y%m%d')
data_path = os.path.join(file_path, current)
filename = os.path.join(data_path, 'data_dump_msft%s.csv' % current)
if not os.path.exists(data_path):
    os.makedirs(data_path)

# export to csv
msft.to_csv(filename)

ms_dir = '~/data/scraped_data/20180519/data_dump_msft20180519.csv'
ms = pd.read_csv(ms_dir, index_col=0, parse_dates=True) # same settings the former from_csv function used

# the following should print the same value twice, but the second line throws a KeyError instead
print('msft[msft.index[0]]: %s' % msft[msft.index[0]])
print('ms[ms.index[0]]: %s' % ms[ms.index[0]]) # KeyError: Timestamp('2006-01-04 00:00:00')
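
Since the Yahoo download may no longer work, the error can probably be reproduced offline. The sketch below assumes that the downloaded msft is a pandas Series (a single 'Close' column pulled out of the download) and that read_csv returns a one-column DataFrame. The dates and prices are made-up placeholder values, and the CSV is written to an in-memory buffer rather than to disk.

```python
import io
import pandas as pd

# Hypothetical stand-in for the downloaded closing prices (placeholder values).
dates = pd.date_range("2006-01-03", periods=3, freq="B")
msft = pd.Series([26.84, 26.97, 26.99], index=dates, name="Close")

# Round-trip through CSV using an in-memory buffer instead of a file.
buffer = io.StringIO()
msft.to_csv(buffer, header=True)
buffer.seek(0)
ms = pd.read_csv(buffer, index_col=0, parse_dates=True)

print(type(msft).__name__)  # Series
print(type(ms).__name__)    # DataFrame

# On a Series, [] with a Timestamp looks up a row label.
first_value = msft[msft.index[0]]

# On a DataFrame, [] looks up a column label, so the same expression fails.
try:
    ms[ms.index[0]]
    raised = False
except KeyError:
    raised = True

print(raised)  # True
```

If this assumption holds, the two objects differ in type after the round trip, which is why the same indexing expression behaves differently.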

2 Answers:

Answer 0: (score: 0)

data_frame[something] accesses a column. Use iloc, loc, or xs to select rows.
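
A minimal sketch of the difference, using a small made-up DataFrame. The column name "MSFT" and the prices are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical one-column frame indexed by business days (placeholder values).
dates = pd.date_range("2006-01-03", periods=3, freq="B")
ms = pd.DataFrame({"MSFT": [26.84, 26.97, 26.99]}, index=dates)

column = ms["MSFT"]                 # [] selects a column by label
row_by_label = ms.loc[ms.index[0]]  # .loc selects a row by index label
row_by_position = ms.iloc[0]        # .iloc selects a row by integer position
row_by_xs = ms.xs(ms.index[0])      # .xs takes a cross-section by index label

print(row_by_label["MSFT"])  # 26.84
```

Each of the three row selectors returns the first row as a Series whose index holds the column names, so a scalar still has to be picked out of it by column.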

Answer 1: (score: 0)

Thanks to @Ryan Tam I came up with the following solution, which produces the desired output:

ms.xs(ms.index[0]).get_value(0)
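
Note that get_value was deprecated in pandas 0.21 and removed in pandas 1.0, so the line above fails on current versions. The sketch below shows a few alternatives that should give the same scalar. It assumes a one-column frame, and the column name "MSFT" and the prices are hypothetical.

```python
import pandas as pd

# Hypothetical one-column frame indexed by business days (placeholder values).
dates = pd.date_range("2006-01-03", periods=3, freq="B")
ms = pd.DataFrame({"MSFT": [26.84, 26.97, 26.99]}, index=dates)

first_ts = ms.index[0]

by_label = ms.loc[first_ts, "MSFT"]  # row label and column label
by_position = ms.iat[0, 0]           # fast scalar access by integer position
via_row = ms.loc[first_ts].iloc[0]   # select the row, then its first entry

# Pulling the single column out as a Series restores the original
# msft[msft.index[0]] style of access.
series = ms["MSFT"]
same_as_before = series[first_ts]

print(by_label, by_position, via_row, same_as_before)
```

Which form fits best depends on whether the csv will always hold a single column. Selecting the column explicitly by name is likely the most robust choice if more tickers are added later.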