                      Open   High    Low  Close   Volume  Adj Close
Date
1990-01-02 00:00:00  35.25  37.50  35.00  37.25  6555600       8.70
1990-01-03 00:00:00  38.00  38.00  37.50  37.50  7444400       8.76
1990-01-04 00:00:00  38.25  38.75  37.25  37.63  7928800       8.79
1990-01-05 00:00:00  37.75  38.25  37.00  37.75  4406400       8.82
1990-01-08 00:00:00  37.50  38.00  37.00  38.00  3643200       8.88
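How do I get rid of the index name "Date" in the dataframe above? It should be on the same row as the other column names, but it is not, and that is causing problems.
Thanks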
Answer 0: (score: 10)
Try the reset_index method, which moves the DataFrame's index into a column (which is what you want, I think).
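As a minimal sketch of what reset_index does, the example below builds a two-row frame with a named index and then flattens it. The frame `df` and the variable `flat` are illustrative names, and only two of the columns from the question are reproduced.

```python
import pandas as pd

# Small frame with a named index, mirroring the shape of the question's data
# (values copied from the first two rows of the question's output).
df = pd.DataFrame(
    {"Open": [35.25, 38.00], "Close": [37.25, 37.50]},
    index=pd.DatetimeIndex(["1990-01-02", "1990-01-03"], name="Date"),
)

# reset_index returns a new frame: the old index becomes a regular column
# named "Date", and a default integer index takes its place.
flat = df.reset_index()
print(flat)
```

The original frame is not modified, so assign the result (as above) if the flattened version is the one you want to keep.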
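Answer 1: (score: 8)
Short answer: you can't, and it's not clear why this "causes problems". "Date" is the name of the DataFrame's index, which is distinct from any of its columns. It is printed on its own offset row precisely so that you won't confuse it with one of the frame's columns. You cannot slice into the dates with DataFrame['Date'], as shown below: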
>>> import numpy as np; import pandas; import datetime
>>> dfrm = pandas.DataFrame(np.random.rand(10,3),
... columns=['A','B','C'],
... index = pandas.Index(
... [datetime.date(2012,6,elem) for elem in range(1,11)],
... name="Date"))
>>> dfrm
A B C
Date
2012-06-01 0.283724 0.863012 0.798891
2012-06-02 0.097231 0.277564 0.872306
2012-06-03 0.821461 0.499485 0.126441
2012-06-04 0.887782 0.389486 0.374118
2012-06-05 0.248065 0.032287 0.850939
2012-06-06 0.101917 0.121171 0.577643
2012-06-07 0.225278 0.161301 0.708996
2012-06-08 0.906042 0.828814 0.247564
2012-06-09 0.733363 0.924076 0.393353
2012-06-10 0.273837 0.318013 0.754807
>>> dfrm['Date']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1458, in __getitem__
return self._get_item_cache(key)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 294, in _get_item_cache
values = self._data.get(item)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 625, in get
_, block = self._find_block(item)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 715, in _find_block
self._check_have(item)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 722, in _check_have
raise KeyError('no item named %s' % str(item))
KeyError: 'no item named Date'
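The distinction between the index and the columns can also be checked without triggering a traceback. The following is a small sketch on a made-up two-row frame (the values and the helper variable names are illustrative); it only uses `DataFrame.columns`, `DataFrame.index` and `Index.name`.

```python
import pandas as pd

dfrm = pd.DataFrame(
    {"A": [0.1, 0.2], "B": [0.3, 0.4]},
    index=pd.Index(["2012-06-01", "2012-06-02"], name="Date"),
)

# "Date" is the name of the index, not a column label.
in_columns = "Date" in dfrm.columns
index_name = dfrm.index.name

# Column-style lookup fails, so the dates have to be reached through .index.
try:
    dfrm["Date"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

dates = list(dfrm.index)
print(in_columns, index_name, lookup_failed, dates)
```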
Longer answer:
If you want it to print that way, you can change your DataFrame by adding the index as a column of its own. For example:
>>> dfrm['Date'] = dfrm.index
>>> dfrm
A B C Date
Date
2012-06-01 0.283724 0.863012 0.798891 2012-06-01
2012-06-02 0.097231 0.277564 0.872306 2012-06-02
2012-06-03 0.821461 0.499485 0.126441 2012-06-03
2012-06-04 0.887782 0.389486 0.374118 2012-06-04
2012-06-05 0.248065 0.032287 0.850939 2012-06-05
2012-06-06 0.101917 0.121171 0.577643 2012-06-06
2012-06-07 0.225278 0.161301 0.708996 2012-06-07
2012-06-08 0.906042 0.828814 0.247564 2012-06-08
2012-06-09 0.733363 0.924076 0.393353 2012-06-09
2012-06-10 0.273837 0.318013 0.754807 2012-06-10
After this, you can simply change the name of the index so that nothing is printed:
>>> dfrm.reindex(pandas.Series(dfrm.index.values, name=''))
A B C Date
2012-06-01 0.283724 0.863012 0.798891 2012-06-01
2012-06-02 0.097231 0.277564 0.872306 2012-06-02
2012-06-03 0.821461 0.499485 0.126441 2012-06-03
2012-06-04 0.887782 0.389486 0.374118 2012-06-04
2012-06-05 0.248065 0.032287 0.850939 2012-06-05
2012-06-06 0.101917 0.121171 0.577643 2012-06-06
2012-06-07 0.225278 0.161301 0.708996 2012-06-07
2012-06-08 0.906042 0.828814 0.247564 2012-06-08
2012-06-09 0.733363 0.924076 0.393353 2012-06-09
2012-06-10 0.273837 0.318013 0.754807 2012-06-10
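If the only goal is to stop the name from being printed, a lighter-weight option may be to clear the index name directly instead of reindexing. This is a sketch on a made-up two-row frame, not the method used in the answer; it relies only on `Index.name` being assignable.

```python
import pandas as pd

dfrm = pd.DataFrame(
    {"A": [0.1, 0.2], "B": [0.3, 0.4]},
    index=pd.Index(["2012-06-01", "2012-06-02"], name="Date"),
)

# With a named index, the printed frame has an extra header row for the name.
lines_with_name = dfrm.to_string().splitlines()

# Clearing the name leaves the index values untouched and drops that row.
dfrm.index.name = None
lines_without_name = dfrm.to_string().splitlines()

print("\n".join(lines_without_name))
```

Note that this changes `dfrm` in place, whereas the reindex call above returns a new frame.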
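This seems like overkill. Another option is to replace the index with integers (or something else) once Date is available as a column. reset_index does both steps at once, provided a Date column has not already been added: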
>>> dfrm.reset_index()
Or, if you have already moved the index into a column manually, then just
>>> dfrm.index = range(len(dfrm))
>>> dfrm
A B C Date
0 0.283724 0.863012 0.798891 2012-06-01
1 0.097231 0.277564 0.872306 2012-06-02
2 0.821461 0.499485 0.126441 2012-06-03
3 0.887782 0.389486 0.374118 2012-06-04
4 0.248065 0.032287 0.850939 2012-06-05
5 0.101917 0.121171 0.577643 2012-06-06
6 0.225278 0.161301 0.708996 2012-06-07
7 0.906042 0.828814 0.247564 2012-06-08
8 0.733363 0.924076 0.393353 2012-06-09
9 0.273837 0.318013 0.754807 2012-06-10
If you care about the order in which the columns are displayed, do the following:
>>> dfrm.ix[:,[-1]+range(len(dfrm.columns)-1)]
Date A B C
0 2012-06-01 0.283724 0.863012 0.798891
1 2012-06-02 0.097231 0.277564 0.872306
2 2012-06-03 0.821461 0.499485 0.126441
3 2012-06-04 0.887782 0.389486 0.374118
4 2012-06-05 0.248065 0.032287 0.850939
5 2012-06-06 0.101917 0.121171 0.577643
6 2012-06-07 0.225278 0.161301 0.708996
7 2012-06-08 0.906042 0.828814 0.247564
8 2012-06-09 0.733363 0.924076 0.393353
9 2012-06-10 0.273837 0.318013 0.754807
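The `.ix` indexer used above has since been removed from pandas, and on Python 3 a list cannot be concatenated with `range(...)` directly. As a hedged sketch of the same reordering on a current installation, the example below moves the last column to the front in two ways; the frame contents are made up for illustration.

```python
import pandas as pd

dfrm = pd.DataFrame(
    {
        "A": [0.1, 0.2],
        "B": [0.3, 0.4],
        "C": [0.5, 0.6],
        "Date": ["2012-06-01", "2012-06-02"],
    }
)

# Put the last column first by building the new column order explicitly.
cols = list(dfrm.columns)
reordered = dfrm[cols[-1:] + cols[:-1]]

# Positional equivalent of the .ix expression; range() is wrapped in list()
# so that it can be concatenated with [-1] on Python 3.
by_position = dfrm.iloc[:, [-1] + list(range(len(dfrm.columns) - 1))]

print(reordered)
```

Selecting by label is usually easier to read; the positional form is shown only because it maps one-to-one onto the original expression.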
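Added:
Below are some useful functions to include in an iPython configuration script (so that they are loaded at startup), or to put in a module that you can easily load when working in Python.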
###########
# Imports #
###########
import pandas
import datetime
import numpy as np
from dateutil import relativedelta
# Note: pandas.io.data was removed from later pandas releases; its
# functionality moved to the separate pandas-datareader package.
from pandas.io import data as pdata
############################################
# Functions to retrieve Yahoo finance data #
############################################
# Utility to get generic stock symbol data from Yahoo finance.
# By default the window ends one day before today and goes back
# a specified number of days.
def getStockSymbolData(sym_list, end_date=datetime.date.today()+relativedelta.relativedelta(days=-1), num_dates = 30):
    dReader = pdata.DataReader
    start_date = end_date + relativedelta.relativedelta(days=-num_dates)
    return dict( (sym, dReader(sym, "yahoo", start=start_date, end=end_date)) for sym in sym_list )
###
# Utility function to get some AAPL data when needed
# for testing.
def getAAPL(end_date=datetime.date.today()+relativedelta.relativedelta(days=-1), num_dates = 30):
    return getStockSymbolData(['AAPL'], end_date=end_date, num_dates=num_dates)
###
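One thing to be aware of in the helpers above: a default such as `end_date=datetime.date.today()+...` is evaluated once, when the function is defined, so in a long-running session the default date can go stale. The sketch below shows the same date arithmetic with the default resolved at call time instead. The function name `date_window` is hypothetical and not part of the original code, and it uses only the standard library so that it runs without a network connection.

```python
import datetime


def date_window(end_date=None, num_dates=30):
    """Return (start_date, end_date) spanning num_dates days up to end_date."""
    if end_date is None:
        # Resolve "yesterday" on every call rather than once at definition.
        end_date = datetime.date.today() - datetime.timedelta(days=1)
    start_date = end_date - datetime.timedelta(days=num_dates)
    return start_date, end_date


start, end = date_window(datetime.date(2012, 6, 30), num_dates=30)
print(start, end)
```

The resulting pair could then be passed as the `start` and `end` arguments of the data reader call.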
I also wrote the class below to hold some data for common stocks:
#####
# Define a 'Stock' class that can hold simple info
# about a security, like SEDOL and CUSIP info. This
# is mainly for debugging things and quickly getting
# info for a single security.
class MyStock():
    def __init__(self, ticker='None', sedol='None', country='None'):
        self.ticker = ticker
        self.sedol=sedol
        self.country = country
    ###
    def getData(self, end_date=datetime.date.today()+relativedelta.relativedelta(days=-1), num_dates = 30):
        return pandas.DataFrame(getStockSymbolData([self.ticker], end_date=end_date, num_dates=num_dates)[self.ticker])
    ###
#####
# Make some default stock objects for common stocks.
AAPL = MyStock(ticker='AAPL', sedol='03783310', country='US')
SAP = MyStock(ticker='SAP', sedol='484628', country='DE')