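I'm on Anaconda with pandas 0.18.1. I used to do the following on pandas 0.12 to lag some values in a multi-index DataFrame, where index level 0 = date and index level 1 = security ID (3 years of daily data):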
In [1]: testDB.head(2)
Out[1]:
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 2 entries, (2013-01-01, 000312) to (2013-01-01, 00036020)
Columns: 140 entries, in_universe to alpha_Tile
In[2]:
# lag a certain field:
#lag alphas
A_LAGS = [0, 1, 2, 3, 5, 10, 30, 60, 90]
grouped = testDB.groupby(level=1)['alpha']
for lag in A_LAGS:
    lagName = 'lagA_' + str(int(lag))
    testDB[lagName] = grouped.shift(periods=lag)  # move old ones forward
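For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same operation on synthetic data. The frame `demo`, the index name `sec_id`, the third security ID and the short lag list are made up for illustration; only the `groupby(level=1)[...].shift(periods=lag)` pattern comes from my real code.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for testDB: 5 business days x 3 hypothetical security IDs.
dates = pd.date_range("2016-01-01", periods=5, freq="B")
ids = ["000312", "00036020", "000999"]
index = pd.MultiIndex.from_product([dates, ids], names=["date", "sec_id"])
demo = pd.DataFrame({"alpha": np.arange(len(index), dtype=float)}, index=index)

# Shift 'alpha' within each security, one new column per lag.
lags = [0, 1, 2]
grouped = demo.groupby(level=1)["alpha"]
for lag in lags:
    demo["lagA_" + str(lag)] = grouped.shift(periods=lag)

print(demo.head(6))
```

Each security is shifted by rows within its own group, so the first `lag` dates of every security come out as NaN.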
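Now on pandas 0.18 this never finishes. I mean never: 4 hours and still going. I know there were changes in how pandas handles dates, so I tried using tshift instead. That does not work and complains 'freq not set'. Indeed it is not set (the frame is built from h5 files containing dates, identifiers and other data):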
In [3]: testDB.index.levels[0]
Out[3]: DatetimeIndex(['2016-01-01', '2016-01-04', '2016-01-05', '2016-01-06',
'2016-01-07', '2016-01-08', '2016-01-11', '2016-01-12',
'2016-01-13', '2016-01-14',
...
'2016-11-17', '2016-11-18', '2016-11-21', '2016-11-22',
'2016-11-23', '2016-11-24', '2016-11-25', '2016-11-28',
'2016-11-29', '2016-11-30'],
dtype='datetime64[ns]', name=u'date', length=239, freq=None)
So I tried to rebuild the date level as a DatetimeIndex with the proper frequency set.
In [4]: i = testDB.index.set_levels(
    pd.DatetimeIndex(fullFrame.index.levels[0], freq='B'),
    level=0)
i.levels[0]
Out[4]:
DatetimeIndex(['2016-01-01', '2016-01-04', '2016-01-05', '2016-01-06',
'2016-01-07', '2016-01-08', '2016-01-11', '2016-01-12',
'2016-01-13', '2016-01-14',
...
'2016-11-17', '2016-11-18', '2016-11-21', '2016-11-22',
'2016-11-23', '2016-11-24', '2016-11-25', '2016-11-28',
'2016-11-29', '2016-11-30'],
dtype='datetime64[ns]', name=u'date', length=239, freq='B')
Now the frequency is set, but I cannot replace the index:
In [5]:y = x.reindex(index=i,level=0)
y.index.levels[0]
Out[5]:
DatetimeIndex(['2016-01-01', '2016-01-04', '2016-01-05', '2016-01-06',
'2016-01-07', '2016-01-08', '2016-01-11', '2016-01-12',
'2016-01-13', '2016-01-14',
...
'2016-11-17', '2016-11-18', '2016-11-21', '2016-11-22',
'2016-11-23', '2016-11-24', '2016-11-25', '2016-11-28',
'2016-11-29', '2016-11-30'],
dtype='datetime64[ns]', name=u'date', length=239, freq=None)
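As far as I understand, `reindex` conforms the data to a set of labels rather than swapping in the index object I pass, which may be why the level comes back unchanged. A hedged sketch of the alternative, assigning the rebuilt MultiIndex to `.index` directly, is below. The frame `x` here is synthetic and the name `sec_id` is made up; I am not claiming the `freq` of the level survives the assignment, since that seems to depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real frame: 5 business days x 2 security IDs.
dates = pd.date_range("2016-01-01", periods=5, freq="B")
ids = ["000312", "00036020"]
x = pd.DataFrame(
    {"alpha": np.arange(10, dtype=float)},
    index=pd.MultiIndex.from_product([dates, ids], names=["date", "sec_id"]),
)

# Rebuild level 0 with an explicit business-day frequency, as in In [4].
i = x.index.set_levels(pd.DatetimeIndex(x.index.levels[0], freq="B"), level=0)

# Assign the new index instead of calling reindex; the data is left untouched.
y = x.copy()
y.index = i

# Inspect the result; the reported freq may differ between pandas versions.
print(y.index.levels[0].freq)
```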
Finally (apologies for the long introduction, I wanted to provide complete data), two questions: