我有一个包含两级行索引的数据框,这两个级别都是日期时间。
数据框中有很多NA漏洞,我想填充逻辑:如果有NA(),则使用具有相同内部索引的上一行的最后已知值。
这个片段显示了我所期待的:
#forward-filling on multiindex doesn't work as I hoped.
df3.loc['2010-01-31', '2011-01-02'] = 99
df3.ffill()
#ok
assert df3.loc['2010-01-31', '2011-01-02'].values[0] == 99
#fail
assert df3.loc['2010-02-28', '2011-01-02'].values[0] == 99
assert df3.loc['2010-03-31', '2011-01-02'].values[0] == 99
不幸的是,它看起来像this is not implemented for multiindex ...
这是一次性的情况,不是我需要的一般化解决方案。不确定什么是合理的黑客。
以下是生成我用
实验的样本的完整代码#create the main dataframe
dt = pd.DatetimeIndex(start='2010-1-1', end = '2010-12-31', freq='m')
dt2 = pd.DatetimeIndex(start='2011-1-1', end = '2011-1-10', freq='d')
mi = pd.MultiIndex.from_product([dt,dt2], names=['assessment_date', 'contract_date'])
df = pd.DataFrame(index=mi)
df['foo']=None
#sub information dfa
dta1 = pd.DatetimeIndex(start='2010-1-1', end = '2010-2-1', freq='m')
dta2 = pd.DatetimeIndex(start='2011-1-1', end = '2012-1-5', freq='d')
mia = pd.MultiIndex.from_product([dta1,dta2], names=['assessment_date', 'contract_date'])
dfa = pd.DataFrame(index=mia)
dfa['foo']="dfa"
#sub information dfb
dtb1 = pd.DatetimeIndex(start='2010-4-1', end = '2010-5-1', freq='m')
dtb2 = pd.DatetimeIndex(start='2011-1-9', end = '2011-1-12', freq='d')
mib = pd.MultiIndex.from_product([dtb1,dtb2], names=['assessment_date', 'contract_date'])
dfb = pd.DataFrame(index=mib)
dfb['foo']="dfb"
#take all the data in dfa, dfb, and put it into the df
df2 = pd.concat([dfa, dfb])
df2 = df2.reindex(df.index.intersection(df2.index))
df3 = df.combine_first(df2)
df3.head(50)
#forward-filling on multiindex doesn't work as I hoped.
df3.loc['2010-01-31', '2011-01-02'] = 99
df3.ffill()
#ok
assert df3.loc['2010-01-31', '2011-01-02'].values[0] == 99
assert df3.loc['2010-02-28', '2011-01-02'].values[0] == 99
assert df3.loc['2010-03-31', '2011-01-02'].values[0] == 99