How to "overwrite-merge" a pandas DataFrame

Date: 2018-01-12 06:47:59

Tags: python pandas

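I have a large DataFrame with a two-level MultiIndex; both levels are datetimes.

I also have a set of small DataFrames that hold the data. Each one covers a (unique) slice of the main DataFrame's index.

I want to copy that data into the main DataFrame. There should be no conflicts, because each small DataFrame is a unique slice of the index.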

pd.DataFrame.merge and pd.DataFrame.join are not what I want: they would create new columns.
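To illustrate the concern, here is a minimal sketch with two hypothetical toy frames, `main` and `part` (they are not the question's data). Because both frames have a column named `foo`, `join` needs a suffix and returns two columns instead of filling the existing one.

```python
import pandas as pd

# Hypothetical toy frames: both carry a column called "foo"
main = pd.DataFrame({"foo": [None, None, None]}, index=[0, 1, 2])
part = pd.DataFrame({"foo": ["x"]}, index=[1])

# join requires a suffix for the overlapping column name and keeps
# both columns side by side; main's "foo" is not filled in.
joined = main.join(part, rsuffix="_part")
print(list(joined.columns))
```

The values from `part` end up in `foo_part`, which is why a different approach is needed to write them into `foo` itself.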

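I have made a small toy example below. What I want is for the data in dfa and dfb (column foo) to overwrite the None values in df wherever the indexes overlap.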

import pandas as pd

# pd.date_range is used here because newer pandas versions no longer accept
# pd.DatetimeIndex(start=..., end=..., freq=...).
month_end = pd.offsets.MonthEnd()  # month-end frequency, as in the original 'm' alias

# Main frame: every (assessment_date, contract_date) pair, no data yet
dt = pd.date_range(start='2010-1-1', end='2010-12-31', freq=month_end)
dt2 = pd.date_range(start='2011-1-1', end='2011-1-10', freq='D')
mi = pd.MultiIndex.from_product([dt, dt2], names=['assessment_date', 'contract_date'])

df = pd.DataFrame(index=mi)
df['foo'] = None

# First small frame
dta1 = pd.date_range(start='2010-1-1', end='2010-2-1', freq=month_end)
dta2 = pd.date_range(start='2011-1-1', end='2011-1-5', freq='D')
mia = pd.MultiIndex.from_product([dta1, dta2], names=['assessment_date', 'contract_date'])
dfa = pd.DataFrame(index=mia)
dfa['foo'] = "dfa"

# Second small frame (its contract dates run past the end of the main index)
dtb1 = pd.date_range(start='2010-4-1', end='2010-5-1', freq=month_end)
dtb2 = pd.date_range(start='2011-1-9', end='2011-1-12', freq='D')
mib = pd.MultiIndex.from_product([dtb1, dtb2], names=['assessment_date', 'contract_date'])
dfb = pd.DataFrame(index=mib)
dfb['foo'] = "dfb"

1 Answer:

Answer 0 (score: 0)

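I think you need concat to stack the small frames, reindex by the intersection of the two MultiIndexes to keep only the rows that exist in df, and finally combine_first to fill the missing values in df: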

df2 = pd.concat([dfa, dfb])
df2 = df2.reindex(df.index.intersection(df2.index))
df3 = df.combine_first(df2)

With your data:

df2 = pd.concat([dfa, dfb])
print (df2)
                               foo
assessment_date contract_date     
2010-01-31      2011-01-01     dfa
                2011-01-02     dfa
                2011-01-03     dfa
                2011-01-04     dfa
                2011-01-05     dfa
2010-04-30      2011-01-09     dfb
                2011-01-10     dfb
                2011-01-11     dfb
                2011-01-12     dfb

df2 = df2.reindex(df.index.intersection(df2.index))
print (df2)
                               foo
assessment_date contract_date     
2010-01-31      2011-01-01     dfa
                2011-01-02     dfa
                2011-01-03     dfa
                2011-01-04     dfa
                2011-01-05     dfa
2010-04-30      2011-01-09     dfb
                2011-01-10     dfb

df3 = df.combine_first(df2)
print(df3.head(41))

                               foo
assessment_date contract_date     
2010-01-31      2011-01-01     dfa
                2011-01-02     dfa
                2011-01-03     dfa
                2011-01-04     dfa
                2011-01-05     dfa
                2011-01-06     NaN
                2011-01-07     NaN
                2011-01-08     NaN
                2011-01-09     NaN
                2011-01-10     NaN
2010-02-28      2011-01-01     NaN
                2011-01-02     NaN
                2011-01-03     NaN
                2011-01-04     NaN
                2011-01-05     NaN
                2011-01-06     NaN
                2011-01-07     NaN
                2011-01-08     NaN
                2011-01-09     NaN
                2011-01-10     NaN
2010-03-31      2011-01-01     NaN
                2011-01-02     NaN
                2011-01-03     NaN
                2011-01-04     NaN
                2011-01-05     NaN
                2011-01-06     NaN
                2011-01-07     NaN
                2011-01-08     NaN
                2011-01-09     NaN
                2011-01-10     NaN
2010-04-30      2011-01-01     NaN
                2011-01-02     NaN
                2011-01-03     NaN
                2011-01-04     NaN
                2011-01-05     NaN
                2011-01-06     NaN
                2011-01-07     NaN
                2011-01-08     NaN
                2011-01-09     dfb
                2011-01-10     dfb
2010-05-31      2011-01-01     NaN
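
The three steps above can also be wrapped in a small helper. This is only a sketch: the function name `overwrite_merge` and the toy frames `main` and `piece` are hypothetical, and it assumes the small frames do not overlap each other, as stated in the question.

```python
import pandas as pd

def overwrite_merge(main, pieces):
    """Fill missing values in `main` from `pieces`, keeping only main's index."""
    stacked = pd.concat(pieces)
    # Drop rows of the pieces that fall outside main's index.
    stacked = stacked.reindex(main.index.intersection(stacked.index))
    # Values already present in main are kept; only missing ones are filled.
    return main.combine_first(stacked)

# Hypothetical toy data: 2 assessment dates x 3 contract dates
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2010-01-31", "2010-02-28"]),
     pd.date_range("2011-01-01", "2011-01-03", freq="D")],
    names=["assessment_date", "contract_date"],
)
main = pd.DataFrame(index=idx)
main["foo"] = None

# The piece covers 2011-01-02 .. 2011-01-05; the last two dates are not in main.
piece_idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2010-01-31"]),
     pd.date_range("2011-01-02", "2011-01-05", freq="D")],
    names=["assessment_date", "contract_date"],
)
piece = pd.DataFrame(index=piece_idx)
piece["foo"] = "a"

result = overwrite_merge(main, [piece])
print(result)
```

Because of the reindex step, the result has the same rows as `main`: the two piece rows outside the main index are discarded rather than appended.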