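I have three Pandas dataframes indexed by datetime: df1, df2 and df3. Each index contains pairs of dates. I want to combine the three dataframes so that every unique datetime index pair is kept, but duplicate pairs are combined, so that no date pair is listed more than once (i.e. this is not a simple concatenation). Here are examples of the dataframes: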
In [1]: print df1
            CurTempMid      id
fldDate
1997-12-23         0.0  recent
1997-12-23        -2.0    hist
1997-12-27         9.0  recent
1997-12-27         7.0    hist
1998-02-10         9.0  recent
1998-02-10         7.0    hist
...                ...     ...
2001-01-04        27.0  recent
2001-01-04        26.0    hist
2001-03-16        12.0  recent
2001-03-16        11.0    hist
2001-04-06        23.0  recent
2001-04-06        22.0    hist
In [2]: print df2
            MaxTempMid      id
fldDate
1998-01-02        29.0  recent
1998-01-02        28.0    hist
1998-02-15        18.0  recent
1998-02-15        23.0    hist
1998-02-23        24.0  recent
1998-02-23        15.0    hist
...                ...     ...
2001-01-01        16.0  recent
2001-01-01        22.0    hist
2001-01-04        30.0  recent
2001-01-04        37.0    hist
2001-02-16        14.0  recent
2001-02-16        11.0    hist
In [3]: print df3
            MinTempMid      id
fldDate
1997-12-23         0.0  recent
1997-12-23        -2.0    hist
1997-12-26        -3.0  recent
1997-12-26        -2.0    hist
1997-12-27        -1.0  recent
1997-12-27         0.0    hist
...                ...     ...
2001-02-18         9.0  recent
2001-02-18        36.0    hist
2001-03-11        18.0  recent
2001-03-11        38.0    hist
2001-03-12        13.0  recent
2001-03-12        16.0    hist
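For contrast with what is wanted, here is a minimal sketch of what a simple row-wise concatenation does. The two-row frames are hypothetical, built from the first rows of df1 and df3 above, and assume fldDate holds datetimes: `pd.concat` stacks every input row, so the date pair appears once per input frame and each temperature sits on its own row, padded with NaN.

```python
import pandas as pd

# Hypothetical two-row frames taken from the first rows of df1 and df3 above
# (assumption: fldDate is a DatetimeIndex, as the question states).
idx = pd.DatetimeIndex(['1997-12-23', '1997-12-23'], name='fldDate')
df1 = pd.DataFrame({'CurTempMid': [0.0, -2.0], 'id': ['recent', 'hist']}, index=idx)
df3 = pd.DataFrame({'MinTempMid': [0.0, -2.0], 'id': ['recent', 'hist']}, index=idx)

# Stacking keeps every input row: 1997-12-23 now appears four times, and each
# temperature ends up on a separate row with NaN in the other frame's column.
stacked = pd.concat([df1, df3])
print(stacked)
```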
The desired result looks like this:
            CurTempMid  MaxTempMid  MinTempMid      id
fldDate
1997-12-23         0.0         NaN         0.0  recent
1997-12-23        -2.0         NaN        -2.0    hist
1997-12-26         NaN         NaN        -3.0  recent
1997-12-26         NaN         NaN        -2.0    hist
1997-12-27         9.0         NaN        -1.0  recent
1997-12-27         7.0         NaN         0.0    hist
...                ...         ...         ...     ...
Once merged, the "id" column should be the same for all three frames, so I only need to keep one "id" column.
Answer 0 (score: 3)
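Keeping a single id column only works if the pair (fldDate, id) identifies at most one row in each frame. A small sketch for checking that on real data, assuming fldDate is the index and id is a column as printed above; the helpers `frame`, `has_unique_pairs` and `dates_have_id_pairs` are hypothetical names for illustration:

```python
import pandas as pd


def frame(col, dates, values, ids):
    """Hypothetical helper: a frame indexed by fldDate with a value column and 'id'."""
    return pd.DataFrame({col: values, 'id': ids},
                        index=pd.DatetimeIndex(dates, name='fldDate'))


def has_unique_pairs(df):
    """True if every (fldDate, id) combination occurs at most once."""
    return df.set_index('id', append=True).index.is_unique


def dates_have_id_pairs(df):
    """True if every date has exactly two rows carrying two distinct ids."""
    ids = df.groupby(level='fldDate')['id']
    return bool((ids.size() == 2).all() and (ids.nunique() == 2).all())


df3 = frame('MinTempMid', ['1997-12-23', '1997-12-23', '1997-12-26', '1997-12-26'],
            [0.0, -2.0, -3.0, -2.0], ['recent', 'hist', 'recent', 'hist'])
# Repeating one row breaks the pairing; a merge on (fldDate, id) would then
# produce one output row per matching combination instead of combining them.
broken = pd.concat([df3, df3.iloc[:1]])

print(has_unique_pairs(df3), dates_have_id_pairs(df3))        # True True
print(has_unique_pairs(broken), dates_have_id_pairs(broken))  # False False
```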
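If you are sure the id column is consistent across the whole time series (each fldDate has at most one "recent" row and one "hist" row in every frame), this solution should work for you. Outer-merge the three dataframes on their fldDate and id columns, then set the index back to fldDate.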
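# Move fldDate out of the index so it can serve as a merge key, then
# outer-join on (fldDate, id): a date missing from a frame gets NaN there.
m = (df1.reset_index()
        .merge(df2.reset_index(), on=['fldDate', 'id'], how='outer')
        .merge(df3.reset_index(), on=['fldDate', 'id'], how='outer')
        .sort_values('fldDate'))  # order of recent/hist within a date is not guaranteed
m = m.set_index('fldDate')        # restore fldDate as the index
print(m.head())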
#             CurTempMid      id  MaxTempMid  MinTempMid
# fldDate
# 1997-12-23         0.0  recent         NaN         0.0
# 1997-12-23        -2.0    hist         NaN        -2.0
# 1997-12-26         NaN    hist         NaN        -2.0
# 1997-12-26         NaN  recent         NaN        -3.0
# 1997-12-27         9.0  recent         NaN        -1.0
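A self-contained sketch of the same approach, run on small frames rebuilt from the rows printed in the question (the helper `frame` is hypothetical, and fldDate is assumed to hold datetimes). The chained `.merge` calls are folded with `functools.reduce` so any number of frames can be passed, and the result is sorted on both keys so the order of "recent" and "hist" within a date is deterministic. For comparison it also builds the same table another way, which is not part of the answer above: moving `id` into the index and aligning the frames with `pd.concat(..., axis=1)`. Like the outer merge, that alternative relies on each (fldDate, id) pair being unique within every frame.

```python
from functools import reduce

import pandas as pd


def frame(col, dates, values, ids):
    """Hypothetical helper: a frame indexed by fldDate with a value column and 'id'."""
    return pd.DataFrame({col: values, 'id': ids},
                        index=pd.DatetimeIndex(dates, name='fldDate'))


# Small inputs rebuilt from the first rows printed in the question.
df1 = frame('CurTempMid', ['1997-12-23', '1997-12-23', '1997-12-27', '1997-12-27'],
            [0.0, -2.0, 9.0, 7.0], ['recent', 'hist', 'recent', 'hist'])
df2 = frame('MaxTempMid', ['1998-01-02', '1998-01-02'],
            [29.0, 28.0], ['recent', 'hist'])
df3 = frame('MinTempMid', ['1997-12-23', '1997-12-23', '1997-12-26', '1997-12-26',
                           '1997-12-27', '1997-12-27'],
            [0.0, -2.0, -3.0, -2.0, -1.0, 0.0],
            ['recent', 'hist', 'recent', 'hist', 'recent', 'hist'])
frames = [df1, df2, df3]

# 1) The answer's outer merge on (fldDate, id), folded over any number of frames.
merged = reduce(
    lambda left, right: left.merge(right, on=['fldDate', 'id'], how='outer'),
    (f.reset_index() for f in frames),
)
# Sorting on both keys fixes the order within a date ('hist' sorts before 'recent').
merged = merged.sort_values(['fldDate', 'id']).set_index('fldDate')

# 2) Alternative: key each row by (fldDate, id) and align the frames column-wise;
#    concat along axis=1 takes the outer union of the row keys.
aligned = (pd.concat([f.set_index('id', append=True) for f in frames], axis=1)
             .sort_index()
             .reset_index('id'))

print(merged)
```

Sorting on `id` puts "hist" before "recent"; if the order within a date does not matter, sorting on `fldDate` alone, as in the answer, is enough.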