Combining multiple Pandas dataframes with duplicate datetime index pairs

Time: 2018-06-26 00:04:12

Tags: python pandas

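I have three Pandas dataframes indexed by datetime: df1, df2 and df3. Every date appears twice in each index, once for each id value (recent and hist). I want to combine the three dataframes so that every unique date/id pair is kept, while pairs that occur in more than one dataframe are merged into a single row rather than listed several times (so a simple concatenation will not do). Here are samples of the dataframes: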

In [1]: print(df1)
            CurTempMid      id
fldDate                       
1997-12-23         0.0  recent
1997-12-23        -2.0    hist
1997-12-27         9.0  recent
1997-12-27         7.0    hist     
1998-02-10         9.0  recent
1998-02-10         7.0    hist
...                ...     ... 
2001-01-04        27.0  recent
2001-01-04        26.0    hist
2001-03-16        12.0  recent
2001-03-16        11.0    hist
2001-04-06        23.0  recent
2001-04-06        22.0    hist

In [2]: print(df2)
            MaxTempMid      id
fldDate                       
1998-01-02        29.0  recent
1998-01-02        28.0    hist
1998-02-15        18.0  recent
1998-02-15        23.0    hist
1998-02-23        24.0  recent
1998-02-23        15.0    hist
...                ...     ... 
2001-01-01        16.0  recent
2001-01-01        22.0    hist
2001-01-04        30.0  recent
2001-01-04        37.0    hist
2001-02-16        14.0  recent
2001-02-16        11.0    hist

In [3]: print(df3)
            MinTempMid      id
fldDate                       
1997-12-23         0.0  recent
1997-12-23        -2.0    hist
1997-12-26        -3.0  recent
1997-12-26        -2.0    hist
1997-12-27        -1.0  recent
1997-12-27         0.0    hist
...                ...     ...
2001-02-18         9.0  recent
2001-02-18        36.0    hist
2001-03-11        18.0  recent
2001-03-11        38.0    hist
2001-03-12        13.0  recent
2001-03-12        16.0    hist

The desired result is:

            CurTempMid  MaxTempMid  MinTempMid      id
fldDate
1997-12-23         0.0         NaN         0.0  recent
1997-12-23        -2.0         NaN        -2.0    hist
1997-12-26         NaN         NaN        -3.0  recent
1997-12-26         NaN         NaN        -2.0    hist
1997-12-27         9.0         NaN        -1.0  recent
1997-12-27         7.0         NaN         0.0    hist
...                ...         ...         ...     ...

Once the frames are merged, the "id" columns should be identical, so I only need to keep one "id" column.

1 answer:

Answer 0: (score: 3)

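This solution should work for you if you are sure the id column is consistent across the whole time series. You can merge the three dataframes on both fldDate and id, then set the index back to fldDate. An outer merge keeps the date/id pairs that appear in only some of the dataframes and fills their missing columns with NaN.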

m = (df1.reset_index()
        .merge(df2.reset_index(), on=['fldDate', 'id'], how='outer')
        .merge(df3.reset_index(), on=['fldDate', 'id'], how='outer')
        .sort_values('fldDate'))
m.set_index('fldDate', inplace=True)
print(m.head())
#             CurTempMid      id  MaxTempMid  MinTempMid
# fldDate
# 1997-12-23         0.0  recent         NaN         0.0
# 1997-12-23        -2.0    hist         NaN        -2.0
# 1997-12-26         NaN    hist         NaN        -2.0
# 1997-12-26         NaN  recent         NaN        -3.0
# 1997-12-27         9.0  recent         NaN        -1.0
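
For readers who want to try this without the original data, here is a minimal self-contained sketch of the same outer merge. The sample rows are a small subset of the values shown in the question, and the `make_frame` helper is a hypothetical convenience for building them, not part of the original answer. It also assumes you want "recent" listed before "hist" and the `id` column moved to the end, as in the desired output.

```python
from functools import reduce

import pandas as pd


def make_frame(column, rows):
    """Hypothetical helper: build a frame indexed by fldDate from (date, value, id) tuples."""
    dates, values, ids = zip(*rows)
    index = pd.to_datetime(list(dates)).rename("fldDate")
    return pd.DataFrame({column: list(values), "id": list(ids)}, index=index)


# Small subset of the sample data from the question.
df1 = make_frame("CurTempMid", [
    ("1997-12-23", 0.0, "recent"), ("1997-12-23", -2.0, "hist"),
    ("1997-12-27", 9.0, "recent"), ("1997-12-27", 7.0, "hist"),
])
df2 = make_frame("MaxTempMid", [
    ("1998-01-02", 29.0, "recent"), ("1998-01-02", 28.0, "hist"),
])
df3 = make_frame("MinTempMid", [
    ("1997-12-23", 0.0, "recent"), ("1997-12-23", -2.0, "hist"),
    ("1997-12-26", -3.0, "recent"), ("1997-12-26", -2.0, "hist"),
    ("1997-12-27", -1.0, "recent"), ("1997-12-27", 0.0, "hist"),
])

keys = ["fldDate", "id"]

# Outer-merge all frames pairwise on the (fldDate, id) key.
merged = reduce(
    lambda left, right: left.merge(right, on=keys, how="outer"),
    (frame.reset_index() for frame in [df1, df2, df3]),
)

# Sort by date; descending id puts "recent" before "hist" within each date.
merged = (merged
          .sort_values(keys, ascending=[True, False])
          .set_index("fldDate"))

# Move the single shared id column to the end.
merged = merged[[c for c in merged.columns if c != "id"] + ["id"]]

print(merged)
```

Using `reduce` only generalizes the chained `.merge` calls to any number of dataframes; with exactly three frames the chained form in the answer is equivalent.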