Combine DataFrames with a DatetimeIndex based on column labels

Time: 2015-08-09 23:20:41

Tags: python pandas merge split-apply-combine

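I have weather data stored in many separate files. Each column holds a particular measuring instrument, and each row holds the average readings for a particular date. Suppose one file looks like this: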

import numpy as np
import pandas as pd

first = pd.DataFrame(np.random.random((10,3)),
                     pd.date_range('1950-01-01', periods=10), 
                     columns=['A','B','C'])

first
Out[21]: 
                   A         B         C
1950-01-01  0.939932  0.504543  0.091025
1950-01-02  0.121418  0.725333  0.444813
1950-01-03  0.338385  0.783398  0.116468
1950-01-04  0.847905  0.846147  0.226074
1950-01-05  0.156315  0.704804  0.524886
1950-01-06  0.412284  0.425379  0.427246
1950-01-07  0.165859  0.406347  0.114586
1950-01-08  0.392670  0.789526  0.174001
1950-01-09  0.246180  0.776304  0.019368
1950-01-10  0.142213  0.731748  0.954076

The second looks like this:

second = pd.DataFrame(np.random.random((10,3)), 
                      pd.date_range('1950-01-11', periods=10), 
                      columns=['A','B','D'])

second
Out[30]: 
                   A         B         D
1950-01-11  0.190767  0.905640  0.325411
1950-01-12  0.109964  0.754694  0.414402
1950-01-13  0.058164  0.305405  0.768333
1950-01-14  0.267644  0.919876  0.631083
1950-01-15  0.981333  0.454678  0.533075
1950-01-16  0.831600  0.823845  0.980366
1950-01-17  0.303585  0.091634  0.338517
1950-01-18  0.723445  0.088020  0.570779
1950-01-19  0.639665  0.954577  0.763810
1950-01-20  0.370629  0.716066  0.628383

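I want to merge the two so that all instruments (i.e. A, B, C, D, ...) appear in the same file over all measurement periods. The expected result looks like this: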

                   A         B         C         D
1950-01-01  0.939932  0.504543  0.091025
1950-01-02  0.121418  0.725333  0.444813
1950-01-03  0.338385  0.783398  0.116468
1950-01-04  0.847905  0.846147  0.226074
1950-01-05  0.156315  0.704804  0.524886
1950-01-06  0.412284  0.425379  0.427246
1950-01-07  0.165859  0.406347  0.114586
1950-01-08  0.392670  0.789526  0.174001
1950-01-09  0.246180  0.776304  0.019368
1950-01-10  0.142213  0.731748  0.954076
1950-01-11  0.190767  0.905640           0.325411
1950-01-12  0.109964  0.754694           0.414402
1950-01-13  0.058164  0.305405           0.768333
1950-01-14  0.267644  0.919876           0.631083
1950-01-15  0.981333  0.454678           0.533075
1950-01-16  0.831600  0.823845           0.980366
1950-01-17  0.303585  0.091634           0.338517
1950-01-18  0.723445  0.088020           0.570779
1950-01-19  0.639665  0.954577           0.763810
1950-01-20  0.370629  0.716066           0.628383

To get this, I tried:

first.merge(second, how='outer', left_index=True, right_index=True)
Out[34]: 
                 A_x       B_x         C       A_y       B_y         D
1950-01-01  0.939932  0.504543  0.091025       NaN       NaN       NaN
1950-01-02  0.121418  0.725333  0.444813       NaN       NaN       NaN
1950-01-03  0.338385  0.783398  0.116468       NaN       NaN       NaN
1950-01-04  0.847905  0.846147  0.226074       NaN       NaN       NaN
1950-01-05  0.156315  0.704804  0.524886       NaN       NaN       NaN
1950-01-06  0.412284  0.425379  0.427246       NaN       NaN       NaN
1950-01-07  0.165859  0.406347  0.114586       NaN       NaN       NaN
1950-01-08  0.392670  0.789526  0.174001       NaN       NaN       NaN
1950-01-09  0.246180  0.776304  0.019368       NaN       NaN       NaN
1950-01-10  0.142213  0.731748  0.954076       NaN       NaN       NaN
1950-01-11       NaN       NaN       NaN  0.190767  0.905640  0.325411
1950-01-12       NaN       NaN       NaN  0.109964  0.754694  0.414402
1950-01-13       NaN       NaN       NaN  0.058164  0.305405  0.768333
1950-01-14       NaN       NaN       NaN  0.267644  0.919876  0.631083
1950-01-15       NaN       NaN       NaN  0.981333  0.454678  0.533075
1950-01-16       NaN       NaN       NaN  0.831600  0.823845  0.980366
1950-01-17       NaN       NaN       NaN  0.303585  0.091634  0.338517
1950-01-18       NaN       NaN       NaN  0.723445  0.088020  0.570779
1950-01-19       NaN       NaN       NaN  0.639665  0.954577  0.763810
1950-01-20       NaN       NaN       NaN  0.370629  0.716066  0.628383

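But as you can see, the columns that should have been combined (A and B) were split into separate _x and _y columns. Because there are no common row indices, each row fills only one side of each pair. This seems like it would be a very useful addition to pandas. Can it be done?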
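The split happens because merge keeps same-named non-key columns from both sides apart and appends suffixes to them (_x and _y by default). If you want to stay with merge, one workaround is to coalesce each suffixed pair back into a single column afterwards. The sketch below does this with fillna and prefers the left-hand (first) value where both exist. It rebuilds the question's random example frames, so the printed numbers will differ.

```python
import numpy as np
import pandas as pd

# Rebuild the two example frames from the question (values are random).
first = pd.DataFrame(np.random.random((10, 3)),
                     pd.date_range('1950-01-01', periods=10),
                     columns=['A', 'B', 'C'])
second = pd.DataFrame(np.random.random((10, 3)),
                      pd.date_range('1950-01-11', periods=10),
                      columns=['A', 'B', 'D'])

# Outer merge on the index: the shared columns A and B come back as A_x/A_y, B_x/B_y.
merged = first.merge(second, how='outer', left_index=True, right_index=True)

# Coalesce each suffixed pair into one column, preferring the left-hand value.
for col in first.columns.intersection(second.columns):
    merged[col] = merged[col + '_x'].fillna(merged[col + '_y'])
    merged = merged.drop(columns=[col + '_x', col + '_y'])

# Put the columns back in a predictable order: A, B, C, D.
merged = merged[sorted(merged.columns)]
print(merged)
```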

2 answers:

Answer 0 (score: 1)

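Another approach is to use the .combine method. The resulting frame's row index and columns are the union of those of both inputs. The combiner below receives each pair of aligned columns. It keeps the value from first unless that value is null, in which case it takes the value from second.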

combiner = lambda x, y: np.where(pd.isnull(x), y, x)
first.combine(second, combiner)

                 A       B       C       D
1950-01-01  0.7917  0.5289  0.5680     NaN
1950-01-02  0.9256  0.0710  0.0871     NaN
1950-01-03  0.0202  0.8326  0.7782     NaN
1950-01-04  0.8700  0.9786  0.7992     NaN
1950-01-05  0.4615  0.7805  0.1183     NaN
1950-01-06  0.6399  0.1434  0.9447     NaN
1950-01-07  0.5218  0.4147  0.2646     NaN
1950-01-08  0.7742  0.4562  0.5684     NaN
1950-01-09  0.0188  0.6176  0.6121     NaN
1950-01-10  0.6169  0.9437  0.6818     NaN
1950-01-11  0.3595  0.4370     NaN  0.6976
1950-01-12  0.0602  0.6668     NaN  0.6706
1950-01-13  0.2104  0.1289     NaN  0.3154
1950-01-14  0.3637  0.5702     NaN  0.4386
1950-01-15  0.9884  0.1020     NaN  0.2089
1950-01-16  0.1613  0.6531     NaN  0.2533
1950-01-17  0.4663  0.2444     NaN  0.1590
1950-01-18  0.1104  0.6563     NaN  0.1382
1950-01-19  0.1966  0.3687     NaN  0.8210
1950-01-20  0.0971  0.8379     NaN  0.0961
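The combiner above implements the rule "keep the caller's value unless it is null, otherwise take the other frame's value". pandas also provides this rule directly as DataFrame.combine_first, and it works over the same union of rows and columns. Since the question mentions many files, the sketch below also folds a list of frames pairwise with functools.reduce. The frame third and the list frames are hypothetical stand-ins for data loaded from the individual files. Where two frames both have a value for the same date and column, the earlier frame in the list wins.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Rebuild the question's example frames (values are random).
first = pd.DataFrame(np.random.random((10, 3)),
                     pd.date_range('1950-01-01', periods=10),
                     columns=['A', 'B', 'C'])
second = pd.DataFrame(np.random.random((10, 3)),
                      pd.date_range('1950-01-11', periods=10),
                      columns=['A', 'B', 'D'])

# The answer's approach: per column, keep x unless it is null, else take y.
combiner = lambda x, y: np.where(pd.isnull(x), y, x)
via_combine = first.combine(second, combiner)

# combine_first applies the same "fill the caller's nulls from other" rule,
# again over the union of both frames' rows and columns.
via_combine_first = first.combine_first(second)
print(via_combine_first)

# For many files, fold the frames pairwise. 'third' and 'frames' are
# hypothetical stand-ins for frames loaded from the individual files.
third = pd.DataFrame(np.random.random((5, 2)),
                     pd.date_range('1950-01-21', periods=5),
                     columns=['A', 'E'])
frames = [first, second, third]
combined = reduce(lambda left, right: left.combine_first(right), frames)
print(combined)
```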

Answer 1 (score: 0)

Assuming first is df1 and second is df2, using concat seems to solve your problem:

>>> pd.concat([df1, df2])
                   A         B         C         D
1950-01-01  0.939932  0.504543  0.091025       NaN
1950-01-02  0.121418  0.725333  0.444813       NaN
1950-01-03  0.338385  0.783398  0.116468       NaN
1950-01-04  0.847905  0.846147  0.226074       NaN
1950-01-05  0.156315  0.704804  0.524886       NaN
1950-01-06  0.412284  0.425379  0.427246       NaN
1950-01-07  0.165859  0.406347  0.114586       NaN
1950-01-08  0.392670  0.789526  0.174001       NaN
1950-01-09  0.246180  0.776304  0.019368       NaN
1950-01-10  0.142213  0.731748  0.954076       NaN
1950-01-11  0.190767  0.905640       NaN  0.325411
1950-01-12  0.109964  0.754694       NaN  0.414402
1950-01-13  0.058164  0.305405       NaN  0.768333
1950-01-14  0.267644  0.919876       NaN  0.631083
1950-01-15  0.981333  0.454678       NaN  0.533075
1950-01-16  0.831600  0.823845       NaN  0.980366
1950-01-17  0.303585  0.091634       NaN  0.338517
1950-01-18  0.723445  0.088020       NaN  0.570779
1950-01-19  0.639665  0.954577       NaN  0.763810
1950-01-20  0.370629  0.716066       NaN  0.628383
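concat only stacks the rows and aligns columns by label. It works cleanly here because the two date ranges do not overlap. If two files cover some of the same dates, those dates will appear twice in the index. The sketch below assumes a hypothetical third file whose dates overlap the end of second. It collapses the duplicates with groupby(level=0).first(), which keeps the first non-null value of each column for each date. Group keys are sorted by default, so the result is in chronological order.

```python
import numpy as np
import pandas as pd

# Rebuild the question's example frames (values are random).
first = pd.DataFrame(np.random.random((10, 3)),
                     pd.date_range('1950-01-01', periods=10),
                     columns=['A', 'B', 'C'])
second = pd.DataFrame(np.random.random((10, 3)),
                      pd.date_range('1950-01-11', periods=10),
                      columns=['A', 'B', 'D'])
# Hypothetical third file whose first two dates overlap the end of 'second'.
third = pd.DataFrame(np.random.random((4, 2)),
                     pd.date_range('1950-01-19', periods=4),
                     columns=['A', 'E'])

# concat stacks the rows and aligns columns by label; missing cells become NaN.
stacked = pd.concat([first, second, third])

# 1950-01-19 and 1950-01-20 now appear twice in the index. groupby(level=0)
# groups the rows by date (keys sorted by default), and first() keeps the
# first non-null value of each column within each date.
result = stacked.groupby(level=0).first()
print(result)
```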