Merging two pandas Series with overlapping datetime indexes

Time: 2017-08-14 14:40:54

Tags: python pandas

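I have two pandas Series (d1 and d2) indexed by datetime, each holding a single column of data containing floats and NaN. Both indexes are at one-day intervals, although the date entries are inconsistent, with many missing days. d1 ranges from 1974-12-16 to 2002-01-30. d2 ranges from 1997-12-19 to 2017-07-06. The period from 1997-12-19 to 2002-01-30 contains many index values that appear in both series. For those shared dates, the two series sometimes hold the same value, sometimes different values, and sometimes one value and one NaN.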

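I would like to combine these two series into one, giving priority to the data from d2 whenever an index value appears in both (that is, replacing the d1 data with the d2 data wherever the index is duplicated). What is the most efficient way to do this among the many available pandas tools (merge, join, concatenate, etc.)?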

Here is a sample of my data:

In [7]: print d1
fldDate
1974-12-16    19.0
1974-12-17    28.0
1974-12-18    24.0
1974-12-19    18.0
1974-12-20    17.0
1974-12-21    28.0
1974-12-22    28.0
1974-12-23    10.0
1974-12-24     6.0
1974-12-25     5.0
1974-12-26    12.0
1974-12-27    19.0
1974-12-28    22.0
1974-12-29    20.0
1974-12-30    16.0
1974-12-31    12.0
1975-01-01    12.0
1975-01-02    15.0
1975-01-03    14.0
1975-01-04    15.0
1975-01-05    18.0
1975-01-06    21.0
1975-01-07    22.0
1975-01-08    18.0
1975-01-09    20.0
1975-01-10    12.0
1975-01-11     8.0
1975-01-12    -2.0
1975-01-13    13.0
1975-01-14    24.0
              ... 
2002-01-01    18.0
2002-01-02    16.0
2002-01-03     NaN
2002-01-04    24.0
2002-01-05    23.0
2002-01-06    15.0
2002-01-07    22.0
2002-01-08    34.0
2002-01-09    35.0
2002-01-10    29.0
2002-01-11    21.0
2002-01-12    24.0
2002-01-13     NaN
2002-01-14    18.0
2002-01-15    14.0
2002-01-16    10.0
2002-01-17     5.0
2002-01-18     7.0
2002-01-19     7.0
2002-01-20     7.0
2002-01-21    11.0
2002-01-22     NaN
2002-01-23     9.0
2002-01-24     8.0
2002-01-25    15.0
2002-01-26     NaN
2002-01-27     NaN
2002-01-28    18.0
2002-01-29    13.0
2002-01-30    13.0
Name: MaxTempMid, dtype: float64

In [8]: print d2
fldDate
1997-12-19    22.0
1997-12-20    14.0
1997-12-21    18.0
1997-12-22    16.0
1997-12-23    16.0
1997-12-24    10.0
1997-12-25    12.0
1997-12-26    12.0
1997-12-27     9.0
1997-12-28    12.0
1997-12-29    18.0
1997-12-30    23.0
1997-12-31    28.0
1998-01-01    26.0
1998-01-02    29.0
1998-01-03    27.0
1998-01-04    22.0
1998-01-05    19.0
1998-01-06    17.0
1998-01-07    14.0
1998-01-08    14.0
1998-01-09    14.0
1998-01-10    16.0
1998-01-11    20.0
1998-01-12    21.0
1998-01-13    19.0
1998-01-14    20.0
1998-01-15    16.0
1998-01-16    17.0
1998-01-17    20.0
              ... 
2017-06-07    68.0
2017-06-08    71.0
2017-06-09    71.0
2017-06-10    59.0
2017-06-11    41.0
2017-06-12    57.0
2017-06-13    58.0
2017-06-14    36.0
2017-06-15    50.0
2017-06-16    58.0
2017-06-17    54.0
2017-06-18    53.0
2017-06-19    58.0
2017-06-20    68.0
2017-06-21    71.0
2017-06-22    71.0
2017-06-23    59.0
2017-06-24    61.0
2017-06-25    65.0
2017-06-26    68.0
2017-06-27    71.0
2017-06-28    60.0
2017-06-29    54.0
2017-06-30    48.0
2017-07-01    60.0
2017-07-02    68.0
2017-07-03    65.0
2017-07-04    73.0
2017-07-05    74.0
2017-07-06    77.0
Name: MaxTempMid, dtype: float64

1 Answer:

Answer 0: (score: 1)

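Let's use combine_first, which takes values from the calling series and fills in from the other one wherever the caller is missing or NaN: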

d2.combine_first(d1)

Output:

fldDate
1974-12-16    19.0
1974-12-17    28.0
1974-12-18    24.0
1974-12-19    18.0
1974-12-20    17.0
1974-12-21    28.0
1974-12-22    28.0
1974-12-23    10.0
1974-12-24     6.0
1974-12-25     5.0
1974-12-26    12.0
1974-12-27    19.0
1974-12-28    22.0
1974-12-29    20.0
1974-12-30    16.0
1974-12-31    12.0
1975-01-01    12.0
1975-01-02    15.0
1975-01-03    14.0
1975-01-04    15.0
1975-01-05    18.0
1975-01-06    21.0
1975-01-07    22.0
1975-01-08    18.0
1975-01-09    20.0
1975-01-10    12.0
1975-01-11     8.0
1975-01-12    -2.0
1975-01-13    13.0
1975-01-14    24.0
              ... 
2017-06-07    68.0
2017-06-08    71.0
2017-06-09    71.0
2017-06-10    59.0
2017-06-11    41.0
2017-06-12    57.0
2017-06-13    58.0
2017-06-14    36.0
2017-06-15    50.0
2017-06-16    58.0
2017-06-17    54.0
2017-06-18    53.0
2017-06-19    58.0
2017-06-20    68.0
2017-06-21    71.0
2017-06-22    71.0
2017-06-23    59.0
2017-06-24    61.0
2017-06-25    65.0
2017-06-26    68.0
2017-06-27    71.0
2017-06-28    60.0
2017-06-29    54.0
2017-06-30    48.0
2017-07-01    60.0
2017-07-02    68.0
2017-07-03    65.0
2017-07-04    73.0
2017-07-05    74.0
2017-07-06    77.0
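
One detail that may matter here: combine_first only falls back to the other series where the calling series is missing or NaN, so on a shared date where d2 is NaN and d1 has a value, the d1 value is kept. If d2 should win on every shared date, NaN included, one possible alternative is to concatenate and drop the earlier duplicates. The sketch below compares the two on small made-up stand-ins for d1 and d2 (the dates and values are hypothetical, not taken from the question's data), and it assumes each series has a unique index on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-ins for d1 and d2; the values are made up.
d1 = pd.Series(
    [19.0, 28.0, 24.0, np.nan],
    index=pd.to_datetime(["2002-01-27", "2002-01-28", "2002-01-29", "2002-01-30"]),
    name="MaxTempMid",
)
d2 = pd.Series(
    [30.0, np.nan, 13.0, 50.0],
    index=pd.to_datetime(["2002-01-28", "2002-01-29", "2002-01-30", "2002-01-31"]),
    name="MaxTempMid",
)

# Option 1: d2 wins wherever it has a real value; d1 fills d2's gaps and NaNs.
filled = d2.combine_first(d1)

# Option 2: d2 wins on every shared date, even where d2 itself is NaN.
# Assumes d1 and d2 each have a unique index, so any duplicate after the
# concat is a d1/d2 pair and keep="last" retains the d2 entry.
stacked = pd.concat([d1, d2])
strict = stacked[~stacked.index.duplicated(keep="last")].sort_index()

print(filled)
print(strict)
```

On 2002-01-29, where the made-up d2 is NaN, the first result keeps d1's 24.0 while the second keeps NaN; which behaviour is right depends on whether a NaN in d2 means "no reading" or "overrides the older reading".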