Question

I have two pandas DataFrames:

import pandas as pd

# freq='h' is the hourly alias; the older 'H' is deprecated in recent pandas
index = pd.date_range('06/01/2014',periods=48,freq='h')
df1 = pd.DataFrame(range(len(index)),index=index)

index2 = pd.date_range('06/02/2014',periods=24,freq='h')
df2 = pd.DataFrame(range(0,24),index=index2)

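How can I replace the values in df1 with the values from df2 using the time-series index? That is, wherever the indexes of the two DataFrames match, the value in df1 should be replaced by the value from df2.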

Answer 1

If df2.index is always contained in df1.index, I think you need combine_first:

df1 = df2.combine_first(df1)
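A minimal sketch, assuming the question's sample data, that checks this first approach: timestamps present in df2 take df2's values and every other timestamp keeps df1's. Depending on the pandas version, the column may come back as float64 rather than int64, so only the values are relied on here.

```python
import pandas as pd

# Rebuild the question's data: df1 covers 48 hours, df2 the last 24 of them.
index = pd.date_range("2014-06-01", periods=48, freq="h")
df1 = pd.DataFrame(range(len(index)), index=index)

index2 = pd.date_range("2014-06-02", periods=24, freq="h")
df2 = pd.DataFrame(range(24), index=index2)

# combine_first prefers df2's value wherever df2 has a row and falls back to
# df1 elsewhere; the result index is the union of both indexes.
result = df2.combine_first(df1)

# Around midnight the values switch from df1's (23) to df2's (0, 1).
print(result.loc["2014-06-01 23:00":"2014-06-02 01:00"])
```

Because df2.index is a subset of df1.index here, the union is just df1.index, so no extra rows appear.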

If not, it's a bit more complicated: take the intersection of the two indexes first, then use combine_first:

import numpy as np
import pandas as pd

index = pd.date_range('06/01/2014',periods=12,freq='h')
df1 = pd.DataFrame(np.arange(len(index)),index=index)

# df2 starts inside df1 but runs two hours past its end
index2 = pd.date_range('06/01/2014 08:00:00',periods=6,freq='h')
df2 = pd.DataFrame(np.arange(0,6),index=index2)
print (df1)
                      0
2014-06-01 00:00:00   0
2014-06-01 01:00:00   1
2014-06-01 02:00:00   2
2014-06-01 03:00:00   3
2014-06-01 04:00:00   4
2014-06-01 05:00:00   5
2014-06-01 06:00:00   6
2014-06-01 07:00:00   7
2014-06-01 08:00:00   8
2014-06-01 09:00:00   9
2014-06-01 10:00:00  10
2014-06-01 11:00:00  11

print (df2)
                     0
2014-06-01 08:00:00  0
2014-06-01 09:00:00  1
2014-06-01 10:00:00  2
2014-06-01 11:00:00  3
2014-06-01 12:00:00  4
2014-06-01 13:00:00  5

df1 = df2.loc[df2.index.intersection(df1.index)].combine_first(df1)
print (df1)
                       0
2014-06-01 00:00:00  0.0
2014-06-01 01:00:00  1.0
2014-06-01 02:00:00  2.0
2014-06-01 03:00:00  3.0
2014-06-01 04:00:00  4.0
2014-06-01 05:00:00  5.0
2014-06-01 06:00:00  6.0
2014-06-01 07:00:00  7.0
2014-06-01 08:00:00  0.0
2014-06-01 09:00:00  1.0
2014-06-01 10:00:00  2.0
2014-06-01 11:00:00  3.0
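The intersection step matters because combine_first returns the union of both indexes. A small sketch, using the sample data above, that compares the result with and without the intersection:

```python
import numpy as np
import pandas as pd

# Sample data from the answer: df2 overlaps df1 from 08:00 and ends at 13:00.
index = pd.date_range("2014-06-01", periods=12, freq="h")
df1 = pd.DataFrame(np.arange(len(index)), index=index)

index2 = pd.date_range("2014-06-01 08:00:00", periods=6, freq="h")
df2 = pd.DataFrame(np.arange(0, 6), index=index2)

# Without the intersection, df2's extra rows (12:00 and 13:00) are kept.
unfiltered = df2.combine_first(df1)

# Restricting df2 to the shared timestamps keeps exactly df1's rows.
common = df2.index.intersection(df1.index)
result = df2.loc[common].combine_first(df1)

print(len(unfiltered), len(result))  # 14 12
```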

Another solution uses loc (applied to the original df1, not to the result above):

df1.loc[df2.index.intersection(df1.index)] = df2
print (df1)
                     0
2014-06-01 00:00:00  0
2014-06-01 01:00:00  1
2014-06-01 02:00:00  2
2014-06-01 03:00:00  3
2014-06-01 04:00:00  4
2014-06-01 05:00:00  5
2014-06-01 06:00:00  6
2014-06-01 07:00:00  7
2014-06-01 08:00:00  0
2014-06-01 09:00:00  1
2014-06-01 10:00:00  2
2014-06-01 11:00:00  3
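A sketch of the same loc idea, assuming the sample data above. It writes into a copy so the original df1 stays untouched, and it slices df2 to the shared timestamps explicitly so the right-hand side matches the selected rows one to one. Since only existing int64 cells are overwritten, the column keeps its integer dtype.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2014-06-01", periods=12, freq="h")
df1 = pd.DataFrame(np.arange(len(index)), index=index)

index2 = pd.date_range("2014-06-01 08:00:00", periods=6, freq="h")
df2 = pd.DataFrame(np.arange(0, 6), index=index2)

# Work on a copy so df1 remains available for comparison.
out = df1.copy()

# Only timestamps present in both frames are written; df2's 12:00 and 13:00
# rows are ignored because they are not in the selection.
common = df2.index.intersection(out.index)
out.loc[common] = df2.loc[common]

print(out[0].tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3]
print(out[0].dtype)
```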

Answer 2

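Note: we could use combine_first, but I don't like that it converts the dtypes to float. To use combine_first here, you also need reindex or reindex_like, so that the rows of df2 outside df1.index are dropped: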

df2.combine_first(df1).reindex_like(df1)

or

df2.combine_first(df1).reindex(df1.index)
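A sketch, assuming the 12-hour sample data from the first answer, showing that both reindex variants trim the union produced by combine_first back to df1's rows and agree with each other. Whether the integer dtype survives may depend on the pandas version, so the dtype is printed rather than assumed.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2014-06-01", periods=12, freq="h")
df1 = pd.DataFrame(np.arange(len(index)), index=index)

index2 = pd.date_range("2014-06-01 08:00:00", periods=6, freq="h")
df2 = pd.DataFrame(np.arange(0, 6), index=index2)

# combine_first returns the union of both indexes (00:00 through 13:00) ...
combined = df2.combine_first(df1)

# ... so trim it back to df1's rows; reindex_like also matches df1's columns.
via_like = combined.reindex_like(df1)
via_index = combined.reindex(df1.index)

print(via_like.equals(via_index))  # True
print(via_like[0].dtype)  # check whether the integer dtype survived
```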

My preferred solution

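We can use map with a lambda that looks values up in a dictionary. This lets me keep the integer dtype by using the dictionary's get method, which returns a default value when the key does not exist.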

m = df2[0].to_dict()
f = lambda x: m.get(x, df1.at[x, 0])
df1.index.to_series().map(f)
# you can assign this back to `df1` with
# df1[0] = df1.index.to_series().map(f)

2014-06-01 00:00:00     0
2014-06-01 01:00:00     1
2014-06-01 02:00:00     2
2014-06-01 03:00:00     3
2014-06-01 04:00:00     4
2014-06-01 05:00:00     5
2014-06-01 06:00:00     6
2014-06-01 07:00:00     7
2014-06-01 08:00:00     8
2014-06-01 09:00:00     9
2014-06-01 10:00:00    10
2014-06-01 11:00:00    11
2014-06-01 12:00:00    12
2014-06-01 13:00:00    13
2014-06-01 14:00:00    14
2014-06-01 15:00:00    15
2014-06-01 16:00:00    16
2014-06-01 17:00:00    17
2014-06-01 18:00:00    18
2014-06-01 19:00:00    19
2014-06-01 20:00:00    20
2014-06-01 21:00:00    21
2014-06-01 22:00:00    22
2014-06-01 23:00:00    23
2014-06-02 00:00:00     0
2014-06-02 01:00:00     1
2014-06-02 02:00:00     2
2014-06-02 03:00:00     3
2014-06-02 04:00:00     4
2014-06-02 05:00:00     5
2014-06-02 06:00:00     6
2014-06-02 07:00:00     7
2014-06-02 08:00:00     8
2014-06-02 09:00:00     9
2014-06-02 10:00:00    10
2014-06-02 11:00:00    11
2014-06-02 12:00:00    12
2014-06-02 13:00:00    13
2014-06-02 14:00:00    14
2014-06-02 15:00:00    15
2014-06-02 16:00:00    16
2014-06-02 17:00:00    17
2014-06-02 18:00:00    18
2014-06-02 19:00:00    19
2014-06-02 20:00:00    20
2014-06-02 21:00:00    21
2014-06-02 22:00:00    22
2014-06-02 23:00:00    23
Freq: H, dtype: int64
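A variant of the same dict.get idea, sketched under the assumption of the question's 48-hour data. In the lambda above, df1.at[x, 0] is evaluated for every timestamp, even when x is in m, because Python evaluates the default argument before calling get. Pairing each timestamp with df1's existing value via zip avoids that lookup. The helper name replace_from is hypothetical, not a pandas function.

```python
import pandas as pd

index = pd.date_range("2014-06-01", periods=48, freq="h")
df1 = pd.DataFrame(range(len(index)), index=index)

index2 = pd.date_range("2014-06-02", periods=24, freq="h")
df2 = pd.DataFrame(range(24), index=index2)


def replace_from(target: pd.Series, source: pd.Series) -> pd.Series:
    """Hypothetical helper: take source's value wherever the index matches.

    Each timestamp is paired with target's own value, which serves as the
    dict.get default, so non-matching rows keep their original value.
    """
    lookup = source.to_dict()
    values = [lookup.get(ts, old) for ts, old in zip(target.index, target)]
    return pd.Series(values, index=target.index, name=target.name)


df1[0] = replace_from(df1[0], df2[0])
print(df1[0].dtype)  # int64
```

All values stay Python or NumPy integers, so pandas infers int64 for the new column.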

Replace pandas column values using a time-series index