Question

I have two pandas DataFrames:

DataFrame a

2013-03-25 13:15:00     1
2013-03-26 13:15:00     2
2013-03-28 13:15:00     4
2013-03-29 13:15:00     5

and DataFrame b

2013-03-25 13:15:00    25
2013-03-27 13:15:00    15
2013-03-28 13:15:00     5
2013-03-29 13:15:00    10

I am trying to join them on the dates and forward-fill the values. Right now I do it like this:

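import pandas as pd

ab = pd.concat([a, b], axis=1)
ab.ffill(inplace=True)  # forward-fill the gaps left by aligning on the union of dates

a = ab.iloc[:, 0]  # .ix has been removed from pandas; .iloc selects by position
b = ab.iloc[:, 1]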

So ab is

2013-03-25 13:15:00     1    25
2013-03-26 13:15:00     2   NaN
2013-03-27 13:15:00   NaN    15
2013-03-28 13:15:00     4     5
2013-03-29 13:15:00     5    10

and then

2013-03-25 13:15:00     1    25
2013-03-26 13:15:00     2    25
2013-03-27 13:15:00     2    15
2013-03-28 13:15:00     4     5
2013-03-29 13:15:00     5    10

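This has two drawbacks. First, a and b are now Series. Second, this solution does not work for multi-column DataFrames. Is it possible to do this in place on a and b, without going through ab? This seems like a fairly standard procedure. What am I missing?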

Edit

a.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 4 entries, 2013-03-25 13:15:00 to 2013-03-29 13:15:00
Data columns (total 1 columns):
icap    4 non-null float64
dtypes: float64(1)
memory usage: 64.0 bytes

b is equivalent.

Answer 1

I think using combine_first together with ffill will give you what you want:

In [46]:
a.combine_first(b).ffill()

Out[46]:
                     a   b
index                     
2013-03-25 13:15:00  1  25
2013-03-26 13:15:00  2  25
2013-03-27 13:15:00  2  15
2013-03-28 13:15:00  4   5
2013-03-29 13:15:00  5  10

This joins and aligns the two DataFrames on the union of their indices, which introduces NaN values that you can then fill with ffill.
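
A self-contained sketch of this approach, rebuilt from the sample data in the question. It assumes the two frames have distinct column names (here "a" and "b", matching the output shown above). If both frames used the same column name, such as the "icap" column in the question's edit, combine_first would produce a single column and only fill the gaps in a with values from b. The names a_filled and b_filled are just for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question; the column names "a" and "b" are assumed.
idx_a = pd.to_datetime(["2013-03-25 13:15", "2013-03-26 13:15",
                        "2013-03-28 13:15", "2013-03-29 13:15"])
idx_b = pd.to_datetime(["2013-03-25 13:15", "2013-03-27 13:15",
                        "2013-03-28 13:15", "2013-03-29 13:15"])
a = pd.DataFrame({"a": [1.0, 2.0, 4.0, 5.0]}, index=idx_a)
b = pd.DataFrame({"b": [25.0, 15.0, 5.0, 10.0]}, index=idx_b)

# Align on the union of both indices, then forward-fill the resulting NaNs.
ab = a.combine_first(b).ffill()

# Selecting with a list of column names keeps each piece a DataFrame, not a Series.
a_filled = ab[list(a.columns)]
b_filled = ab[list(b.columns)]
```

Selecting with a list of columns rather than a single label is what avoids the "a and b are now Series" problem from the question.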

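From the result above you can select just the columns of interest, but it looks like what you really want is to reindex using the union of the indices: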

In [48]:
a.reindex(a.index.union(b.index)).ffill()

Out[48]:
                     a
index                 
2013-03-25 13:15:00  1
2013-03-26 13:15:00  2
2013-03-27 13:15:00  2
2013-03-28 13:15:00  4
2013-03-29 13:15:00  5

So you can do this for both DataFrames without performing any merge/combine.
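
As a rough sketch, the reindex-on-union idea can be wrapped in a small helper so it applies to any number of frames, including multi-column ones. The helper name align_and_ffill and the two-column sample data are made up for illustration and are not part of pandas or the question.

```python
import pandas as pd

def align_and_ffill(*frames):
    """Reindex every frame on the union of all indices and forward-fill."""
    union = frames[0].index
    for frame in frames[1:]:
        union = union.union(frame.index)
    return [frame.reindex(union).ffill() for frame in frames]

idx_a = pd.to_datetime(["2013-03-25 13:15", "2013-03-26 13:15"])
idx_b = pd.to_datetime(["2013-03-25 13:15", "2013-03-27 13:15"])
# Made-up two-column data to show that multi-column frames work the same way.
a = pd.DataFrame({"x": [1.0, 2.0], "y": [10.0, 20.0]}, index=idx_a)
b = pd.DataFrame({"x": [25.0, 15.0], "y": [0.5, 0.7]}, index=idx_b)

# Both results stay DataFrames and share the same three-row index.
a, b = align_and_ffill(a, b)
```

Each frame is filled only from its own earlier rows, so the columns of a and b never mix.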

Answer 2

A simple solution that modifies both DataFrames a and b in the desired way, without merging or joining, is to use their indices.

index_joined = a.index
index_joined = index_joined.union(b.index)
# reindex returns a new DataFrame, so assign the result back
a = a.reindex(index=index_joined, method='ffill')
b = b.reindex(index=index_joined, method='ffill')
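
One subtlety worth checking, sketched here with made-up data: reindex(..., method='ffill') only fills the rows that reindexing introduces, by looking up the previous label in the original index. It does not fill NaN values that were already in the frame, whereas reindex(...).ffill() fills both. For the data in the question, which has no missing values to begin with, the two give the same result.

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime(["2013-03-25", "2013-03-26", "2013-03-28"])
# Made-up data: the middle value is already missing before reindexing.
df = pd.DataFrame({"icap": [1.0, np.nan, 4.0]}, index=idx)
new_index = idx.union(pd.to_datetime(["2013-03-27"]))

# Fills only the newly introduced row, copying from the previous original
# label (2013-03-26), whose value is itself NaN.
via_method = df.reindex(new_index, method="ffill")

# Introduces NaN for the new row, then forward-fills every NaN in the frame.
via_chain = df.reindex(new_index).ffill()
```

Which behaviour is preferable depends on whether pre-existing gaps in the data should be kept or filled.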

Join DataFrames on datetime and forward-fill data
