Join dataframes on a DatetimeIndex by seconds, falling back to minutes for NaNs

Time: 2016-10-13 16:32:13

Tags: python pandas datetimeindex datetime64

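I am looking for a way to align dataframes whose timestamps each include seconds, without losing data. Specifically, my problem is as follows:

Here df1 is my "main" dataframe.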

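import numpy as np
import pandas as pd

# 19 consecutive one-second timestamps: 00:00:01 through 00:00:19
ind1  = pd.date_range("20120101", "20120102", freq='s')[1:20]
data1 = np.random.randn(len(ind1))
df1   = pd.DataFrame(data1, index=ind1)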

E.g., df1 looks like:

                            0
2012-01-01 00:00:01  2.738425
2012-01-01 00:00:02 -0.323905
2012-01-01 00:00:03  1.861855
2012-01-01 00:00:04  0.480284
2012-01-01 00:00:05  0.340270
2012-01-01 00:00:06 -1.139052
2012-01-01 00:00:07 -0.203018
2012-01-01 00:00:08 -0.398599
2012-01-01 00:00:09 -0.568802
2012-01-01 00:00:10 -1.539783
2012-01-01 00:00:11 -1.778668
2012-01-01 00:00:12 -1.488097
2012-01-01 00:00:13  0.889712
2012-01-01 00:00:14 -0.620267
2012-01-01 00:00:15  0.075169
2012-01-01 00:00:16 -0.091302
2012-01-01 00:00:17 -1.035364
2012-01-01 00:00:18 -0.459013
2012-01-01 00:00:19 -2.177190

In addition, I have another dataframe, say df2:

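ind21  = pd.date_range("20120101", "20120102", freq='s')[2:7]
ind22  = pd.date_range("20120101", "20120102", freq='s')[12:19]
# Concatenate the two ranges; `+` on two DatetimeIndex objects is not a union in current pandas
ind2   = ind21.append(ind22)
data2  = np.random.randn(len(ind2))
df2    = pd.DataFrame(data2, index=ind2)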

df2 looks like this (note the non-periodic timestamps):

                           0
2012-01-01 00:00:02 -1.877779
2012-01-01 00:00:03  1.772659
2012-01-01 00:00:04  0.037251
2012-01-01 00:00:05 -1.195782
2012-01-01 00:00:06 -0.145339
2012-01-01 00:00:12 -0.220673
2012-01-01 00:00:13 -0.581469
2012-01-01 00:00:14 -0.520756
2012-01-01 00:00:15 -0.562677
2012-01-01 00:00:16  0.109325
2012-01-01 00:00:17 -0.195091
2012-01-01 00:00:18  0.838294

Now I join the dataframes and get:

df = df1.join(df2, lsuffix='A')
                           0A         0
2012-01-01 00:00:01  2.738425       NaN
2012-01-01 00:00:02 -0.323905 -1.877779
2012-01-01 00:00:03  1.861855  1.772659
2012-01-01 00:00:04  0.480284  0.037251
2012-01-01 00:00:05  0.340270 -1.195782
2012-01-01 00:00:06 -1.139052 -0.145339
2012-01-01 00:00:07 -0.203018       NaN
2012-01-01 00:00:08 -0.398599       NaN
2012-01-01 00:00:09 -0.568802       NaN
2012-01-01 00:00:10 -1.539783       NaN
2012-01-01 00:00:11 -1.778668       NaN
2012-01-01 00:00:12 -1.488097 -0.220673
2012-01-01 00:00:13  0.889712 -0.581469
2012-01-01 00:00:14 -0.620267 -0.520756
2012-01-01 00:00:15  0.075169 -0.562677
2012-01-01 00:00:16 -0.091302  0.109325
2012-01-01 00:00:17 -1.035364 -0.195091
2012-01-01 00:00:18 -0.459013  0.838294
2012-01-01 00:00:19 -2.177190       NaN
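
As a quick check on the join above, the rows where column 0 is NaN should be exactly the timestamps that are in df1 but not in df2. A minimal sketch, rebuilding frames of the same shape with made-up values instead of random ones; the names `base`, `secs` and `missing` are illustrative only, and the column labels are set explicitly so the check does not rely on how the suffix is applied:

```python
import numpy as np
import pandas as pd

base = pd.Timestamp("2012-01-01")

def secs(values):
    """Build a DatetimeIndex at the given second offsets from `base`."""
    return base + pd.to_timedelta(list(values), unit="s")

# Same index layout as df1 and df2 in the question, with made-up values.
df1 = pd.DataFrame({0: np.arange(1.0, 20.0)}, index=secs(range(1, 20)))
df2 = pd.DataFrame({0: np.arange(12.0)},
                   index=secs(list(range(2, 7)) + list(range(12, 19))))

df = df1.join(df2, lsuffix="A")   # left join on the index
df.columns = ["0A", "0"]          # explicit labels (assumption for clarity)

# Timestamps present in df1 but absent from df2.
missing = df1.index.difference(df2.index)
print(missing)
print(df.index[df["0"].isna()].equals(missing))
```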

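This is fine; however, I would like to replace the NaN values in column 0 with a "minute-level" value from df2. In other words, only where there is no exact match at the second level do I want to fall back to the minute level. The fallback could be a simple average of all df2 values in that particular minute (here: 2012-01-01 00:00:00).

Thanks for your help!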

1 Answer:

Answer 0: (score: 0)

Use the DatetimeIndex attribute .minute to do the grouping, then fill the missing values with the mean of each group (each minute):

df['0'] = df.groupby(df.index.minute)['0'].transform(lambda x: x.fillna(x.mean()))
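
One caveat: `.minute` is the minute of the hour (0 to 59), so if the data spans more than one hour, the same minute from different hours lands in one group. The following is a variant sketch rather than the answer's exact method. It assumes that grouping by the timestamp floored to the minute is what is wanted; the sample values and the `0_filled`, `minute_key` and `pooled` names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical joined frame covering two different hours that share
# the same minute-of-hour (0).
idx = pd.to_datetime([
    "2012-01-01 00:00:01", "2012-01-01 00:00:02", "2012-01-01 00:00:03",
    "2012-01-01 01:00:01", "2012-01-01 01:00:02", "2012-01-01 01:00:03",
])
df = pd.DataFrame(
    {"0A": np.arange(6.0), "0": [1.0, np.nan, 3.0, 10.0, np.nan, 30.0]},
    index=idx,
)

# Group by the full timestamp truncated to the minute, so 00:00 and 01:00
# stay in separate groups; "mean" ignores NaN within each group.
minute_key = df.index.floor("min")
minute_mean = df.groupby(minute_key)["0"].transform("mean")
df["0_filled"] = df["0"].fillna(minute_mean)

# For comparison: grouping on minute-of-hour pools both hours together.
pooled = df["0"].fillna(df.groupby(df.index.minute)["0"].transform("mean"))

print(df)
print(pooled)
```

With this made-up data, the per-minute fill yields 2.0 and 20.0 for the two gaps, while grouping on `.minute` alone fills both with 11.0. For the question's example, which lies entirely within one minute, the two approaches give the same result.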