Add empty DataFrame rows based on missing datetime values

Time: 2018-06-27 22:19:47

Tags: python pandas datetime dataframe indexing

I am trying to add rows to my pandas DataFrame, which is built as follows:

import pandas as pd
import datetime as dt

d={'datetime':[dt.datetime(2018,3,1,0,0),dt.datetime(2018,3,1,0,10),dt.datetime(2018,3,1,0,40)],
  'value':[4.,5.,1.]}

df=pd.DataFrame(d)

which outputs:

             datetime  value
0 2018-03-01 00:00:00    4.0
1 2018-03-01 00:10:00    5.0
2 2018-03-01 00:40:00    1.0

What I want to do is add rows between 00:00:00 and 00:40:00 so that there is one row every 5 minutes. My desired output looks like this:

             datetime  value
0 2018-03-01 00:00:00    4.0
1 2018-03-01 00:05:00    NaN
2 2018-03-01 00:10:00    5.0
3 2018-03-01 00:15:00    NaN
4 2018-03-01 00:20:00    NaN
5 2018-03-01 00:25:00    NaN
6 2018-03-01 00:30:00    NaN
7 2018-03-01 00:35:00    NaN
8 2018-03-01 00:40:00    1.0

How do I get there?

2 Answers:

Answer 0 (score: 1)

You can use pd.DataFrame.resample:

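# Resample onto a 5-minute grid keyed on the 'datetime' column; empty bins become NaN.
# Selecting 'value' avoids the positional-axis drop('datetime', 1), which newer pandas rejects.
df = df.resample('5min', on='datetime')['value'].first().reset_index()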

print(df)

             datetime  value
0 2018-03-01 00:00:00    4.0
1 2018-03-01 00:05:00    NaN
2 2018-03-01 00:10:00    5.0
3 2018-03-01 00:15:00    NaN
4 2018-03-01 00:20:00    NaN
5 2018-03-01 00:25:00    NaN
6 2018-03-01 00:30:00    NaN
7 2018-03-01 00:35:00    NaN
8 2018-03-01 00:40:00    1.0
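
For comparison, here is a minimal sketch of a closely related approach, assuming every timestamp already sits exactly on the 5-minute grid and there are no duplicates. It moves the column into the index and calls asfreq, which conforms the index to the given frequency without aggregating. The name `filled` is only illustrative.

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame({
    'datetime': [dt.datetime(2018, 3, 1, 0, 0),
                 dt.datetime(2018, 3, 1, 0, 10),
                 dt.datetime(2018, 3, 1, 0, 40)],
    'value': [4., 5., 1.],
})

# Put the timestamps in the index, conform the index to a 5-minute
# frequency (missing slots get NaN), then restore the integer index.
filled = df.set_index('datetime').asfreq('5min').reset_index()

print(filled)
```

If the timestamps might fall off the grid (for example 00:07:30), resample is probably the safer choice, since asfreq only keeps rows whose timestamps match the generated grid exactly.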

Answer 1 (score: 0)

First, you can create a DataFrame that holds the final datetime index, then assign the values of the second one to it:

import numpy as np

# Placeholder frame: nine 5-minute slots, all NaN for now
df1 = pd.DataFrame({'value': np.nan}, index=pd.date_range('2018-03-01 00:00:00',
                   periods=9, freq='5min'))

print(df1)
#Output :
                   value
2018-03-01 00:00:00 NaN
2018-03-01 00:05:00 NaN
2018-03-01 00:10:00 NaN
2018-03-01 00:15:00 NaN
2018-03-01 00:20:00 NaN
2018-03-01 00:25:00 NaN
2018-03-01 00:30:00 NaN
2018-03-01 00:35:00 NaN
2018-03-01 00:40:00 NaN

Now, assuming your DataFrame is the second one, you can add the following to the code above:

d = {'datetime': [dt.datetime(2018,3,1,0,0), dt.datetime(2018,3,1,0,10), dt.datetime(2018,3,1,0,40)],
     'value': [4., 5., 1.]}

df2=pd.DataFrame(d)
df2.datetime = pd.to_datetime(df2.datetime)
df2.set_index('datetime',inplace=True)
print(df2)

#Output
                   value
datetime    
2018-03-01 00:00:00 4.0
2018-03-01 00:10:00 5.0
2018-03-01 00:40:00 1.0

Finally:

df1.value = df2.value
print(df1)

#output
                   value
2018-03-01 00:00:00 4.0
2018-03-01 00:05:00 NaN
2018-03-01 00:10:00 5.0
2018-03-01 00:15:00 NaN
2018-03-01 00:20:00 NaN
2018-03-01 00:25:00 NaN
2018-03-01 00:30:00 NaN
2018-03-01 00:35:00 NaN
2018-03-01 00:40:00 1.0
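
The assignment above works because pandas aligns the two Series on their index labels. The same idea can be sketched in one step with reindex, building the target index from the first and last timestamps instead of hard-coding periods=9. This is only a sketch under the assumption that the data sits on a 5-minute grid; `full_index` and `result` are illustrative names.

```python
import datetime as dt

import pandas as pd

df2 = pd.DataFrame({
    'datetime': [dt.datetime(2018, 3, 1, 0, 0),
                 dt.datetime(2018, 3, 1, 0, 10),
                 dt.datetime(2018, 3, 1, 0, 40)],
    'value': [4., 5., 1.],
}).set_index('datetime')

# Full 5-minute grid spanning the first to the last observed timestamp
full_index = pd.date_range(df2.index.min(), df2.index.max(),
                           freq='5min', name='datetime')

# reindex keeps rows whose labels exist and fills the rest with NaN
result = df2.reindex(full_index)

print(result)
```

Calling result.reset_index() afterwards would turn the index back into a regular 'datetime' column, matching the layout asked for in the question.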