Interpolating a series that contains time values

Time: 2019-09-06 07:55:25

Tags: python

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame(data={
    'time': ['15/04/2019 21:37'] + [-99] * 2 +
            ['15/04/2019 21:40', '16/04/2019 20:00'] + [-99] * 2 +
            ['16/04/2019 20:03', '16/04/2019 20:04']
})


0    15/04/2019 21:37
1                 -99
2                 -99
3    15/04/2019 21:40
4    16/04/2019 20:00
5                 -99
6                 -99
7    16/04/2019 20:03
8    16/04/2019 20:04
Name: time, dtype: object

What I want is a function that replaces the missing values (-99) with interpolated time values, to get:

0    15/04/2019 21:37
1    15/04/2019 21:38
2    15/04/2019 21:39
3    15/04/2019 21:40
4    16/04/2019 20:00
5    16/04/2019 20:01
6    16/04/2019 20:02
7    16/04/2019 20:03
8    16/04/2019 20:04
Name: time, dtype: object

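2 answers:

Answer 0 (score: 2)

The idea is to convert the datetimes to their native integer representation in nanoseconds, interpolate, and convert back to datetimes: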

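import numpy as np
import pandas as pd

# -99 does not match the format, so errors='coerce' turns it into NaT
df['time'] = pd.to_datetime(df['time'], format='%d/%m/%Y %H:%M', errors='coerce')

mask = df['time'].isna()
# or, computed before the conversion above:
# mask = df['time'] == -99

# nanoseconds since the epoch, with NaN where the value is missing
arr = np.where(mask, np.nan, df['time'].astype(np.int64))
df['new'] = pd.to_datetime(pd.Series(arr, index=df.index).interpolate(), unit='ns')
print(df)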
                 time                 new
0 2019-04-15 21:37:00 2019-04-15 21:37:00
1                 NaT 2019-04-15 21:38:00
2                 NaT 2019-04-15 21:39:00
3 2019-04-15 21:40:00 2019-04-15 21:40:00
4 2019-04-16 20:00:00 2019-04-16 20:00:00
5                 NaT 2019-04-16 20:01:00
6                 NaT 2019-04-16 20:02:00
7 2019-04-16 20:03:00 2019-04-16 20:03:00
8 2019-04-16 20:04:00 2019-04-16 20:04:00
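
Since the question asks for a function, the same idea could be wrapped up roughly as below. This is only a sketch under a few assumptions: the name interpolate_times and its fmt parameter are made up for illustration, every non-missing entry is assumed to match the format, and instead of int64 nanoseconds it interpolates offsets in seconds from the earliest timestamp, which avoids depending on the integer representation of NaT.

```python
import pandas as pd


def interpolate_times(series, fmt='%d/%m/%Y %H:%M'):
    """Hypothetical helper: fill entries that do not match fmt (such as -99)
    by linear interpolation between the neighbouring timestamps."""
    # Cast to str first so that -99 becomes '-99', fails to parse, and turns into NaT.
    parsed = pd.to_datetime(series.astype(str), format=fmt, errors='coerce')

    # Offsets from the earliest known timestamp, as float seconds (NaN where missing).
    origin = parsed.min()
    offsets = (parsed - origin) / pd.Timedelta(seconds=1)

    # Linear interpolation by position, then back to datetimes.
    return origin + pd.to_timedelta(offsets.interpolate(), unit='s')
```

With the DataFrame from the question, df['time'] = interpolate_times(df['time']) should give a datetime column; .dt.strftime('%d/%m/%Y %H:%M') converts it back to strings if the original format is wanted.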

Answer 1 (score: 0)

For the interpolation itself, you can use division and multiplication on datetime.timedelta:

import datetime

def interpolate(start, end, steps):
    ''' return interpolated steps, start and end exclusive '''
    diff = end - start
    step_size = diff / (steps + 1)
    interpolated_values = [start + (i+1) * step_size for i in range(steps)]
    return interpolated_values


start = datetime.datetime.strptime('15/04/2019 21:37', '%d/%m/%Y %H:%M')
end   = datetime.datetime.strptime('15/04/2019 21:40', '%d/%m/%Y %H:%M')

interpolated = interpolate(start, end, 2)

print(start)
for i in interpolated:
    print(i)
print(end)

This outputs:

2019-04-15 21:37:00
2019-04-15 21:38:00
2019-04-15 21:39:00
2019-04-15 21:40:00

Now you need to find the start and end of each gap in your data and fill the gap with the resulting values.
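
One possible way to do that gap search is sketched below. It assumes the data is a plain list in which -99 marks a missing entry, as in the question; fill_gaps, FMT and MISSING are names invented for this illustration, and gaps at the very beginning or end of the list are left as None because they have only one neighbour.

```python
import datetime

FMT = '%d/%m/%Y %H:%M'
MISSING = -99  # placeholder used in the question


def interpolate(start, end, steps):
    """Return `steps` evenly spaced datetimes strictly between start and end."""
    step_size = (end - start) / (steps + 1)
    return [start + (i + 1) * step_size for i in range(steps)]


def fill_gaps(values):
    """Hypothetical helper: parse known entries and fill each run of MISSING
    that lies between two known entries."""
    parsed = [None if v == MISSING else datetime.datetime.strptime(v, FMT)
              for v in values]
    prev = None  # index of the most recent known value
    for i, value in enumerate(parsed):
        if value is None:
            continue
        # Number of missing entries between the previous known value and this one.
        gap = i - prev - 1 if prev is not None else 0
        if gap > 0:
            parsed[prev + 1:i] = interpolate(parsed[prev], value, gap)
        prev = i
    return parsed
```

For a DataFrame column, something like fill_gaps(df['time'].tolist()) would produce the filled list, which can then be assigned back to the column.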