I have the following DataFrame:
import pandas as pd

df = pd.DataFrame(data={
    'time': ['15/04/2019 21:37'] + [-99] * 2 +
            ['15/04/2019 21:40', '16/04/2019 20:00'] + [-99] * 2 +
            ['16/04/2019 20:03', '16/04/2019 20:04']
})
0 15/04/2019 21:37
1 -99
2 -99
3 15/04/2019 21:40
4 16/04/2019 20:00
5 -99
6 -99
7 16/04/2019 20:03
8 16/04/2019 20:04
Name: time, dtype: object
What I want is a function that replaces the missing values (-99) with interpolated time values, to get:
0 15/04/2019 21:37
1 15/04/2019 21:38
2 15/04/2019 21:39
3 15/04/2019 21:40
4 16/04/2019 20:00
5 16/04/2019 20:01
6 16/04/2019 20:02
7 16/04/2019 20:03
8 16/04/2019 20:04
Name: time, dtype: object
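Answer 0: (score: 2)
The idea is to convert the values to their native representation in nanoseconds, interpolate, and convert back to datetimes: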
import numpy as np
import pandas as pd

# Unparseable entries such as -99 become NaT
df['time'] = pd.to_datetime(df['time'], format='%d/%m/%Y %H:%M', errors='coerce')
mask = df['time'].isna()
# or, computed before the to_datetime conversion above:
# mask = df['time'] == -99
arr = np.where(mask, np.nan, df['time'].astype(np.int64))
df['new'] = pd.to_datetime(pd.Series(arr, index=df.index).interpolate(), unit='ns')
print(df)
time new
0 2019-04-15 21:37:00 2019-04-15 21:37:00
1 NaT 2019-04-15 21:38:00
2 NaT 2019-04-15 21:39:00
3 2019-04-15 21:40:00 2019-04-15 21:40:00
4 2019-04-16 20:00:00 2019-04-16 20:00:00
5 NaT 2019-04-16 20:01:00
6 NaT 2019-04-16 20:02:00
7 2019-04-16 20:03:00 2019-04-16 20:03:00
8 2019-04-16 20:04:00 2019-04-16 20:04:00
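Since the question asks for a function that returns values in the original dd/mm/YYYY layout, the steps above could be wrapped roughly as follows. This is only a sketch: the name `fill_missing_times` and its `fmt` parameter are made up for illustration, and it assumes that every entry which does not match the format should be treated as missing. The cast to `datetime64[ns]` is there so that the integers really are nanoseconds regardless of the resolution pandas picks when parsing.

```python
import numpy as np
import pandas as pd

def fill_missing_times(series, fmt='%d/%m/%Y %H:%M'):
    """Hypothetical helper: interpolate missing timestamps, return strings in `fmt`."""
    # Anything that does not match fmt (for example -99) is coerced to NaT
    parsed = pd.to_datetime(series, format=fmt, errors='coerce')
    mask = parsed.isna()
    # Integer nanoseconds since the epoch; missing positions are replaced by NaN
    nanos = parsed.astype('datetime64[ns]').astype(np.int64)
    arr = pd.Series(np.where(mask, np.nan, nanos), index=series.index)
    # Linear interpolation over row position, then back to datetimes
    filled = pd.to_datetime(arr.interpolate(), unit='ns')
    return filled.dt.strftime(fmt)
```

With the DataFrame from the question it would be used as `df['time'] = fill_missing_times(df['time'])`.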
Answer 1: (score: 0)
For the interpolation itself, you can use division and multiplication on datetime.timedelta:
import datetime

def interpolate(start, end, steps):
    ''' return interpolated steps, start and end exclusive '''
    diff = end - start
    step_size = diff / (steps + 1)
    interpolated_values = [start + (i+1) * step_size for i in range(steps)]
    return interpolated_values

start = datetime.datetime.strptime('15/04/2019 21:37', '%d/%m/%Y %H:%M')
end = datetime.datetime.strptime('15/04/2019 21:40', '%d/%m/%Y %H:%M')
interpolated = interpolate(start, end, 2)
print(start)
for i in interpolated:
    print(i)
print(end)
It will output:
2019-04-15 21:37:00
2019-04-15 21:38:00
2019-04-15 21:39:00
2019-04-15 21:40:00
Now you need to find the start and end of each gap in your data and fill the gaps with the resulting values.
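One possible way to do that gap search is sketched below. It is not part of the original answer: `fill_gaps`, `FMT` and `MISSING` are names invented for this example, and it assumes the data is a plain list in which -99 marks a missing entry and every gap to be filled has a valid timestamp on both sides. `interpolate` is repeated so that the snippet runs on its own.

```python
import datetime

FMT = '%d/%m/%Y %H:%M'  # layout used in the question
MISSING = -99           # sentinel used in the question

def interpolate(start, end, steps):
    """Return `steps` evenly spaced datetimes strictly between start and end."""
    step_size = (end - start) / (steps + 1)
    return [start + (i + 1) * step_size for i in range(steps)]

def fill_gaps(values):
    """Hypothetical helper: fill runs of MISSING that lie between two valid timestamps."""
    parsed = [None if v == MISSING else datetime.datetime.strptime(v, FMT)
              for v in values]
    result = list(parsed)
    last_valid = None  # index of the most recent known timestamp
    for i, value in enumerate(parsed):
        if value is None:
            continue
        # Missing entries lie between last_valid and i: interpolate across them
        if last_valid is not None and i - last_valid > 1:
            gap = i - last_valid - 1
            result[last_valid + 1:i] = interpolate(parsed[last_valid], value, gap)
        last_valid = i
    # Gaps at the very start or end have only one neighbour and are left as MISSING
    return [MISSING if v is None else v.strftime(FMT) for v in result]
```

Applied to the DataFrame from the question, this could be called as `df['time'] = fill_gaps(df['time'].tolist())`.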