Suppose I have this code:
import numpy as np
import time
from datetime import datetime

class Measurements():
    def __init__(self, time_var, value):
        self.time_var = time_var
        self.value = value

a = np.array([Measurements('30-01-2017 12:02:15.880922', 100),
              Measurements('30-01-2017 12:02:16.880922', 100),
              Measurements('30-01-2017 12:02:17.880922', 110),
              Measurements('30-01-2017 12:02:18.880922', 99),
              Measurements('30-01-2017 12:02:19.880922', 96)])
b = np.array([Measurements('30-01-2017 12:02:15.123444', 10),
              Measurements('30-01-2017 12:02:18.880919', 12),
              ])
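So I have 5 measurements in a and 2 measurements in b.
I want to use a as the reference and find the missing b values at the specific times at which a's measurements occur.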
So, in the end, b will always have a's time values and a's length. (I was thinking of converting the times to seconds with time.mktime(datetime.strptime(s, "%d-%m-%Y %H:%M:%S.%f").timetuple()).)
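One caveat about that conversion, shown as a small standard-library sketch: `timetuple()` has no microsecond field, so `time.mktime(...)` returns whole seconds and the `.880922` part of each timestamp is lost. `datetime.timestamp()` keeps the fractional seconds. Both treat a naive datetime as local time.

```python
import time
from datetime import datetime

FMT = "%d-%m-%Y %H:%M:%S.%f"
dt = datetime.strptime('30-01-2017 12:02:15.880922', FMT)

# timetuple() drops the microseconds, so mktime yields a whole number of seconds
whole = time.mktime(dt.timetuple())
# timestamp() keeps the microseconds (a naive datetime is interpreted as local time)
precise = dt.timestamp()

print(precise - whole)  # roughly 0.880922
```

For b this matters: 12:02:18.880919 and 12:02:18.880922 collapse to the same second under `mktime`.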
So b would be:
np.array([Measurements('30-01-2017 12:02:15.880922', MISSING_VALUE),
          Measurements('30-01-2017 12:02:16.880922', MISSING_VALUE),
          Measurements('30-01-2017 12:02:17.880922', MISSING_VALUE),
          Measurements('30-01-2017 12:02:18.880922', MISSING_VALUE),
          Measurements('30-01-2017 12:02:19.880922', MISSING_VALUE)])
Now I don't know how to approach this.
One idea is to first use interp (as here) to stretch b to the same length as a.
Or to use interp1d (more flexible):
import numpy as np
from scipy import interpolate

a = np.array([100, 123, 123, 118, 123])
b = np.array([12, 11, 14, 13])
# Use b's sample indices as the x-axis and resample b at a.size evenly spaced positions
b_interp = interpolate.interp1d(np.arange(b.size), b, kind='cubic', assume_sorted=False)
b_new = b_interp(np.linspace(0, b.size - 1, a.size))
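The first idea, stretching b with `np.interp`, could look like this linear sketch reusing the toy arrays above. It interpolates over sample positions (0, 1, 2, 3), not over timestamps, which is exactly why the timing question below remains open.

```python
import numpy as np

a = np.array([100, 123, 123, 118, 123])
b = np.array([12, 11, 14, 13])

# Positions of b's samples, and a.size evenly spaced positions over the same span
x_old = np.arange(b.size)                  # 0, 1, 2, 3
x_new = np.linspace(0, b.size - 1, a.size)  # 0, 0.75, 1.5, 2.25, 3

# Linear stretch of b to a's length (positions only, no notion of time)
b_stretched = np.interp(x_new, x_old, b)
# values: 12, 11.25, 12.5, 13.75, 13
```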
But how do I handle the time?
Answer:
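Here is a solution to your problem. Two things to keep in mind:
- with scipy.interpolate.interp1d, kind="cubic" does not work with only the two points you have in b (cubic interpolation needs at least four points);
- scipy.interpolate.interp1d cannot interpolate values outside the range you defined (the range of the b times).
That is why I changed your initial code to show both points: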
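The range restriction can be seen in isolation with toy x/y values (not taken from the question): by default `interp1d` uses `bounds_error=True` and raises `ValueError` outside the data range. Besides clamping the query times, one alternative is `bounds_error=False` with a `fill_value`, here NaN, which could serve as a MISSING_VALUE marker. SciPy also accepts `fill_value="extrapolate"`, though extrapolating a cubic is usually unreliable.

```python
import numpy as np
from scipy import interpolate

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([10.0, 12.0, 13.0, 16.0, 20.0])

# Default bounds_error=True: evaluating outside [x.min(), x.max()] raises ValueError
f = interpolate.interp1d(x, y, kind="cubic")
try:
    f(5.0)
    raised = False
except ValueError:
    raised = True

# Alternative to clamping: return NaN for points outside the data range
f_nan = interpolate.interp1d(x, y, kind="cubic", bounds_error=False, fill_value=np.nan)
outside = f_nan(np.array([-1.0, 2.0, 5.0]))  # NaN, 13.0, NaN
```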
from datetime import datetime
import numpy as np
from scipy import interpolate

time_a_full = ['30-01-2017 12:02:15.880922','30-01-2017 12:02:16.880922','30-01-2017 12:02:17.880922','30-01-2017 12:02:18.880922','30-01-2017 12:02:19.880922','30-01-2017 12:02:22.880922']
time_b_full = ['30-01-2017 12:02:15.123444','30-01-2017 12:02:16.880919','30-01-2017 12:02:18.880920', '30-01-2017 12:02:19.880922','30-01-2017 12:02:20.880922']

# Transform the times into seconds. datetime.timestamp() keeps the microseconds,
# whereas time.mktime(dt.timetuple()) would truncate them to whole seconds.
fmt = "%d-%m-%Y %H:%M:%S.%f"
time_a = np.array([datetime.strptime(s, fmt).timestamp() for s in time_a_full])
time_b = np.array([datetime.strptime(s, fmt).timestamp() for s in time_b_full])

values_a = np.array([100, 100, 110, 99, 96, 95])
values_b = np.array([10, 12, 13, 16, 20])

# Linear interpolation with the numpy function
# (outside b's time range, np.interp returns the first/last value of b)
b_linear = np.interp(time_a, time_b, values_b)

# Cubic interpolation
f = interpolate.interp1d(time_b, values_b, kind="cubic")
# interp1d raises ValueError outside b's time range, so clamp a's times to that range first
time_a_clamped = np.clip(time_a, time_b.min(), time_b.max())
b_cubic = f(time_a_clamped)
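Applied directly to the `Measurements` arrays from the question, the same idea gives a b that has a's times and a's length. This is a linear sketch under two assumptions: MISSING_VALUE is represented as NaN for times outside b's range (via `np.interp`'s `left`/`right` arguments), and `to_seconds` is a hypothetical helper, not part of the original code.

```python
import numpy as np
from datetime import datetime

FMT = "%d-%m-%Y %H:%M:%S.%f"

class Measurements():
    def __init__(self, time_var, value):
        self.time_var = time_var
        self.value = value

a = np.array([Measurements('30-01-2017 12:02:15.880922', 100),
              Measurements('30-01-2017 12:02:16.880922', 100),
              Measurements('30-01-2017 12:02:17.880922', 110),
              Measurements('30-01-2017 12:02:18.880922', 99),
              Measurements('30-01-2017 12:02:19.880922', 96)])
b = np.array([Measurements('30-01-2017 12:02:15.123444', 10),
              Measurements('30-01-2017 12:02:18.880919', 12)])

def to_seconds(measurements):
    """Hypothetical helper: timestamps in seconds (microseconds kept) for Measurements."""
    return np.array([datetime.strptime(m.time_var, FMT).timestamp() for m in measurements])

time_a = to_seconds(a)
time_b = to_seconds(b)
values_b = np.array([m.value for m in b], dtype=float)

# Linear interpolation of b at a's times; times outside b's range become NaN
new_values = np.interp(time_a, time_b, values_b, left=np.nan, right=np.nan)

# b resampled onto a's time stamps: same times and same length as a
b_resampled = np.array([Measurements(m.time_var, v) for m, v in zip(a, new_values)])
```

With only two b samples, a's last two times fall after b's last sample (12:02:18.880922 is 3 microseconds after 12:02:18.880919), so they come out as NaN. Dropping `left`/`right` restores `np.interp`'s default of holding b's first and last values instead.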