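I want to compute 10-second differences for a dataset whose time increments are irregular. The data live in two equal-length 1-D arrays, one for time and one for the data values.
After some poking around I came up with a solution, but it is too slow, I suspect because it has to loop over every item in the array.
My general approach is to iterate over the time array and, for each time value, find the index of the time value x seconds earlier. I then use those indices on the data array to compute the difference.
The code is shown below.
First, the find_closest function from Bi Rico: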
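import numpy as np

def find_closest(A, target):
    # A must be sorted
    idx = A.searchsorted(target)
    # keep idx in [1, len(A)-1] so that idx-1 and idx are both valid
    idx = np.clip(idx, 1, len(A)-1)
    left = A[idx-1]
    right = A[idx]
    # step back one position when the left neighbour is strictly closer
    idx -= target - left < right - target
    return idx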
I then use it as follows:
def trailing_diff(time_array, data_array, seconds):
    trailing_list = []
    for i in range(len(time_array)):  # xrange in the original Python 2 code
        now = time_array[i]
        if now < seconds:
            # not enough history yet
            trailing_list.append(0)
        else:
            then = find_closest(time_array, now - seconds)
            trailing_list.append(data_array[i] - data_array[then])
    return np.asarray(trailing_list)
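Unfortunately this doesn't scale well, and I'd like to be able to compute this (and plot it) on the fly.
Any ideas on how to make it faster?
Edit: input/output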
In [48]:time1
Out[48]:
array([ 0.57200003, 0.579 , 0.58800006, 0.59500003,
0.5999999 , 1.05999994, 1.55900002, 2.00900006,
2.57599998, 3.05599999, 3.52399993, 4.00699997,
4.09599996, 4.57299995, 5.04699993, 5.52099991,
6.09299994, 6.55999994, 7.04099989, 7.50900006,
8.07500005, 8.55799985, 9.023 , 9.50699997,
9.59399986, 10.07200003, 10.54200006, 11.01999998,
11.58899999, 12.05699992, 12.53799987, 13.00499988,
13.57599998, 14.05599999, 14.52399993, 15.00199985,
15.09299994, 15.57599998, 16.04399991, 16.52199984,
17.08899999, 17.55799985, 18.03699994, 18.50499988,
19.0769999 , 19.5539999 , 20.023 , 20.50099993,
20.59099984, 21.07399988])
In [49]:weight1
Out[49]:
array([ 82.268, 82.268, 82.269, 82.272, 82.275, 82.291, 82.289,
82.288, 82.287, 82.287, 82.293, 82.303, 82.303, 82.314,
82.321, 82.333, 82.356, 82.368, 82.386, 82.398, 82.411,
82.417, 82.419, 82.424, 82.424, 82.437, 82.45 , 82.472,
82.498, 82.515, 82.541, 82.559, 82.584, 82.607, 82.617,
82.626, 82.626, 82.629, 82.63 , 82.636, 82.651, 82.663,
82.686, 82.703, 82.728, 82.755, 82.773, 82.8 , 82.8 ,
82.826])
In [50]:trailing_diff(time1,weight1,10)
Out[50]:
array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0.169, 0.182, 0.181, 0.209, 0.227, 0.254, 0.272,
0.291, 0.304, 0.303, 0.305, 0.305, 0.296, 0.274, 0.268,
0.265, 0.265, 0.275, 0.286, 0.309, 0.331, 0.336, 0.35 ,
0.35 , 0.354])
Answer 0 (score: 1)
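Use a ready-made interpolation routine. If you really want nearest-neighbor behavior, I think it will have to be scipy's scipy.interpolate.interp1d, but linear interpolation seems like the better choice, and for that you can use numpy's numpy.interp: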
import numpy as np

def trailing_diff(time, data, diff):
    ret = np.zeros_like(data)
    # only points with at least `diff` seconds of history get a value
    mask = (time - time[0]) >= diff
    ret[mask] = data[mask] - np.interp(time[mask] - diff,
                                       time, data)
    return ret
time = np.arange(10) + np.random.rand(10)/2
weight = 82 + np.random.rand(10)
>>> time
array([ 0.05920317, 1.23000929, 2.36399981, 3.14701595, 4.05128494,
5.22100886, 6.07415922, 7.36161563, 8.37067107, 9.11371986])
>>> weight
array([ 82.14004969, 82.36214992, 82.25663272, 82.33764514,
82.52985723, 82.67820915, 82.43440796, 82.74038368,
82.84235675, 82.1333915 ])
>>> trailing_diff(time, weight, 3)
array([ 0. , 0. , 0. , 0.18093749, 0.20161107,
0.4082712 , 0.10430073, 0.17116831, 0.20691594, -0.31041841])
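As a sanity check on the linear-interpolation version, here is a minimal sketch using made-up data that is exactly linear in time (the arrays below are illustrative assumptions, not from the question). For such data the trailing difference should equal the slope times the window wherever there is enough history.

```python
import numpy as np

def trailing_diff(time, data, diff):
    # same function as above, repeated so the snippet runs on its own
    ret = np.zeros_like(data)
    mask = (time - time[0]) >= diff
    ret[mask] = data[mask] - np.interp(time[mask] - diff, time, data)
    return ret

# hypothetical, irregularly spaced samples of data = 10 * time
time = np.array([0.0, 1.0, 2.5, 3.0, 4.5])
data = 10.0 * time

result = trailing_diff(time, data, 1.5)
# the first two samples have less than 1.5 s of history and stay at 0;
# the rest should be approximately 10 * 1.5 = 15
print(result)
```

Because np.interp takes the whole array of query times in one call, there is no Python-level loop over the samples.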
To get nearest-neighbor behavior, you can do:
import numpy as np
from scipy.interpolate import interp1d

def trailing_diff(time, data, diff):
    ret = np.zeros_like(data)
    mask = (time - time[0]) >= diff
    # nearest-neighbour lookup instead of linear interpolation
    interpolator = interp1d(time, data, kind='nearest')
    ret[mask] = data[mask] - interpolator(time[mask] - diff)
    return ret
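If you would rather avoid the scipy dependency, the nearest-neighbor lookup from the question can probably be vectorized directly, since searchsorted accepts an array of targets. The following is a sketch under that assumption; the name trailing_diff_nearest is hypothetical, and it keeps the question's rule of returning 0 wherever the time value is smaller than the window (note that this differs slightly from the mask used above, which measures from time[0]).

```python
import numpy as np

def trailing_diff_nearest(time_array, data_array, seconds):
    # hypothetical helper: vectorized form of the loop in the question
    # assumes time_array is sorted and has at least two elements
    time_array = np.asarray(time_array, dtype=float)
    data_array = np.asarray(data_array, dtype=float)
    out = np.zeros_like(data_array)

    # same condition as the question's loop: skip samples with now < seconds
    mask = time_array >= seconds
    targets = time_array[mask] - seconds

    # find_closest logic applied to all targets at once
    idx = np.searchsorted(time_array, targets)
    idx = np.clip(idx, 1, len(time_array) - 1)
    left = time_array[idx - 1]
    right = time_array[idx]
    idx = idx - (targets - left < right - targets)

    out[mask] = data_array[mask] - data_array[idx]
    return out

# made-up example data for illustration
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
d = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
print(trailing_diff_nearest(t, d, 2))
```

Ties between the left and right neighbors resolve to the right one, as in the original find_closest.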