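I take a measurement every 100 ms. I want to reduce the data by keeping one value every 10 s, or at least the value closest to each 10 s mark.
Here is a small example series with a 10 s step. I currently use a loop, but I would like a simpler way that avoids it.
Any suggestions?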
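import pandas as pd
import numpy as np
data = pd.Series([0, 1, 2, 8, 11, 12, 26, 27, 28, 31, 40, 49])
time_span = 10
delta_time = 3
# Target times: 0, 10, 20, ... up to the last multiple of 10 not exceeding max(data)
time_10s = np.arange(0, int((max(data) // 10) * 10) + 1, 10)
index_list = []
for elt in time_10s:
    # Index and distance of the sample closest to this target time
    min_index = abs(data - elt).idxmin()
    min_value = abs(data - elt).min()
    # Keep the sample only if it lies within the tolerance
    if min_value < delta_time:
        index_list.append(min_index)
print(data[index_list])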
I also tried some modulo arithmetic, but it did not give me what I wanted:
A = data % time_span < delta_time
B = data % time_span > (time_span - delta_time)
C = A | B
D = data[C].index.values
Thanks
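A note on the modulo attempt: the mask C flags every sample that lies within delta_time of a multiple of time_span, so several samples per 10 s mark survive (ten of the twelve values in the example). Below is a minimal sketch of one way to reduce that to a single closest sample per mark. It assumes the same data, time_span and delta_time as above. The names nearest_mark, dist, close and picked are introduced here for illustration only and are not part of the original code.

```python
import numpy as np
import pandas as pd

data = pd.Series([0, 1, 2, 8, 11, 12, 26, 27, 28, 31, 40, 49])
time_span = 10
delta_time = 3

# Nearest multiple of time_span for every sample (illustrative names)
nearest_mark = ((data + time_span / 2) // time_span * time_span).astype(int)
# Distance of each sample to its nearest mark
dist = (data - nearest_mark).abs()
# Keep samples inside the tolerance, then take the closest one per mark
close = dist[dist < delta_time]
picked = close.groupby(nearest_mark[close.index]).idxmin()
result = data.loc[picked.to_numpy()]
print(result)
```

Unlike the loop, this also keeps 49 for the 50 s mark, because the loop's time_10s stops at 40. Filter on nearest_mark if that last mark is not wanted.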
Answer 0 (score: 1)
We can use np.searchsorted:
# Get array data for better performance (data must be sorted ascending)
a = data.to_numpy(copy=False)  # data.values on older pandas versions
# Use searchsorted to get the right-side insertion index for each bin
idx0 = np.searchsorted(a, time_10s, 'right')
# Clip so both neighbours are valid positions, even at the array ends
left = (idx0 - 1).clip(min=0)
right = idx0.clip(max=len(a) - 1)
# Distance from each bin to its left and right neighbours
v1 = np.abs(time_10s - a[left])
v2 = np.abs(a[right] - time_10s)
# Pick the left neighbour where it is strictly closer, otherwise the right one
idx1 = np.where(v1 < v2, left, right)
# Index by position and keep only the values within delta_time of their bin
data_f = data.iloc[idx1]
out = data_f[np.abs(data_f.to_numpy() - time_10s) < delta_time]
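To make this easier to try out, here is a self-contained sketch that wraps the searchsorted idea in a function and checks it against the loop from the question. The function name closest_per_bin and its parameters bins and tol are hypothetical, chosen for this example. It assumes data is non-empty and sorted in ascending order, which np.searchsorted requires.

```python
import numpy as np
import pandas as pd

def closest_per_bin(data, bins, tol):
    """Return the sample closest to each bin value, if it is within tol.

    Hypothetical helper; assumes `data` is non-empty and sorted ascending.
    """
    a = data.to_numpy()
    bins = np.asarray(bins)
    idx0 = np.searchsorted(a, bins, side="right")
    # Positions of the left and right neighbours, kept inside the array
    left = (idx0 - 1).clip(min=0)
    right = idx0.clip(max=len(a) - 1)
    # Strict "<" means ties go to the right neighbour
    idx = np.where(np.abs(bins - a[left]) < np.abs(a[right] - bins), left, right)
    # Drop bins whose closest sample is too far away
    keep = np.abs(a[idx] - bins) < tol
    return data.iloc[idx[keep]]

data = pd.Series([0, 1, 2, 8, 11, 12, 26, 27, 28, 31, 40, 49])
time_10s = np.arange(0, int(data.max() // 10 * 10) + 1, 10)
out = closest_per_bin(data, time_10s, tol=3)

# Loop from the question, used as a reference
expected = [abs(data - t).idxmin() for t in time_10s if abs(data - t).min() < 3]
print(out)
assert out.index.tolist() == expected
```

The two clip calls keep both neighbour positions inside the array, which matters when a bin is equal to or greater than the last sample: searchsorted with side="right" then returns len(a), which is not a valid position.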