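I have a data stream of the following form:
list = [22, 15, 6, 12, 30, 45, 200, 238, 220, 6000, 6250, 6900, 6700, 6500, 0, 250, 6000, 6800, 220, 250, 200]
As you can see, the data stream has some very high values in the middle, and those are the values we want to extract. The block can also contain lower values (0 and 250 in the list above), but it definitely has high values at its beginning and end (6000, 6500). How can we extract this particular window of data?
The output should be: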
new_list = [6000, 6250, 6900, 6700, 6500, 0, 250, 6000, 6800]
I usually work in MATLAB, so what I would do is find the first and last peaks in the data.
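Answer 0 (score: 0)
Here is a plot of your data: [plot not reproduced]
The traditional way to do what you are describing is to interpolate the data. That means fitting a curve through the points, for example a cubic spline.
Once you have a differentiable curve passing through the points, you take its first derivative (the rate of change). The places where the first derivative has a local extremum (a maximum for a sudden rise, a minimum for a sudden drop) are where your data suddenly goes uphill or downhill.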
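As a rough illustration of the derivative idea, the sketch below skips the spline and uses plain differences between neighbouring samples as a discrete stand-in for the first derivative. The function name `window_from_slopes` and the `fraction` cutoff are assumptions made for this example and are not part of the original answer.

```python
def window_from_slopes(data, fraction=0.5):
    """Slice from just after the first steep rise to just before the last steep fall.

    `fraction` (an assumed tuning knob) says how large a change must be,
    relative to the largest absolute change in the data, to count as steep.
    """
    # Discrete stand-in for the first derivative: change between neighbours.
    diffs = [b - a for a, b in zip(data, data[1:])]
    if not diffs:
        return []
    biggest = max(abs(d) for d in diffs)
    if biggest == 0:
        return []  # flat data, nothing to extract
    cutoff = fraction * biggest
    rises = [i for i, d in enumerate(diffs) if d >= cutoff]
    falls = [i for i, d in enumerate(diffs) if d <= -cutoff]
    if not rises or not falls:
        return []
    start = rises[0] + 1  # first value after the first steep rise
    end = falls[-1] + 1   # slice end: one past the last value before the last steep fall
    return data[start:end]


data = [22, 15, 6, 12, 30, 45, 200, 238, 220, 6000, 6250, 6900, 6700,
        6500, 0, 250, 6000, 6800, 220, 250, 200]
window = window_from_slopes(data)
print(window)
# prints [6000, 6250, 6900, 6700, 6500, 0, 250, 6000, 6800]
```

On noisy data a spline or some other smoothing step would probably be needed before differencing; this version assumes the jumps are as clean as in the sample list.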
However, all of that may be overkill for your purposes. I think the following code will do what you want:
class OnOffRecorder:
    """
    Stores key/value pairs, but only while recording is switched on.
    """
    def __init__(self, is_active = False):
        """
        By default, is not recording data
        """
        self.is_active = is_active
        self.data = dict()

    def turn_recording_on(self):
        self.is_active = True

    def turn_recording_off(self):
        self.is_active = False

    def toggle_recording(self):
        self.is_active = not(self.is_active)

    def push(self, key, value):
        # silently ignore the value when recording is off
        if self.is_active:
            self.data[key] = value
        return

    def pop(self):
        # hand back everything recorded so far and start afresh
        old_data = self.data
        self.data = dict()
        return old_data

def get_peak_data(data, delta):
    """
    `delta` represents percentage distance between
    minimum and maximum of the data.
    if data suddenly increases by delta, then begin recording
    if data suddenly decreases by delta, then stop recording
    """
    mind = min(data)
    maxd = max(data)
    raynge = maxd - mind
    # `li` == `left index`
    # `ri` == `right index`
    record = OnOffRecorder()
    record.turn_recording_off()
    for li in range(0, -1 + len(data)):
        ri = li + 1
        ld = data[li] # left data
        rd = data[ri] # right data
        if abs(rd-ld)/raynge > delta:
            if rd > ld :
                record.turn_recording_on()
            elif rd < ld:
                record.turn_recording_off()
        record.push(ri, rd)
    return record.pop()

data = [22 , 15 , 6 ,12 ,30 , 45, 200 , 238 ,
        220 , 6000, 6250 , 6900, 6700, 6500,
        0 , 250 , 6000 ,6800,220, 250, 200]
# if data suddenly increases, then begin recording that data
# if data suddenly decreases, then stop recording that data.
delta = .25
peak_data = get_peak_data(data, delta)
print(list(peak_data.values()))
# prints [6000, 6250, 6900, 6700, 6500, 6000, 6800]
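The recorder keeps only the values seen while recording was on, so the low values inside the block (0 and 250) are not in its output. If the whole contiguous window is wanted, one option is to slice from the first recorded index to the last. The sketch below restates the on/off rule as a hypothetical standalone helper, `recorded_indices`, so that it runs on its own; it is a compact paraphrase for illustration, not the class above.

```python
def recorded_indices(data, delta):
    """Indices the on/off rule would record (hypothetical helper)."""
    span = max(data) - min(data)
    if span == 0:
        return []  # flat data: no jump can exceed delta
    recording = False
    kept = []
    for i in range(1, len(data)):
        change = data[i] - data[i - 1]
        if abs(change) / span > delta:
            if change > 0:
                recording = True    # sudden rise: start recording
            elif change < 0:
                recording = False   # sudden drop: stop recording
        if recording:
            kept.append(i)
    return kept


data = [22, 15, 6, 12, 30, 45, 200, 238, 220, 6000, 6250, 6900, 6700,
        6500, 0, 250, 6000, 6800, 220, 250, 200]
kept = recorded_indices(data, 0.25)
print(kept)
# prints [9, 10, 11, 12, 13, 16, 17]

# Slice from the first recorded index through the last one.
window = data[kept[0]:kept[-1] + 1] if kept else []
print(window)
# prints [6000, 6250, 6900, 6700, 6500, 0, 250, 6000, 6800]
```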
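I developed a second, alternative approach that does not require you to specify a delta value manually. It works as follows: sort the data, compute the jump between each pair of adjacent sorted values, find the largest jump, and take the midpoint of that jump as the dividing line. Every value at or above the dividing line counts as a high signal.
Below is code that implements the step-by-step process above: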
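def get_dividing_line(data):
    sdata = sorted(data)
    # differences between adjacent sorted values
    jumps = [sdata[i+1] - sdata[i] for i in range(0, -1 + len(sdata))]
    jumps_max = max(jumps)
    # index of the value just below the first occurrence of the largest jump
    jumps_max_left = jumps.index(jumps_max)
    # index of the value just above the last occurrence of the largest jump;
    # jumps[k] ends at sdata[k+1], so count back from len(jumps), not
    # len(sdata), which would run past the end of the list
    jumps_max_right = [x for x in reversed(jumps)].index(jumps_max)
    jumps_max_right = len(jumps) - jumps_max_right
    jump_start = sdata[jumps_max_left]
    jump_end = sdata[jumps_max_right]
    return (jump_start + jump_end)/2

def get_high_signals(data):
    threshold = get_dividing_line(data)
    return [x for x in data if x >= threshold]
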
data = [22 , 15 , 6 ,12 ,30 , 45, 200 , 238 ,
        220 , 6000, 6250 , 6900, 6700, 6500,
        0 , 250 , 6000 ,6800,220, 250, 200]
high_signals = get_high_signals(data)
print(high_signals)
# prints [6000, 6250, 6900, 6700, 6500, 6000, 6800]
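To make the dividing line concrete, here is a small worked trace on the sample data. It is a simplified sketch: the hypothetical helper `largest_gap` returns the first adjacent pair with the largest difference, so it handles ties less carefully than `get_dividing_line` does.

```python
def largest_gap(data):
    """Return (low, high): the adjacent pair of sorted values that are farthest apart."""
    ordered = sorted(data)
    if len(ordered) < 2:
        raise ValueError("need at least two values to find a gap")
    # Pair each sorted value with its successor and keep the widest pair.
    return max(zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0])


data = [22, 15, 6, 12, 30, 45, 200, 238, 220, 6000, 6250, 6900, 6700,
        6500, 0, 250, 6000, 6800, 220, 250, 200]
low, high = largest_gap(data)
threshold = (low + high) / 2
print(low, high, threshold)
# prints 250 6000 3125.0
```

In the sample data the widest gap separates 250 from 6000, so anything at or above 3125 is treated as a high signal.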
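Answer 1 (score: 0)
You can set a threshold halfway between the highest and lowest values in the data, then find the first and last indices of the entries at or above that threshold to form a range in the list: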
data = [22 , 15 , 6 ,12 ,30 , 45, 200 , 238 , 220 , 6000, 6250 , 6900, 6700,6500, 0 , 250 , 6000 ,6800,220, 250,200]
# midpoint between the smallest and largest value
threshold = (min(data)+max(data))/2
# index of the first entry at or above the threshold
start = next(i for i,v in enumerate(data) if v >= threshold)
# one past the index of the last such entry, found by scanning the reversed list
end = len(data) - next(i for i,v in enumerate(data[::-1]) if v >= threshold)
result = data[start:end]
print(result)
# [6000, 6250, 6900, 6700, 6500, 0, 250, 6000, 6800]