Question

I am looking for a simple way to find "plateaus", or groups, in a Python list. As input I have something like this:

mydata = [0.0, 0.0, 0.0, 0.0, 0.0, 0.143, 0.0, 0.22, 0.135, 0.44, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.33, 0.65, 0.22, 0.0, 0.0, 0.0, 0.0, 0.0]

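I want to extract the middle position of each "group". A group here is a run of data that is != 0 and spans, for example, at least 3 positions. Single enclosed zeros (like the one at position 6) should be ignored.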

Basically, I want to get the following output:

myoutput = [8, 20]

For my use case it is not important to get very precise output. [10, 21] would still be fine.

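To sum up: first group: [0.143, 0.0, 0.22, 0.135, 0.44, 0.1]; second group: [0.33, 0.65, 0.22]. What I need is the position of the middle element of each group (or the element just left or right of the middle if there is no true middle). So in the output, 8 is the middle of the first group and 20 is the middle of the second.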

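I have already tried a few approaches, but they are not as stable as I would like (for example, more enclosed zeros can cause problems). So before investing more time in this idea, I wanted to ask whether there is a better way to implement this. I would even think this is a common problem. Is there perhaps already standard code that solves it?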

There are other questions that describe roughly the same problem, but I also need to "smooth" the data before processing.

1.) Smooth the data to get rid of the enclosed zeros

import numpy as np
def smooth(y, box_pts):
    box = np.ones(box_pts)/box_pts
    y_smooth = np.convolve(y, box, mode='same')
    return y_smooth

y_smooth = smooth(mydata, 20)

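2.) Find the start points in the smoothed list (a value that is != 0 and whose predecessor is 0 is a start point). When an end point is found, use the last start point found and the current end point to get the middle position of the group, and write it to a deque.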

from collections import deque

laststart = 0
lastend = 0
myoutput = deque()

for i in range(1, len(y_smooth)-1):
    # detect start:
    if y_smooth[i] != 0 and y_smooth[i-1] == 0:
        laststart = i
    # detect end:
    elif y_smooth[i] != 0 and y_smooth[i+1] == 0 and laststart+2 < i:
        lastend = i
        myoutput.appendleft(laststart+(lastend-laststart)/2)

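Edit: to keep things simple, I only gave a short sample of the input data at the top. That short list actually leads to a problematic smoothed output: the whole list gets smoothed and no zeros are left. actual input data; actual input data after smoothing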
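
A minimal sketch of why the window size matters, assuming the same moving-average `smooth` helper as above. A box of 20 points on a 27-element list reaches a non-zero value from every position, while a box of 3 points only fills the single enclosed zero (and widens each group by one position on either side). The window size 3 is an illustrative choice, not something taken from the original post.

```python
import numpy as np

def smooth(y, box_pts):
    # moving average with a flat window of box_pts points
    box = np.ones(box_pts) / box_pts
    return np.convolve(y, box, mode='same')

mydata = [0.0, 0.0, 0.0, 0.0, 0.0, 0.143, 0.0, 0.22, 0.135, 0.44, 0.1,
          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.33, 0.65, 0.22,
          0.0, 0.0, 0.0, 0.0, 0.0]

wide = smooth(mydata, 20)    # window almost as long as the list
narrow = smooth(mydata, 3)   # window just wide enough to bridge one zero

# how many exact zeros survive the wide window
zeros_left_wide = int(np.count_nonzero(wide == 0))
# positions that are non-zero after the narrow window
nonzero_narrow = np.flatnonzero(narrow).tolist()

print(zeros_left_wide)
print(nonzero_narrow)
```

With the sample data, the wide window leaves no zeros at all, while the narrow one leaves two non-zero stretches, positions 4 to 11 and 18 to 22.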

Answer 1

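As you described, a fairly simple way to find the groups is to convert the data into a boolean array that contains 1 for data inside a group and 0 for data outside, and then compute the difference between consecutive values. That gives you 1 at the start of a group and -1 at its end.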

Here is an example:

import numpy as np

mydata = [0.0, 0.0, 0.0, 0.0, 0.0, 0.143, 0.0, 0.22, 0.135, 0.44, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.33, 0.65, 0.22, 0.0, 0.0, 0.0, 0.0, 0.0]
arr = np.array(mydata)

mask = (arr != 0).astype(int)  # array that contains 1 for every non-zero value, 0 otherwise (np.int was removed in NumPy 1.24)
padded_mask =  np.pad(mask,(1,),"constant") #add a zero at the start and at the end to handle edge cases
edge_mask = padded_mask[1:] - padded_mask[:-1] #diff between a value and the following one 
#if there's a 1 in edge mask it's a group start
#if there's a -1 it's a group stop

#where gives us the index of those starts and stops
starts = np.where(edge_mask == 1)[0]
stops = np.where(edge_mask == -1)[0]
print(starts,stops)

#we format groups and drop groups that are too small
groups = [group for group in zip(starts,stops) if (group[0]+2 < group[1])]


for group in groups:
        print("start,stop : {}  middle : {}".format(group,(group[0]+group[1])/2) )

Output:

[ 5  7 19] [ 6 11 22]
start,stop : (7, 11)  middle : 9.0
start,stop : (19, 22)  middle : 20.5
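
In the output above, the single zero at position 6 splits the first group into (5, 6) and (7, 11), so the reported middle is 9.0 rather than 8. A minimal sketch of one way to merge runs separated by a short gap before filtering by size follows. The function name `group_middles` and the parameters `min_size` and `max_gap` are hypothetical, introduced only for illustration.

```python
import numpy as np

def group_middles(data, min_size=3, max_gap=1):
    """Middle index of each non-zero run, bridging gaps of up to max_gap zeros."""
    mask = (np.asarray(data) != 0).astype(int)
    padded = np.pad(mask, 1, "constant")      # zero at both ends for edge cases
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)       # first index of each run
    stops = np.flatnonzero(edges == -1)       # one past the last index of each run
    if starts.size == 0:
        return []
    # a gap is the number of zeros between one run's stop and the next run's start
    gaps = starts[1:] - stops[:-1]
    keep = gaps > max_gap                     # True where two runs stay separate
    merged_starts = starts[np.r_[True, keep]]
    merged_stops = stops[np.r_[keep, True]]
    return [int((s + e) // 2)
            for s, e in zip(merged_starts, merged_stops)
            if e - s >= min_size]

mydata = [0.0, 0.0, 0.0, 0.0, 0.0, 0.143, 0.0, 0.22, 0.135, 0.44, 0.1,
          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.33, 0.65, 0.22,
          0.0, 0.0, 0.0, 0.0, 0.0]
print(group_middles(mydata))
```

For the sample data this merges the first two runs into (5, 11) and returns [8, 20].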

Answer 2

Your smoothed data has no zeros left:

import numpy as np

def smooth(y, box_pts):
    box = np.ones(box_pts)/box_pts
    print(box)
    y_smooth = np.convolve(y, box, mode='same')
    return y_smooth

mydata = [0.0, 0.0, 0.0, 0.0,-0.2, 0.143, 
          0.0, 0.22, 0.135, 0.44, 0.1, 0.0, 
          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
          0.33, 0.65, 0.22, 0.0, 0.0, 0.0, 
          0.0, 0.0]

y_smooth = smooth(mydata, 27)
print(y_smooth)

Output:

[ 0.0469   0.0519   0.0519   0.0519   0.0519   0.0519   
  0.0519   0.0519  0.0519   0.0519   0.0684   0.1009   
  0.1119   0.1119   0.1119   0.1119  0.10475  0.10475  
  0.09375  0.087    0.065    0.06     0.06     0.06     
  0.06   0.06     0.06   ]

This is how you could find the groups in the original data:

def findGroups(data, minGrpSize=1):
  startpos = -1
  endpos = -1
  pospos = []
  for idx, v in enumerate(data):
    if v > 0 and startpos == -1:
      startpos = idx  # first positive value opens a group
    elif v == 0.0:
      if startpos > -1:
        if idx < (len(data)-1) and data[idx+1] != 0.0:
          pass  # ignore one 0.0 in a run
        else:
          endpos = idx

      if startpos > -1:
        if endpos > -1 or idx == len(data)-1:  # both set or last one
          if (endpos - startpos) >= minGrpSize:
            pospos.append((startpos, endpos))
          startpos = -1
          endpos = -1
  return pospos

pos = findGroups(mydata,1)
print(*map(lambda x: sum(x) // len(x), pos))

pos = findGroups(mydata,3)
print(*map(lambda x: sum(x) // len(x), pos))

pos = findGroups(mydata,5)
print(*map(lambda x: sum(x) // len(x), pos))

Output:

8 20
8 20
8

Answer 3

Part 2 - finding the midpoints of the groups:

mydata = [0.0, 0.0, 0.0, 0.0, 0.0, 0.143, 0.0, 0.22, 0.135, 0.44, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
          0.0, 0.0, 0.33, 0.65, 0.22, 0.0, 0.0, 0.0, 0.0, 0.0]

groups = []
last_start = 0
last_end = 0
in_group = 0

for i in range(1, len(mydata) - 1):
    if not in_group:
        if mydata[i] and not mydata[i - 1]:
            last_start = i
            in_group = 1
    else:  # a group continued.
        if mydata[i]:
            last_end = i
        elif last_end - last_start > 1:  # we have a group i.e. not single non-zero value
            groups.append(((last_end - last_start)//2) + last_start)
            last_start, last_end, in_group = (0, 0, 0)
        else:  # it was just a single non-zero.
            last_start, last_end, in_group = (0, 0, 0)

print(groups)

Output:

[8, 20]
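
Note that the loop above runs over range(1, len(mydata) - 1), so a group that touches the first or last element of the list is never reported. Below is a minimal sketch that also covers those cases, using itertools.groupby from the standard library. The function name `group_midpoints` and the `min_size` parameter are assumptions for illustration. Like the loop above, it does not bridge single enclosed zeros.

```python
from itertools import groupby

def group_midpoints(data, min_size=3):
    """Midpoints of runs of non-zero values, including runs at the list ends."""
    midpoints = []
    index = 0  # index of the first element of the current run
    for is_nonzero, run in groupby(data, key=bool):
        length = len(list(run))
        if is_nonzero and length >= min_size:
            # first index of the run plus half its span, rounded down
            midpoints.append(index + (length - 1) // 2)
        index += length
    return midpoints

mydata = [0.0, 0.0, 0.0, 0.0, 0.0, 0.143, 0.0, 0.22, 0.135, 0.44, 0.1,
          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.33, 0.65, 0.22,
          0.0, 0.0, 0.0, 0.0, 0.0]
print(group_midpoints(mydata))
print(group_midpoints([0.5, 0.4, 0.3, 0.0, 0.0]))
```

On the sample data this gives the same [8, 20] as the loop, and it additionally reports the run at the start of the second list.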

Answer 4

A pure NumPy solution would look something like this (not fully optimized):

import numpy as np

input_data = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.143,
                       0.0, 0.22, 0.135, 0.44, 0.1, 0.0,
                       0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
                       0.33, 0.65, 0.22, 0.0, 0.0, 0.0,
                       0.0, 0.0])

# Find transitions between zero and nonzero
# (cast to int: NumPy does not support subtracting boolean arrays)
non_zeros = (input_data > 0).astype(int)
changes = np.ediff1d(non_zeros, to_begin=1 - non_zeros[0],
                     to_end=1 - non_zeros[-1])
change_idxs = np.nonzero(changes)[0]

# Filter out small holes
holes = change_idxs.reshape(change_idxs.size//2, 2)    
hole_sizes = holes[:, 1]-holes[:, 0]
big_holes = holes[hole_sizes > 1]

kept_change_idxs = np.r_[0, big_holes.flatten(), input_data.size]

# Get midpoints of big intervals
intervals = kept_change_idxs.reshape(kept_change_idxs.size//2, 2)
big_intervals = intervals[intervals[:, 1]-intervals[:, 0] >= 3]
print((big_intervals[:, 0]+big_intervals[:, 1])//2)

Finding groups of values != 0 in a list
