Question

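I have some audio data loaded into a numpy array, and I want to segment the data by finding silent parts, i.e. parts where the audio amplitude stays below a certain threshold for some period of time.

An extremely simple way to do this is something like: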

import re

values = ''.join("1" if abs(x) < SILENCE_THRESHOLD else "0" for x in samples)
pattern = re.compile('1{%d,}' % int(MIN_SILENCE))
for match in pattern.finditer(values):
    # code goes here; match.start() and match.end() are sample indices
    pass

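The code above finds parts where at least MIN_SILENCE consecutive elements are smaller than SILENCE_THRESHOLD.

Now, obviously, the code above is horribly inefficient and a terrible abuse of regular expressions. Is there some other method that is more efficient, but still results in equally simple and short code?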

Answer 1

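Here's a numpy-based solution.

I think (?) it should be faster than the other options. Hopefully it's fairly clear.

However, it does require about twice as much memory as the various generator-based solutions. As long as you can hold a single temporary copy of your data in memory (for the diff), plus a boolean array of the same length as your data (one byte per element in numpy), it should be pretty efficient...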

import numpy as np

def main():
    # Generate some random data
    x = np.cumsum(np.random.random(1000) - 0.5)
    condition = np.abs(x) < 1

    # Print the start and stop indices of each region where the absolute 
    # values of x are below 1, and the min and max of each of these regions
    for start, stop in contiguous_regions(condition):
        segment = x[start:stop]
        print(start, stop)
        print(segment.min(), segment.max())

def contiguous_regions(condition):
    """Finds contiguous True regions of the boolean array "condition". Returns
    a 2D array where the first column is the start index of the region and the
    second column is the end index."""

    # Find the indices of changes in "condition"
    d = np.diff(condition)
    idx, = d.nonzero() 

    # We need to start things after the change in "condition". Therefore, 
    # we'll shift the index by 1 to the right.
    idx += 1

    if condition[0]:
        # If the start of condition is True prepend a 0
        idx = np.r_[0, idx]

    if condition[-1]:
        # If the end of condition is True, append the length of the array
        idx = np.r_[idx, condition.size]

    # Reshape the result into two columns
    idx.shape = (-1,2)
    return idx

main()
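To tie this back to the original question, here is a minimal sketch that applies the same edge-detection idea to silence detection and drops runs shorter than a minimum length. It pads the condition with False on both sides instead of special-casing the first and last elements, so it is a variation on the function above rather than the answer's exact code. The name silent_regions, its parameters, and the sample values are made up for illustration.

```python
import numpy as np

def contiguous_regions(condition):
    """Return an (n, 2) array of [start, stop) index pairs, one per run of True."""
    condition = np.asarray(condition, dtype=bool)
    # Pad with False on both sides so every run has a rising and a falling edge
    padded = np.concatenate(([False], condition, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    # Edges alternate between run starts and (exclusive) run stops
    return edges.reshape(-1, 2)

def silent_regions(samples, silence_threshold, min_silence):
    """Keep only the quiet runs that last at least min_silence samples."""
    regions = contiguous_regions(np.abs(samples) < silence_threshold)
    lengths = regions[:, 1] - regions[:, 0]
    return regions[lengths >= min_silence]

# Hypothetical sample data
samples = np.array([0.0, 0.1, 5.0, 0.2, 0.1, 0.0, 0.3, 7.0, 0.1])
print(silent_regions(samples, silence_threshold=1.0, min_silence=3))
```

With this input the quiet runs are [0, 2), [3, 7) and [8, 9), and only [3, 7) is long enough to survive the filter.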

Answer 2

Slightly sloppy, but simple and reasonably fast, if you don't mind using scipy:

from scipy.ndimage import gaussian_filter
sigma = 3
threshold = 1
above_threshold = gaussian_filter(data, sigma=sigma) > threshold

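The idea is that quiet portions of the data will smooth down to low amplitude, and loud regions won't. Tune 'sigma' to affect how long a 'quiet' region must be; tune 'threshold' to affect how quiet it must be. This slows down for large sigma, at which point using FFT-based smoothing might be faster.

This has the added benefit that single "hot pixels" won't disrupt your silence-finding, so you're a little less sensitive to certain types of noise.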
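One caveat when applying this to audio: the samples are signed, so smoothing them directly averages even a loud oscillation towards zero. A hedged sketch of the same idea, smoothing the magnitude instead, might look like this. The synthetic signal and the variable names envelope and is_quiet are assumptions for illustration, and the comparison is flipped to flag the quiet samples.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Synthetic signal (an assumption for illustration): loud, quiet, loud
loud = np.tile([5.0, -5.0], 100)      # 200 samples alternating between +5 and -5
quiet = np.tile([0.01, -0.01], 100)   # 200 samples of very low amplitude
data = np.concatenate([loud, quiet, loud])

sigma = 3        # larger sigma: a quiet stretch must last longer to register
threshold = 1.0  # lower threshold: the stretch must be quieter

# Smoothing the signed samples directly averages the loud part towards zero...
raw_smoothed = gaussian_filter(data, sigma=sigma)
# ...so smooth the magnitude instead and threshold that envelope
envelope = gaussian_filter(np.abs(data), sigma=sigma)
is_quiet = envelope < threshold

# First and last sample index flagged as quiet
print(np.flatnonzero(is_quiet)[[0, -1]])
```

The boundaries of the flagged region are blurred by roughly the width of the Gaussian kernel, so the mask is better suited to locating quiet stretches than to finding exact start and stop samples.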

Answer 3

There is a very convenient solution to this using scipy.ndimage. For an array:

a = np.array([1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0])

which can be the result of a condition applied to another array, finding the contiguous regions is as simple as:

regions = scipy.ndimage.find_objects(scipy.ndimage.label(a)[0])

Then, any function can be applied to those regions, for example:

[np.sum(a[r]) for r in regions]
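As a rough sketch of how this could be applied to the silence problem from the question, the snippet below labels the quiet runs and keeps those of at least a minimum length. The sample data and the threshold values are hypothetical.

```python
import numpy as np
from scipy import ndimage

# Hypothetical values for illustration
samples = np.array([0.0, 0.1, 5.0, 0.2, 0.1, 0.0, 0.3, 7.0, 0.1])
SILENCE_THRESHOLD = 1.0
MIN_SILENCE = 3

quiet = np.abs(samples) < SILENCE_THRESHOLD
labels, num_regions = ndimage.label(quiet)  # each run of True gets an id 1..n
regions = ndimage.find_objects(labels)      # one tuple of slices per run

# For a 1-D input each entry is a 1-tuple holding a single slice
long_runs = [(r[0].start, r[0].stop) for r in regions
             if r[0].stop - r[0].start >= MIN_SILENCE]
print(long_runs)
```

Each pair is a (start, stop) index with an exclusive stop, so samples[start:stop] gives the quiet segment.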

Answer 4

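I haven't tested this, but it should be close to what you're looking for. Slightly more lines of code, but it should be more efficient, readable, and it doesn't abuse regular expressions :-)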

def find_silent(samples):
    num_silent = 0
    start = 0
    for index in range(0, len(samples)):
        if abs(samples[index]) < SILENCE_THRESHOLD:
            if num_silent == 0:
                start = index
            num_silent += 1
        else:
            if num_silent >= MIN_SILENCE:
                yield samples[start:index]
            num_silent = 0
    if num_silent >= MIN_SILENCE:
        yield samples[start:]

for match in find_silent(samples):
    # code goes here
    pass

Answer 5

This should return a list of (start, length) pairs:

def silent_segs(samples,threshold,min_dur):
  start = -1
  silent_segments = []
  for idx,x in enumerate(samples):
    if start < 0 and abs(x) < threshold:
      start = idx
    elif start >= 0 and abs(x) >= threshold:
      dur = idx-start
      if dur >= min_dur:
        silent_segments.append((start,dur))
      start = -1
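  # A silent run that reaches the end of the input is never closed inside the loop
  if start >= 0 and len(samples) - start >= min_dur:
    silent_segments.append((start, len(samples) - start))
  return silent_segments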

A simple test:

>>> s = [-1,0,0,0,-1,10,-10,1,2,1,0,0,0,-1,-10]
>>> silent_segs(s,2,2)
[(0, 5), (9, 5)]

Answer 6

Another way to do this quickly and concisely:

import pylab as pl

v=[0,0,1,1,0,0,1,1,1,1,1,0,1,0,1,1,0,0,0,0,0,1,0,0]
vd = pl.diff(v)
#vd[i]==1 for 0->1 crossing; vd[i]==-1 for 1->0 crossing
#need to add +1 to indexes as pl.diff shifts to left by 1

i1=pl.array([i for i in range(len(vd)) if vd[i]==1])+1
i2=pl.array([i for i in range(len(vd)) if vd[i]==-1])+1

#corner cases for the first and the last element
if v[0]==1:
  i1=pl.hstack((0,i1))
if v[-1]==1:
  i2=pl.hstack((i2,len(v)))

Now i1 contains the start indices and i2 the (exclusive) end indices of the 1,...,1 regions.
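The list comprehensions above can also be replaced by vectorized lookups. This is a sketch using plain numpy instead of pylab, with the hypothetical names starts and stops in place of i1 and i2. It assumes v holds signed integers; for a boolean array np.diff yields booleans, so the comparison against -1 would never match.

```python
import numpy as np

v = np.array([0,0,1,1,0,0,1,1,1,1,1,0,1,0,1,1,0,0,0,0,0,1,0,0])
vd = np.diff(v)

# +1 because vd[i] describes the step from v[i] to v[i+1]
starts = np.flatnonzero(vd == 1) + 1
stops = np.flatnonzero(vd == -1) + 1

# Corner cases: a run touching the first or last element has no edge in vd
if v[0] == 1:
    starts = np.concatenate(([0], starts))
if v[-1] == 1:
    stops = np.concatenate((stops, [len(v)]))

for start, stop in zip(starts, stops):
    print(start, stop, v[start:stop])
```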

Answer 7

@joe-kington I got about a 20%-25% speed improvement over the np.diff / np.nonzero solution by using argmax instead (see the code below, where condition is a boolean array).

def contiguous_regions(condition):
    idx = []
    i = 0
    while i < len(condition):
        x1 = i + condition[i:].argmax()
        try:
            x2 = x1 + condition[x1:].argmin()
        except ValueError:
            # argmin raises ValueError on an empty slice
            x2 = x1 + 1
        if x1 == x2:
            if condition[x1]:
                x2 = len(condition)
            else:
                break
        idx.append( [x1,x2] )
        i = x2
    return idx

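Of course, your mileage may vary depending on your data.

Also, I'm not entirely sure, but I suspect numpy optimizes argmin/argmax over boolean arrays so that the search stops at the first True/False occurrence. That might explain it.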

Answer 8

I know I'm late to the party, but another way of doing this is with a 1D convolution:

np.convolve(sig > threshold, np.ones((cons_samples)), 'same') == cons_samples

where cons_samples is the number of consecutive samples you require above the threshold.
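A small worked sketch of what the convolution produces, under the assumption of a made-up signal and an odd window length. With mode 'same' the True entries mark the centre of each qualifying window; switching to mode 'valid' (a variation, not the line above) makes each True entry line up with the window's first sample instead.

```python
import numpy as np

# Hypothetical values for illustration
sig = np.array([0.0, 2.0, 3.0, 4.0, 0.5, 2.5, 2.0, 0.1])
threshold = 1.0
cons_samples = 3

above = (sig > threshold).astype(int)
kernel = np.ones(cons_samples, dtype=int)

# 'same': one entry per sample, True where the window centred there is all above
centered = np.convolve(above, kernel, mode='same') == cons_samples

# 'valid': entry i covers above[i:i + cons_samples], so it marks window starts
window_sums = np.convolve(above, kernel, mode='valid')
window_starts = np.flatnonzero(window_sums == cons_samples)

print(np.flatnonzero(centered), window_starts)
```

For silence detection the comparison would be flipped, for example np.abs(sig) < threshold. Note that a run longer than cons_samples produces several consecutive hits, one per window that fits inside it.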

Finding a large number of consecutive values fulfilling a condition in a numpy array
