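(Code in Python would be great, but I'm mainly interested in the algorithm.)
I'm monitoring an audio stream (PyAudio) and looking for a series of 5 pops (see the visualization at the bottom). I read() the stream and take the RMS value of the block I've just read (similar to this question). My problem is that I'm not looking for a single event but for a series of events (pops) that share some characteristics yet aren't nearly as boolean as I'd like. What's the most straightforward (and performant) way to detect these five pops?
The RMS function gives me a stream like this: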
0.000580998485254, 0.00045098391298, 0.00751436443973, 0.002733730043, 0.00160775708652, 0.000847808804511
It's more useful if I round it for you (this is a similar stream):
0.001, 0.001, 0.018, 0.007, 0.003, 0.001, 0.001
You can see the pop in item 3, presumably quieting down in item 4, with maybe the tail end in a fraction of item 5.
I want to detect 5 of these in a row.
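For reference, here is a minimal sketch of how per-block RMS values like the ones above could be computed from raw 16-bit PCM bytes. The question does not show its actual RMS function, so the name `block_rms` and the scaling by 32768 (to land in the 0..1 range) are assumptions made for illustration.

```python
import array
import math

def block_rms(block: bytes) -> float:
    """RMS of a block of signed 16-bit PCM samples, scaled to 0..1 (assumed scaling)."""
    samples = array.array("h")      # "h" = signed short, native byte order
    samples.frombytes(block)
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / 32768.0
```

A block read from a stereo stream interleaves both channels, so this value is the RMS over both channels together.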
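My naive approach is to:
a) Define what a pop is: the block's RMS is over .002, for at least 2 blocks but no more than 4 blocks. It starts with silence and ends with silence.
Additionally, I'm tempted to define what silence is (to ignore blocks that are not quite loud but not quite silent), but I'm not sure that makes more sense than treating 'pop' as a boolean.
b) Then have a state machine that keeps track of a bunch of variables and has a bunch of if statements. Something like: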
    while True:
        is_pop = isRMSAmplitudeLoudEnoughToBeAPop(stream.read())
        if is_pop:
            if state == 'pop':
                # continuation of a pop (or maybe this continuation means
                # that it's too long to be a pop)
                if num_pop_blocks <= MAX_POP_RECORDS:
                    num_pop_blocks += 1
                else:
                    # too long to be a pop
                    state = 'waiting'
                    num_sequential_pops = 0
            elif state == 'silence':
                # possible beginning of a pop
                state = 'pop'
                num_pop_blocks += 1
                num_silence_blocks = 0
        else:
            # silence
            if state == 'pop':
                # we just transitioned from pop to silence
                num_sequential_pops += 1
                if num_sequential_pops == 5:
                    # we did it
                    state = 'waiting'
                    num_sequential_pops = 0
                    num_silence_blocks = 0
                    fivePopsCallback()
            elif state == 'silence':
                if num_silence_blocks >= MAX_SILENCE_BLOCKS:
                    # now we're just waiting
                    state = 'waiting'
                    num_silence_blocks = 0
                    num_sequential_pops = 0
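That code is not at all complete (and may have a bug or two), but it illustrates my line of thinking. It's certainly more complex than I'd like it to be, which is why I'm asking for suggestions.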
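One way to separate "what is silence" from "what is a pop", as considered in (a), is hysteresis: a block turns loud only above a high threshold and turns quiet again only below a lower one, so in-between blocks keep the previous label. The labels can then be collapsed into runs and checked against the 2-to-4-block rule. The sketch below works on a finite list of RMS values rather than a live stream; the lower threshold of 0.0015 and the helper names `label_blocks` and `count_pops` are assumptions for illustration.

```python
from itertools import groupby

def label_blocks(rms_values, loud_above=0.002, quiet_below=0.0015):
    """Label each block True (loud) or False (quiet) using two thresholds."""
    labels = []
    loud = False
    for value in rms_values:
        if value > loud_above:
            loud = True
        elif value < quiet_below:
            loud = False
        # values between the two thresholds keep the previous label
        labels.append(loud)
    return labels

def count_pops(rms_values, min_blocks=2, max_blocks=4):
    """Count loud runs whose length lies within [min_blocks, max_blocks]."""
    runs = [(loud, len(list(group)))
            for loud, group in groupby(label_blocks(rms_values))]
    return sum(1 for loud, length in runs
               if loud and min_blocks <= length <= max_blocks)
```

On the rounded stream from the question this labels items 3 to 5 as one loud run of three blocks. Note that a loud run at the very start or end of the list is counted even though it is not framed by silence, so a streaming version would need an extra check for that.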
Answer 0 (score: 1)
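You might want to compute the simple moving average of the last P points, where P ~= 4, and plot the result together with your raw input data.
You could then use the maxima of the smoothed average as pops. Define a maximum interval within which five pops must appear, and that may be what you're after.
Adjust P for best fit.
I wouldn't be surprised if a Python module for this already exists, but I haven't looked.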
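The suggestion above could be sketched roughly as follows: smooth the RMS values with a simple moving average over the last P points, treat local maxima above a threshold as pops, and check whether five of them fall within a maximum span. The function names, the threshold of 0.002 (borrowed from the question), and the `max_span` of 40 blocks are assumptions chosen for illustration, not values given in the answer.

```python
from collections import deque

def moving_average(values, p=4):
    """Simple moving average over the last p points (fewer at the start)."""
    window = deque(maxlen=p)
    smoothed = []
    for value in values:
        window.append(value)
        smoothed.append(sum(window) / len(window))
    return smoothed

def peak_indices(smoothed, threshold):
    """Indices of local maxima that rise above threshold."""
    peaks = []
    for i in range(1, len(smoothed) - 1):
        rising = smoothed[i - 1] < smoothed[i]
        not_rising_after = smoothed[i] >= smoothed[i + 1]
        if smoothed[i] > threshold and rising and not_rising_after:
            peaks.append(i)
    return peaks

def has_five_pops(values, p=4, threshold=0.002, max_span=40):
    """True if any five consecutive peaks fit within max_span blocks."""
    peaks = peak_indices(moving_average(values, p), threshold)
    return any(peaks[i + 4] - peaks[i] <= max_span
               for i in range(len(peaks) - 4))
```

Unlike the state-machine approach, this sketch does not constrain the length of an individual pop or the gap between pops; it only counts smoothed peaks inside a window, so P, the threshold, and the span all need tuning against real recordings.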
Answer 1 (score: 1)
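For my part, I ended up with a naive approach: an ongoing loop and a few variables to maintain the current state and transition to new ones. After finishing, though, it occurred to me that I should have explored hotword detection, because 5 consecutive clicks are basically a hotword. They have a pattern that I have to look for.
Anyway, here's my code: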
    import audioop  # standard library up to Python 3.12; removed in 3.13
    import logging
    import time

    import pyaudio

    POP_MIN_MS = 50
    POP_MAX_MS = 150
    POP_GAP_MIN_MS = 50
    POP_GAP_MAX_MS = 200
    POP_BORDER_MIN_MS = 500
    assert POP_BORDER_MIN_MS > POP_GAP_MAX_MS
    POP_RMS_THRESHOLD_MIN = 100

    FORMAT = pyaudio.paInt16
    CHANNELS = 2
    RATE = 44100  # Sampling Rate -- frames per second
    INPUT_BLOCK_TIME_MS = 50
    INPUT_FRAMES_PER_BLOCK = int(RATE*INPUT_BLOCK_TIME_MS/1000)

    # Durations converted from milliseconds to whole blocks
    # (// keeps the integer division the original Python 2 code relied on)
    POP_MIN_BLOCKS = POP_MIN_MS // INPUT_BLOCK_TIME_MS
    POP_MAX_BLOCKS = POP_MAX_MS // INPUT_BLOCK_TIME_MS
    POP_GAP_MIN_BLOCKS = POP_GAP_MIN_MS // INPUT_BLOCK_TIME_MS
    POP_GAP_MAX_BLOCKS = POP_GAP_MAX_MS // INPUT_BLOCK_TIME_MS
    POP_BORDER_MIN_BLOCKS = POP_BORDER_MIN_MS // INPUT_BLOCK_TIME_MS

    # Method of the listener class (not shown); self.pa is presumably a
    # pyaudio.PyAudio() instance.
    def listen(self):
        pops = 0
        sequential_loud_blocks = 0
        sequential_notloud_blocks = 0

        stream = self.pa.open(
            format=FORMAT,
            channels=CHANNELS,
            rate=RATE,
            input=True,
            frames_per_buffer=INPUT_FRAMES_PER_BLOCK
        )

        states = {
            'PENDING': 1,
            'POPPING': 2,
            'ENDING': 3,
        }
        state = states['PENDING']

        while True:
            amp = audioop.rms(stream.read(INPUT_FRAMES_PER_BLOCK), 2)
            is_loud = (amp >= POP_RMS_THRESHOLD_MIN)

            if state == states['PENDING']:
                if is_loud:
                    # Only switch to POPPING if it's been quiet for at least the border
                    # period. Otherwise stay in PENDING.
                    if sequential_notloud_blocks >= POP_BORDER_MIN_BLOCKS:
                        state = states['POPPING']
                        sequential_loud_blocks = 1
                    # If it's now loud then reset the # of notloud blocks
                    sequential_notloud_blocks = 0
                else:
                    sequential_notloud_blocks += 1

            elif state == states['POPPING']:
                if is_loud:
                    sequential_loud_blocks += 1
                    # TODO: Is this necessary?
                    sequential_notloud_blocks = 0
                    if sequential_loud_blocks > POP_MAX_BLOCKS:
                        # it's been loud for too long; this isn't a pop
                        state = states['PENDING']
                        pops = 0
                        # print("loud too long")
                    # since it has been loud and remains loud then no reason to reset
                    # the notloud_blocks count
                else:
                    # not loud
                    if sequential_loud_blocks:
                        # just transitioned from loud. was that a pop?
                        # we know it wasn't too long, or we would have transitioned to
                        # PENDING during the pop
                        if sequential_loud_blocks < POP_MIN_BLOCKS:
                            # wasn't long enough
                            # go to PENDING
                            state = states['PENDING']
                            pops = 0
                            # print("not loud long enough")
                        else:
                            # just right
                            pops += 1
                            logging.debug("POP #%s", pops)
                        sequential_loud_blocks = 0
                        sequential_notloud_blocks += 1
                    else:
                        # it has been quiet. and it's still quiet
                        sequential_notloud_blocks += 1
                        if sequential_notloud_blocks > POP_GAP_MAX_BLOCKS:
                            # it was quiet for too long
                            # we're no longer popping, but we don't know if this is the
                            # border at the end
                            state = states['ENDING']

            elif state == states['ENDING']:
                if is_loud:
                    # a loud block before the required border gap. reset
                    # since there wasn't a gap, this couldn't be a valid pop anyways
                    # so just go back to PENDING and let it monitor for the border
                    sequential_loud_blocks = 1
                    sequential_notloud_blocks = 0
                    pops = 0
                    state = states['PENDING']
                else:
                    sequential_notloud_blocks += 1
                    # Is the border time (500 ms right now) enough of a delay?
                    if sequential_notloud_blocks >= POP_BORDER_MIN_BLOCKS:
                        # that's a bingo!
                        if pops == 5:
                            stream.stop_stream()
                            # assume that starting now the channel is not silent
                            start_time = time.time()
                            print(">>>>> 5 POPS")
                            elapsed = time.time() - start_time
                            # time.time() may return fractions of a second, which is ideal
                            stream.start_stream()
                            # do whatever we need to do
                        state = states['PENDING']
                        pops = 0
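It needs some formal testing. Just last night I found an issue where it wasn't resetting itself after a pop followed by a too-long quiet period. My plan is to refactor it and then feed it a simulated RMS stream (e.g., (0, 0, 0, 500, 200, 0, 200, 0, ...)) to make sure it detects (or doesn't detect) as appropriate.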
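The refactoring plan described above could look something like the sketch below: the detection logic lives in a small class that consumes one RMS value at a time, so it can be driven by a simulated list instead of a live PyAudio stream. This is a simplified re-implementation, not the exact code from this answer; the class name `PopSequenceDetector`, the helper `count_detections`, and the default block counts (which mirror the millisecond constants above at 50 ms per block) are assumptions for illustration.

```python
class PopSequenceDetector:
    """Detects exactly `target` pops framed by quiet borders; lengths are in blocks."""

    def __init__(self, threshold=100, pop_min=1, pop_max=3,
                 gap_min=1, gap_max=4, border_min=10, target=5):
        self.threshold = threshold
        self.pop_min, self.pop_max = pop_min, pop_max
        self.gap_min, self.gap_max = gap_min, gap_max
        self.border_min = border_min
        self.target = target
        self.pops = 0        # valid pops seen in the current sequence
        self.loud_run = 0    # length of the current loud run
        self.quiet_run = 0   # length of the current quiet run
        self.valid = False   # True while the current sequence obeys every rule

    def feed(self, amp):
        """Consume one RMS value; return True when a full sequence is confirmed."""
        if amp >= self.threshold:
            if self.loud_run == 0:  # quiet -> loud transition
                if self.pops == 0:
                    # a first pop must follow a full quiet border
                    self.valid = self.quiet_run >= self.border_min
                elif not (self.gap_min <= self.quiet_run <= self.gap_max):
                    self.valid, self.pops = False, 0  # gap too short or too long
            self.loud_run += 1
            self.quiet_run = 0
            if self.loud_run > self.pop_max:  # too long to be a pop
                self.valid, self.pops = False, 0
            return False
        if self.loud_run:  # loud -> quiet transition
            if self.valid and self.loud_run >= self.pop_min:
                self.pops += 1
            else:
                self.valid, self.pops = False, 0
            self.loud_run = 0
        self.quiet_run += 1
        if self.quiet_run == self.border_min:  # trailing border reached
            detected = self.valid and self.pops == self.target
            self.pops = 0
            return detected
        return False


def count_detections(stream, **kwargs):
    """Feed a whole simulated RMS stream and count confirmed sequences."""
    detector = PopSequenceDetector(**kwargs)
    return sum(detector.feed(amp) for amp in stream)
```

With this shape, a test is just a list of numbers: for example, ten quiet blocks, five repetitions of `[500, 200, 0, 0]`, and ten more quiet blocks should yield one detection, while four or six repetitions should yield none.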