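I am trying to write a script that listens to my microphone input and prints "ON" when it senses that someone is speaking, and "OFF" when it senses that the person has stopped speaking.
Here is part of what I have so far: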
import collections
import audioop
import pyaudio
import time
import math

CHUNK = 1024  # Number of frames to read from the mic stream per call
FORMAT = pyaudio.paInt24  # Sample format; depends on the mic used
CHANNELS = 2  # Number of channels to record; depends on the mic
RATE = 44100  # Sample rate in Hz; depends on the mic
THRESHOLD = 6000  # The threshold intensity that defines silence:
# a value lower than THRESHOLD is considered silence
RECORD_SECONDS = 5


def test():
    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE, input=True, frames_per_buffer=CHUNK)
    # maxlen must be an int; RATE // CHUNK chunks is roughly one second of audio
    q = collections.deque(maxlen=RATE // CHUNK)
    print("--Listening--")
    for i in range(int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        # Absolute value of the average sample in this chunk, read as 4-byte words
        q.append(abs(audioop.avg(data, 4)))
        print(sum(q) / (RATE / CHUNK * RECORD_SECONDS))
    stream.stop_stream()
    stream.close()
    p.terminate()


def listen():
    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE, input=True, frames_per_buffer=CHUNK)
    q = collections.deque(maxlen=RATE // CHUNK)
    flag = True  # True while the last reported state is "ON"
    print("--Listening--")
    while True:
        data = stream.read(CHUNK)
        q.append(abs(audioop.avg(data, 4)))
        if flag:
            # Currently "ON": report "OFF" once the level drops below the cutoff
            if sum(q) / (RATE / CHUNK * RECORD_SECONDS) < 3500000:
                print("OFF")
                flag = False
        else:
            # Currently "OFF": report "ON" once the level rises above the cutoff
            if sum(q) / (RATE / CHUNK * RECORD_SECONDS) > 3500000:
                print("ON")
                flag = True


if __name__ == '__main__':
    test()
The listen() function is the one that prints "ON" and "OFF", while test() is only used to check the audio levels.
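To make the ON/OFF logic easier to check without a microphone, the state tracking in listen() can be sketched as a small generator that runs over a plain list of numbers. This is only an illustration: `transitions` is a hypothetical helper name (not part of PyAudio or audioop), and the levels and threshold in the example are made-up values.

```python
def transitions(levels, threshold):
    """Yield "ON" or "OFF" each time the level crosses the threshold.

    Starts in the ON state, mirroring `flag = True` in listen().
    `levels` is any iterable of numbers (hypothetical input, e.g. one
    smoothed level per audio chunk).
    """
    speaking = True
    for level in levels:
        if speaking and level < threshold:
            speaking = False
            yield "OFF"
        elif not speaking and level > threshold:
            speaking = True
            yield "ON"


# Made-up levels: loud, quiet, quiet, loud, loud, quiet
events = list(transitions([5, 1, 1, 6, 7, 2], threshold=3))
print(events)
```

Separating the state tracking from the audio capture means the threshold behaviour can be tested on recorded or synthetic level values before it is wired to a live stream.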
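I am not entirely sure that audioop is the right approach. After playing with the test() function for a few minutes, the values it prints do not seem to track the volume level. I can get a very high value (8,000,000) for whispering, while getting 4,000,000 for regular talking and 3,000,000 for not talking at all (silence).
Is there a way to make it consistent with the audio level? That is, I would get one range for silence, a higher one for whispering, a still higher one for talking, and so on (i.e. the values would be consistent).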
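For comparison, here is a minimal sketch of a level measure that does grow with loudness: the root mean square (RMS) of the samples, optionally expressed in decibels relative to full scale (dBFS). It uses only the standard library and synthetic sine tones, so it runs without a microphone. The sketch assumes native-endian signed 16-bit samples, as a stream opened with `pyaudio.paInt16` would deliver, rather than the `paInt24` format used above. The helper names `rms_int16`, `mean_int16`, `dbfs` and `sine_chunk` are hypothetical. It also shows why a plain average of signed samples says little about volume: a tone swings equally above and below zero, so its mean stays near zero however loud it is.

```python
import array
import math


def rms_int16(data):
    """RMS of a bytes object holding native-endian signed 16-bit samples."""
    samples = array.array("h")
    samples.frombytes(data)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mean_int16(data):
    """Plain average of the signed samples (for comparison only)."""
    samples = array.array("h")
    samples.frombytes(data)
    return sum(samples) / len(samples) if samples else 0.0


def dbfs(rms, full_scale=32768.0):
    """Level in decibels relative to 16-bit full scale; silence maps to -inf."""
    if rms <= 0:
        return float("-inf")
    return 20.0 * math.log10(rms / full_scale)


def sine_chunk(amplitude, n=1024, period=64):
    """Synthetic test tone: n samples of a sine with the given peak amplitude."""
    values = (int(amplitude * math.sin(2 * math.pi * i / period)) for i in range(n))
    return array.array("h", values).tobytes()


quiet = sine_chunk(1000)   # stand-in for a soft sound
loud = sine_chunk(10000)   # stand-in for a sound ten times larger in amplitude

for name, chunk in (("quiet", quiet), ("loud", loud)):
    level = rms_int16(chunk)
    print(name, "mean:", mean_int16(chunk), "rms:", level, "dBFS:", dbfs(level))
```

With real input, the sample width has to match the stream format whichever function is used: `paInt24` samples are 3 bytes each, so reading them as 4-byte words, as the code above does, mixes bytes from neighbouring samples. Whether that fully explains the numbers reported here is an assumption that would need checking against the actual microphone.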