Question

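I'm trying to write a script that listens to my microphone input and prints "ON" when it senses that someone is speaking, and "OFF" when it senses that the person has stopped speaking.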

Here is what I have so far:

import collections
import audioop
import pyaudio

CHUNK = 1024 # The size of the chunk to read from the mic stream
FORMAT = pyaudio.paInt24 # The format depends on the mic used
CHANNELS = 2 # The number of channels used to record the audio. Depends on the mic
RATE = 44100 # The sample rate for audio. Depends on the mic
THRESHOLD = 6000 # The threshold intensity that defines silence:
                 # any value lower than THRESHOLD is considered silence
RECORD_SECONDS=5

def test():
    p = pyaudio.PyAudio()

    stream = p.open(format=FORMAT, channels=CHANNELS,rate=RATE,input=True, frames_per_buffer=CHUNK)

    # maxlen must be an int (RATE/CHUNK is a float in Python 3); holds about one second of chunks
    q = collections.deque(maxlen=int(RATE/CHUNK))

    flag = True
    print("--Listening--")  

    for i in range(0,int(RATE/CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        q.append(abs(audioop.avg(data,4)))


    print(sum(q)/(RATE/CHUNK*RECORD_SECONDS))

    stream.stop_stream()
    stream.close()
    p.terminate()

def listen():
    p = pyaudio.PyAudio()

    stream = p.open(format=FORMAT, channels=CHANNELS,rate=RATE,input=True, frames_per_buffer=CHUNK)

    # maxlen must be an int (RATE/CHUNK is a float in Python 3); holds about one second of chunks
    q = collections.deque(maxlen=int(RATE/CHUNK))

    flag = True  # True while speech is assumed to be present
    print("--Listening--")

    while True:
        data = stream.read(CHUNK)
        q.append(abs(audioop.avg(data,4)))
        if flag:
            if sum(q)/(RATE/CHUNK*RECORD_SECONDS) < 3500000:
                print("OFF")
                flag = False
        else:
            if sum(q)/(RATE/CHUNK*RECORD_SECONDS) > 3500000:
                print("ON")
                flag = True


if __name__ == '__main__':
    test()
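In both functions above, the deque keeps only about RATE/CHUNK ≈ 43 readings (roughly one second), yet the sum is divided by RATE/CHUNK*RECORD_SECONDS ≈ 215. The printed figure is therefore about one fifth of the mean of the readings in the window, and in listen() it is smaller still while the deque is filling up. A minimal sketch of a sliding-window mean that divides by the number of readings actually held; the class name RollingMean is hypothetical, and the sample readings are made up:

```python
import collections


class RollingMean:
    """Mean of the most recent `size` readings (a sliding window)."""

    def __init__(self, size):
        # deque drops the oldest reading automatically once maxlen is reached
        self.window = collections.deque(maxlen=size)

    def add(self, value):
        """Store a new reading and return the updated mean."""
        self.window.append(value)
        return self.mean()

    def mean(self):
        # Divide by how many readings are actually held, not by a fixed
        # constant, so the result is a true average even before the window fills.
        if not self.window:
            return 0.0
        return sum(self.window) / len(self.window)


RATE, CHUNK = 44100, 1024          # values taken from the question
level = RollingMean(RATE // CHUNK)  # about one second of chunks (43)
for reading in (10, 20, 30):        # made-up readings
    print(level.add(reading))
```

With 43 slots, these three readings print 10.0, 15.0 and 20.0; once more than 43 readings have arrived, only the newest 43 contribute.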

The listen() method is the one that prints "ON"/"OFF", while test() is used to check the audio level.
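listen() toggles on a single threshold (3500000), so a level hovering around that value can flip between ON and OFF on consecutive chunks. A common remedy, swapped in here and not part of the question's code, is hysteresis: one threshold to switch ON, a lower one to switch OFF, plus a requirement that several consecutive chunks be quiet before reporting OFF. The sketch below works on any sequence of level readings so it can be tried without a microphone; the function name, the thresholds and the level values are hypothetical:

```python
def on_off_events(levels, on_threshold, off_threshold, hold_chunks=3):
    """Yield (index, "ON"/"OFF") transitions from a sequence of level readings.

    Two thresholds (hysteresis) keep a level that hovers near one value from
    toggling repeatedly; OFF is only reported after `hold_chunks` consecutive
    readings below `off_threshold`.
    """
    if off_threshold > on_threshold:
        raise ValueError("off_threshold must not exceed on_threshold")
    speaking = False
    quiet_run = 0  # consecutive readings below off_threshold while speaking
    for i, level in enumerate(levels):
        if not speaking:
            if level >= on_threshold:
                speaking = True
                quiet_run = 0
                yield i, "ON"
        elif level < off_threshold:
            quiet_run += 1
            if quiet_run >= hold_chunks:
                speaking = False
                yield i, "OFF"
        else:
            quiet_run = 0  # a loud reading resets the quiet streak


# Made-up levels (in dBFS-like units): speech starts at index 2 and
# ends after three quiet readings at indices 5 to 7.
levels = [-70, -70, -30, -45, -30, -70, -70, -70, -70]
for index, state in on_off_events(levels, on_threshold=-40, off_threshold=-55):
    print(index, state)
```

This prints `2 ON` and `7 OFF`: the dip to -45 at index 3 stays above the OFF threshold, so it does not end the "speaking" state.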

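I'm not entirely sure audioop is the right approach. After playing around with the test method for a few minutes, its readings don't seem to track the volume level. I can get very high values (8,000,000) for whispering, while getting 4,000,000 for normal talking and 3,000,000 for not talking (silence).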
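Two details of the code may explain readings like these, though this is a hypothesis rather than a confirmed diagnosis. First, FORMAT = pyaudio.paInt24 produces 3-byte samples, but audioop.avg(data, 4) is told each sample is 4 bytes wide, so sample boundaries are misaligned. Second, audioop.avg returns the arithmetic mean of the samples; an audio waveform swings around zero, so the mean reflects its offset rather than its loudness, whereas RMS (root mean square, also available as audioop.rms) grows with amplitude. Because audioop was deprecated in Python 3.11 and removed in 3.13, the sketch decodes samples with int.from_bytes instead. It assumes the buffer holds packed, signed, little-endian 24-bit samples, and it uses synthetic square waves, not microphone data; the helper names are hypothetical:

```python
import math


def decode_int24_le(data):
    """Decode packed signed 24-bit little-endian PCM bytes into ints.

    Assumes 3 bytes per sample with no padding; stereo buffers are simply
    interleaved, which does not matter for an overall level measurement.
    """
    if len(data) % 3:
        raise ValueError("buffer length must be a multiple of 3")
    return [
        int.from_bytes(data[i:i + 3], "little", signed=True)
        for i in range(0, len(data), 3)
    ]


def encode_int24_le(samples):
    """Inverse of decode_int24_le, used here only to build test buffers."""
    return b"".join(s.to_bytes(3, "little", signed=True) for s in samples)


def mean(samples):
    """Arithmetic mean: what an 'average' sample value measures."""
    return sum(samples) / len(samples)


def rms(samples):
    """Root mean square: a standard measure of signal level."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# A quiet and a loud square wave: the mean is 0 for both,
# while RMS grows with the amplitude.
quiet = encode_int24_le([1000, -1000] * 512)
loud = encode_int24_le([100000, -100000] * 512)
for name, buf in (("quiet", quiet), ("loud", loud)):
    samples = decode_int24_le(buf)
    print(name, "mean =", mean(samples), "rms =", rms(samples))
```

Both buffers report a mean of 0.0, while RMS reports 1000.0 and 100000.0, which is why RMS rather than the mean is the usual starting point for a loudness reading.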

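Is there a way to make the readings track the audio level consistently, so that I get one range for silence, a higher one for whispering, a higher one still for talking, and so on?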
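For separate "silence / whisper / speech" ranges, a common practice is to compute the RMS (root-mean-square) level of each chunk and express it in decibels relative to full scale (dBFS), since perceived loudness is roughly logarithmic. A minimal sketch, assuming signed 24-bit samples (full scale 2**23) as in the question. The band edges of -60 and -40 dBFS are hypothetical placeholders, not measured values; they would need calibrating for a particular microphone, gain setting and room, for example by recording a few seconds of each condition:

```python
import math

FULL_SCALE_24BIT = 2 ** 23  # largest magnitude of a signed 24-bit sample


def rms_to_dbfs(rms_value, full_scale=FULL_SCALE_24BIT):
    """Convert an RMS amplitude to dB relative to full scale (dBFS).

    A full-scale signal is 0 dBFS; quieter signals are negative and
    digital silence is minus infinity.
    """
    if rms_value <= 0:
        return float("-inf")
    return 20 * math.log10(rms_value / full_scale)


# Hypothetical bands (upper edge in dBFS, label); calibrate before use.
BANDS = [(-60.0, "silence"), (-40.0, "whisper"), (float("inf"), "speech")]


def classify(dbfs):
    """Return the label of the first band whose upper edge exceeds dbfs."""
    for upper, label in BANDS:
        if dbfs < upper:
            return label
    return BANDS[-1][1]


for rms_value in (0, 4000, 200000):  # made-up RMS readings
    level = rms_to_dbfs(rms_value)
    print(rms_value, round(level, 1), classify(level))
```

On this scale each doubling of the RMS amplitude adds about 6 dB (20·log10 2 ≈ 6.02), so the bands stay evenly spaced in loudness terms instead of being dominated by the loudest readings.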

How to detect and analyze audio levels

0 answers: