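I want to use NAudio to capture audio from the microphone and send the audio data to an API for speech recognition.
As the source code below shows, the audio data is sent over a WebSocket.
The current program is designed to stop recording 4 seconds after it starts, using `Task.Delay`.
Instead, I would like to detect the end of speech from the microphone input, so-called VAD (voice activity detection), and then send the end command `recog-stop` to the API. What approaches are there?
For example, is it possible to use NAudio to monitor the microphone's input level and stop recording once the level stays below a certain value for a certain period of time?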
```csharp
private async Task Stream_SendVoice(ClientWebSocket ws)
{
    var closingMessage = new ArraySegment<byte>(Encoding.UTF8.GetBytes(
        "{\"command\": \"recog-stop\"}"
    ));

    // Read from the microphone and stream to the API.
    object writeLock = new object();
    bool writeMore = true;
    var waveIn = new NAudio.Wave.WaveInEvent();
    waveIn.DeviceNumber = 0;
    waveIn.WaveFormat = new NAudio.Wave.WaveFormat(16000, 1); // 16 kHz, mono, 16-bit
    waveIn.DataAvailable +=
        (object sender, NAudio.Wave.WaveInEventArgs args) =>
        {
            lock (writeLock)
            {
                if (!writeMore) return;
                // Send only the bytes actually recorded, and wait for the send to
                // finish: ClientWebSocket allows one outstanding send at a time,
                // and NAudio may reuse args.Buffer after this handler returns.
                ws.SendAsync(new ArraySegment<byte>(args.Buffer, 0, args.BytesRecorded),
                    WebSocketMessageType.Binary, true, CancellationToken.None)
                    .GetAwaiter().GetResult();
            }
        };

    waveIn.StartRecording();
    this.textBox2.AppendText("Speak now.");
    await Task.Delay(TimeSpan.FromSeconds(4)); // fixed 4-second recording window

    // Stop recording first so no audio is sent after (or concurrently with) recog-stop.
    lock (writeLock) writeMore = false;
    waveIn.StopRecording();
    await ws.SendAsync(closingMessage, WebSocketMessageType.Text, true, CancellationToken.None);
}
```
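To make the volume-based idea in the question concrete, here is a minimal sketch of what I have in mind. It assumes 16-bit mono PCM, as configured by `WaveFormat(16000, 1)` above. `SilenceDetector` is a hypothetical helper of my own, not part of NAudio, and the threshold and silence duration are placeholder values that would need tuning for a real microphone and room. It is a plain energy threshold, not a full VAD, so background noise may defeat it.

```csharp
using System;

// Hypothetical helper (not an NAudio type): reports when the input has stayed
// quiet for a given time after some speech was heard.
public sealed class SilenceDetector
{
    private readonly double _threshold;   // RMS level on a 0.0..1.0 scale (assumed value, tune it)
    private readonly long _requiredBytes; // consecutive quiet bytes needed to report end of speech
    private long _quietBytes;
    private bool _heardSpeech;

    public SilenceDetector(int sampleRate, double threshold, double silenceSeconds)
    {
        _threshold = threshold;
        // 16-bit mono PCM: 2 bytes per sample.
        _requiredBytes = (long)(silenceSeconds * sampleRate * 2);
    }

    // Root-mean-square level of 16-bit little-endian PCM, scaled to 0.0..1.0.
    public static double Rms(byte[] buffer, int count)
    {
        int samples = count / 2;
        if (samples == 0) return 0.0;
        double sumSquares = 0.0;
        for (int i = 0; i < samples; i++)
        {
            short s = (short)(buffer[2 * i] | (buffer[2 * i + 1] << 8));
            double v = s / 32768.0;
            sumSquares += v * v;
        }
        return Math.Sqrt(sumSquares / samples);
    }

    // Feed one recorded buffer. Returns true once speech has been heard and the
    // level has then stayed below the threshold for the configured duration.
    public bool Process(byte[] buffer, int count)
    {
        if (Rms(buffer, count) >= _threshold)
        {
            _heardSpeech = true;
            _quietBytes = 0;
            return false;
        }
        if (!_heardSpeech) return false; // ignore silence before the user starts talking
        _quietBytes += count;
        return _quietBytes >= _requiredBytes;
    }
}
```

The idea would be to call `detector.Process(args.Buffer, args.BytesRecorded)` inside the `DataAvailable` handler and, when it returns true, complete a `TaskCompletionSource<bool>` with `TrySetResult(true)`. `Stream_SendVoice` could then await that task (possibly combined with a timeout via `Task.WhenAny`) instead of `Task.Delay`, and send `recog-stop` afterwards. Whether this is reliable enough, or whether a proper VAD is needed, is what I am unsure about.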