Speech to text from output audio

Posted: 2021-06-30 12:30:35

Tags: speech-recognition speech-to-text microsoft-cognitive naudio

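Ideal scenario

I want to listen to the audio coming out of my device (a laptop) and convert it to text in real time, without saving it to a wav file first.

Current state

I can capture the output audio with NAudio and save it to a wav file.

Here is sample code that saves the output to a wav file. This code works fine: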

// Namespaces used: System, NAudio.Wave
string outputFileName = @"xxxxxx\recievers.wav";
var capture = new WasapiLoopbackCapture();
var writer = new WaveFileWriter(outputFileName, capture.WaveFormat);

// Append each captured buffer to the wav file as it arrives.
capture.DataAvailable += async (s, e) =>
{
    if (writer != null)
    {
        await writer.WriteAsync(e.Buffer, 0, e.BytesRecorded);
        await writer.FlushAsync();
    }
};

// Release the writer and the capture device once recording has stopped.
capture.RecordingStopped += (s, e) =>
{
    if (writer != null)
    {
        writer.Dispose();
        writer = null;
    }
    capture.Dispose();
};

capture.StartRecording();
Console.WriteLine("Recording started. Press Enter to stop.");
Console.ReadLine();
capture.StopRecording();

To get real-time speech to text, I tried pushing the captured buffer into a push audio stream, adapting the following sample code:

// Namespaces used: System, System.Threading.Tasks, Microsoft.CognitiveServices.Speech,
// Microsoft.CognitiveServices.Speech.Audio, NAudio.Wave
public static async Task RecognitionWithPushAudioStreamAsync()
{
    // Creates an instance of a speech config with specified subscription key and service region.
    // Replace with your own subscription key and service region (e.g., "westus").
    var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

    var stopRecognition = new TaskCompletionSource<int>();

    // Same output path as in the first snippet.
    string outputFileName = @"xxxxxx\recievers.wav";

    // Create a push stream (no format argument, so the SDK's default input format applies).
    using (var pushStream = AudioInputStream.CreatePushStream())
    {
        using (var audioInput = AudioConfig.FromStreamInput(pushStream))
        {
            // Creates a speech recognizer using audio stream input.
            using (var recognizer = new SpeechRecognizer(config, audioInput))
            {
                // Subscribes to events.
                recognizer.Recognizing += (s, e) =>
                {
                    Console.WriteLine($"RECOGNIZING: Text={e.Result.Text}");
                };

                recognizer.Recognized += (s, e) =>
                {
                    if (e.Result.Reason == ResultReason.RecognizedSpeech)
                    {
                        Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
                    }
                    else if (e.Result.Reason == ResultReason.NoMatch)
                    {
                        Console.WriteLine("NOMATCH: Speech could not be recognized.");
                    }
                };

                recognizer.Canceled += (s, e) =>
                {
                    Console.WriteLine($"CANCELED: Reason={e.Reason}");

                    if (e.Reason == CancellationReason.Error)
                    {
                        Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
                        Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
                        Console.WriteLine("CANCELED: Did you update the subscription info?");
                    }

                    stopRecognition.TrySetResult(0);
                };

                recognizer.SessionStarted += (s, e) =>
                {
                    Console.WriteLine("\nSession started event.");
                };

                recognizer.SessionStopped += (s, e) =>
                {
                    Console.WriteLine("\nSession stopped event.");
                    Console.WriteLine("\nStop recognition.");
                    stopRecognition.TrySetResult(0);
                };

                // Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
                await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

                // Capture the device's output audio, as in the first snippet.
                var capture = new WasapiLoopbackCapture();
                var writer = new WaveFileWriter(outputFileName, capture.WaveFormat);
                capture.DataAvailable += async (s, e) =>
                {
                    if (writer != null)
                    {
                        await writer.WriteAsync(e.Buffer, 0, e.BytesRecorded);
                        await writer.FlushAsync();
                        // Try to push the buffer here; the bytes are passed on unchanged, in capture.WaveFormat.
                        pushStream.Write(e.Buffer, e.BytesRecorded);
                    }
                };
                capture.RecordingStopped += (s, e) =>
                {
                    if (writer != null)
                    {
                        writer.Dispose();
                        writer = null;
                    }
                    capture.Dispose();
                };
                capture.StartRecording();
                Console.WriteLine("Recording started. Press Enter to stop.");
                Console.ReadLine();
                capture.StopRecording();
                pushStream.Close();

                // Waits for completion.
                // Use Task.WaitAny to keep the task rooted.
                Task.WaitAny(new[] { stopRecognition.Task });

                // Stops recognition.
                await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
            }
        }
    }
}

In short, I open a push stream

using (var pushStream = AudioInputStream.CreatePushStream())

and try to push the captured buffer into that stream

pushStream.Write(e.Buffer, e.BytesRecorded);

but Cognitive Services does not recognize any speech.
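
One thing that may be worth checking here (an assumption, since the post does not show the capture format): a push stream created with no format argument expects 16 kHz, 16-bit, mono PCM, while WasapiLoopbackCapture commonly delivers the device mix format, often 32-bit IEEE float stereo at 44.1 or 48 kHz. If capture.WaveFormat reports that, the raw bytes would need converting before they are pushed. A minimal sketch of such a conversion follows; the names LoopbackPcmConverter and ToPcm16Mono are hypothetical, the input is assumed to be interleaved 32-bit float, and the linear-interpolation resampling is deliberately simple (no low-pass filter, no continuity across buffers).

```csharp
using System;

// Hypothetical helper for illustration; not part of NAudio or the Speech SDK.
public static class LoopbackPcmConverter
{
    // Converts interleaved 32-bit IEEE float samples to 16-bit little-endian mono PCM
    // at targetRate. Assumption: the input really is 32-bit float, so check
    // capture.WaveFormat.Encoding and BitsPerSample before relying on this.
    public static byte[] ToPcm16Mono(byte[] buffer, int bytesRecorded,
                                     int sourceRate, int channels, int targetRate = 16000)
    {
        int frameCount = bytesRecorded / (4 * channels);
        if (frameCount == 0) return Array.Empty<byte>();

        // Downmix: average all channels of each frame into one mono sample.
        var mono = new float[frameCount];
        for (int frame = 0; frame < frameCount; frame++)
        {
            float sum = 0f;
            for (int ch = 0; ch < channels; ch++)
                sum += BitConverter.ToSingle(buffer, (frame * channels + ch) * 4);
            mono[frame] = sum / channels;
        }

        // Resample by linear interpolation between neighbouring mono samples.
        int outCount = (int)((long)frameCount * targetRate / sourceRate);
        var output = new byte[outCount * 2];
        for (int i = 0; i < outCount; i++)
        {
            double pos = (double)i * sourceRate / targetRate;
            int i0 = Math.Min((int)pos, frameCount - 1);
            int i1 = Math.Min(i0 + 1, frameCount - 1);
            double frac = pos - i0;
            double sample = mono[i0] * (1.0 - frac) + mono[i1] * frac;

            // Clamp to [-1, 1] and scale to the signed 16-bit range.
            double clamped = Math.Max(-1.0, Math.Min(1.0, sample));
            short pcm = (short)Math.Round(clamped * short.MaxValue);
            output[2 * i] = (byte)(pcm & 0xFF);
            output[2 * i + 1] = (byte)((pcm >> 8) & 0xFF);
        }
        return output;
    }
}
```

Inside the DataAvailable handler, the converted bytes could then be pushed with pushStream.Write(LoopbackPcmConverter.ToPcm16Mono(e.Buffer, e.BytesRecorded, capture.WaveFormat.SampleRate, capture.WaveFormat.Channels)). For better audio quality, a proper resampler from NAudio would likely be preferable to this sketch.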

Thanks in advance for your help.

0 answers:

No answers yet.