Usage of Microsoft.CognitiveServices.Speech and CreateSpeechRecognizerWithFileInput with long files

Date: 2018-06-21 13:52:04

Tags: c# azure speech-recognition microsoft-cognitive

In the speech sample application there is an example for CreateSpeechRecognizerWithFileInput, but it returns after the first utterance. I did notice that you can call RecognizeAsync multiple times, but that has some odd behaviors:

  1. I get a RecognitionErrorRaised with a "NoMatch" error in the middle of the file.
  2. If the file contains a period of silence, FinalResultsReceived fires with an empty result.
  3. There does not seem to be a consistent/trackable EOF event that marks the end of recognition.

Is there a better way to transcribe a 20-minute audio file with the unified Speech SDK? The same file works fine under the old Oxford package. Ideally I would like to get time offsets for each utterance along with the transcriptions.

2 Answers:

Answer 0 (score: 0)

You can use StartContinuousRecognitionAsync() and StopContinuousRecognitionAsync() with the SDK to start and stop recognition.

Here is a sample:

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

namespace MicrosoftSpeechSDKSamples
{
    public class SpeechRecognitionSamples
    {
        // Speech recognition from microphone.
        public static async Task RecognitionWithMicrophoneAsync()
        {
            // <recognitionWithMicrophone>
            // Creates an instance of a speech factory with specified
            // subscription key and service region. Replace with your own subscription key
            // and service region (e.g., "westus").
            var factory = SpeechFactory.FromSubscription("YourSubscriptionKey", "westus");

            // Creates a speech recognizer using microphone as audio input. The default language is "en-us".
            using (var recognizer = factory.CreateSpeechRecognizer())
            {
                // Starts recognizing.
                Console.WriteLine("Say something...");

                // Starts recognition. It returns when the first utterance has been recognized.
                var result = await recognizer.RecognizeAsync().ConfigureAwait(false);

                // Checks result.
                if (result.RecognitionStatus != RecognitionStatus.Recognized)
                {
                    Console.WriteLine($"There was an error. Status:{result.RecognitionStatus.ToString()}, Reason:{result.RecognitionFailureReason}");
                }
                else
                {
                    Console.WriteLine($"We recognized: {result.RecognizedText}");
                }
            }
            // </recognitionWithMicrophone>
        }

        // Speech recognition in the specified spoken language.
        public static async Task RecognitionWithLanguageAsync()
        {
            // <recognitionWithLanguage>
            // Creates an instance of a speech factory with specified
            // subscription key and service region. Replace with your own subscription key
            // and service region (e.g., "westus").
            var factory = SpeechFactory.FromSubscription("YourSubscriptionKey", "westus");

            // Creates a speech recognizer for the specified language, using microphone as audio input.
            var lang = "en-us";
            using (var recognizer = factory.CreateSpeechRecognizer(lang))
            {
                // Starts recognizing.
                Console.WriteLine($"Say something in {lang} ...");

                // Starts recognition. It returns when the first utterance has been recognized.
                var result = await recognizer.RecognizeAsync().ConfigureAwait(false);

                // Checks result.
                if (result.RecognitionStatus != RecognitionStatus.Recognized)
                {
                    Console.WriteLine($"There was an error. Status:{result.RecognitionStatus.ToString()}, Reason:{result.RecognitionFailureReason}");
                }
                else
                {
                    Console.WriteLine($"We recognized: {result.RecognizedText}");
                }
            }
            // </recognitionWithLanguage>
        }
        // Speech recognition from file.
        public static async Task RecognitionWithFileAsync()
        {
            // <recognitionFromFile>
            // Creates an instance of a speech factory with specified
            // subscription key and service region. Replace with your own subscription key
            // and service region (e.g., "westus").
            var factory = SpeechFactory.FromSubscription("YourSubscriptionKey", "westus");

            // Creates a speech recognizer using file as audio input.
            // Replace with your own audio file name.
            using (var recognizer = factory.CreateSpeechRecognizerWithFileInput(@"YourAudioFile.wav"))
            {
                // Starts recognition. It returns when the first utterance is recognized.
                var result = await recognizer.RecognizeAsync().ConfigureAwait(false);

                // Checks result.
                if (result.RecognitionStatus != RecognitionStatus.Recognized)
                {
                    Console.WriteLine($"There was an error. Status:{result.RecognitionStatus.ToString()}, Reason:{result.RecognitionFailureReason}");
                }
                else
                {
                    Console.WriteLine($"We recognized: {result.RecognizedText}");
                }
            }
            // </recognitionFromFile>
        }

        // <recognitionCustomized>
        // Speech recognition using a customized model.
        public static async Task RecognitionUsingCustomizedModelAsync()
        {
            // Creates an instance of a speech factory with specified
            // subscription key and service region. Replace with your own subscription key
            // and service region (e.g., "westus").
            var factory = SpeechFactory.FromSubscription("YourSubscriptionKey", "westus");

            // Creates a speech recognizer using microphone as audio input.
            using (var recognizer = factory.CreateSpeechRecognizer())
            {
                // Replace with the CRIS deployment id of your customized model.
                recognizer.DeploymentId = "YourDeploymentId";

                Console.WriteLine("Say something...");

                // Starts recognition. It returns when the first utterance has been recognized.
                var result = await recognizer.RecognizeAsync().ConfigureAwait(false);

                // Checks results.
                if (result.RecognitionStatus != RecognitionStatus.Recognized)
                {
                    Console.WriteLine($"There was an error. Status:{result.RecognitionStatus.ToString()}, Reason:{result.RecognitionFailureReason}");
                }
                else
                {
                    Console.WriteLine($"We recognized: {result.RecognizedText}");
                }
            }
        }
        // </recognitionCustomized>

        // <recognitionContinuous>
        // Speech recognition with events
        public static async Task ContinuousRecognitionAsync()
        {
            // Creates an instance of a speech factory with specified
            // subscription key and service region. Replace with your own subscription key
            // and service region (e.g., "westus").
            var factory = SpeechFactory.FromSubscription("YourSubscriptionKey", "westus");

            // Creates a speech recognizer using microphone as audio input.
            using (var recognizer = factory.CreateSpeechRecognizer())
            {
                // Subscribes to events.
                recognizer.IntermediateResultReceived += (s, e) => {
                    Console.WriteLine($"\n    Partial result: {e.Result.RecognizedText}.");
                };

                recognizer.FinalResultReceived += (s, e) => {
                    if (e.Result.RecognitionStatus == RecognitionStatus.Recognized)
                    {
                        Console.WriteLine($"\n    Final result: Status: {e.Result.RecognitionStatus.ToString()}, Text: {e.Result.RecognizedText}.");
                    }
                    else
                    {
                        Console.WriteLine($"\n    Final result: Status: {e.Result.RecognitionStatus.ToString()}, FailureReason: {e.Result.RecognitionFailureReason}.");
                    }
                };

                recognizer.RecognitionErrorRaised += (s, e) => {
                    Console.WriteLine($"\n    An error occurred. Status: {e.Status.ToString()}, FailureReason: {e.FailureReason}");
                };

                recognizer.OnSessionEvent += (s, e) => {
                    Console.WriteLine($"\n    Session event. Event: {e.EventType.ToString()}.");
                };

                // Starts continuous recognition. Use StopContinuousRecognitionAsync() to stop recognition.
                Console.WriteLine("Say something...");
                await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

                Console.WriteLine("Press any key to stop");
                Console.ReadKey();

                await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
            }
        }
        // </recognitionContinuous>
    }
}

If you have a large amount of audio, you are better off using Batch Transcription.
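
For illustration, here is a minimal sketch of submitting a Batch Transcription job over REST. It assumes the v2.0 Batch Transcription endpoint (https://{region}.cris.ai/api/speechtotext/v2.0/transcriptions) and the recordingsUrl/locale/name request fields; treat the exact URL and field names as assumptions and verify them against the current documentation.

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

namespace MicrosoftSpeechSDKSamples
{
    public class BatchTranscriptionSample
    {
        // Submits a batch transcription job for audio hosted at a SAS URL and
        // returns the job URL to poll for results.
        public static async Task<string> SubmitBatchTranscriptionAsync(
            string subscriptionKey, string region, string audioSasUrl)
        {
            using (var client = new HttpClient())
            {
                client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);

                // Request body built by hand to keep the sketch dependency-free.
                // Field names (recordingsUrl, locale, name) are assumptions based
                // on the v2.0 API; check the documentation for the current schema.
                var body = "{"
                    + "\"recordingsUrl\": \"" + audioSasUrl + "\","
                    + "\"locale\": \"en-US\","
                    + "\"name\": \"Long audio transcription\""
                    + "}";

                var response = await client.PostAsync(
                    $"https://{region}.cris.ai/api/speechtotext/v2.0/transcriptions",
                    new StringContent(body, Encoding.UTF8, "application/json")).ConfigureAwait(false);

                response.EnsureSuccessStatusCode();

                // The created job's URL is returned in the Location header.
                return response.Headers.Location.ToString();
            }
        }
    }
}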

Answer 1 (score: 0)

As Ali said, StartContinuousRecognitionAsync() and StopContinuousRecognitionAsync() are the right methods if you want to recognize more than one utterance.

The latest Speech SDK samples are available at https://github.com/Azure-Samples/cognitive-services-speech-sdk, covering different languages (currently C++ and C#; samples will be added as new languages are supported) on different platforms (currently Windows and Linux, with more platforms to come).

Regarding issue 3), the SessionStopped event is used to detect EOF. You can find a sample here: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/Windows/csharp_samples/speech_recognition_samples.cs#L194
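
Putting this together for the original question, here is a minimal sketch of transcribing a whole file with continuous recognition, using the session-stopped event as the end-of-file signal. It assumes the same 0.x SDK surface as the sample above (SpeechFactory, CreateSpeechRecognizerWithFileInput, OnSessionEvent); the SessionEventType.SessionStoppedEvent check is my reading of that API, so verify it against your SDK version.

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

namespace MicrosoftSpeechSDKSamples
{
    public class ContinuousFileRecognitionSample
    {
        // Transcribes an entire audio file and returns once the service signals
        // that the session has stopped (i.e., the file has been fully consumed).
        public static async Task ContinuousRecognitionWithFileAsync(string fileName)
        {
            var factory = SpeechFactory.FromSubscription("YourSubscriptionKey", "westus");

            using (var recognizer = factory.CreateSpeechRecognizerWithFileInput(fileName))
            {
                // Completed when the session-stopped event (or an error) arrives.
                var stopRecognition = new TaskCompletionSource<int>();

                recognizer.FinalResultReceived += (s, e) =>
                {
                    if (e.Result.RecognitionStatus == RecognitionStatus.Recognized)
                    {
                        Console.WriteLine($"Final result: {e.Result.RecognizedText}");
                    }
                    // A NoMatch during silence in the middle of the file is
                    // expected; ignore it and keep listening.
                };

                recognizer.RecognitionErrorRaised += (s, e) =>
                {
                    Console.WriteLine($"Error. Status: {e.Status}, Reason: {e.FailureReason}");
                    stopRecognition.TrySetResult(0);
                };

                recognizer.OnSessionEvent += (s, e) =>
                {
                    // The session-stopped event fires after the whole file has been
                    // read, which serves as the EOF signal the question asks about.
                    if (e.EventType == SessionEventType.SessionStoppedEvent)
                    {
                        stopRecognition.TrySetResult(0);
                    }
                };

                await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

                // Wait for EOF (or an error), then shut down cleanly.
                await stopRecognition.Task.ConfigureAwait(false);
                await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
            }
        }
    }
}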

Thanks.