In the speech sample application there is a CreateSpeechRecognizerWithFileInput example, but it returns after the first utterance. I did notice that you can call RecognizeAsync multiple times, but that has some strange behavior:
If I want to transcribe a 20-minute audio file, is there a better way to do it with the unified Speech SDK? The same file works fine under the old Oxford package. Ideally, I would like to get time offsets for the utterances along with the transcription.
Answer 0 (score: 0)
You can use StartContinuousRecognitionAsync() and StopContinuousRecognitionAsync() with the SDK to start and stop recognition.
Here is a sample:
using System;
using System.Threading.Tasks;

using Microsoft.CognitiveServices.Speech;

namespace MicrosoftSpeechSDKSamples
{
    public class SpeechRecognitionSamples
    {
        // Speech recognition from microphone.
        public static async Task RecognitionWithMicrophoneAsync()
        {
            // <recognitionWithMicrophone>
            // Creates an instance of a speech factory with specified
            // subscription key and service region. Replace with your own subscription key
            // and service region (e.g., "westus").
            var factory = SpeechFactory.FromSubscription("59a0243e86ae4919aa26f9e839f34b28", "westus");

            // Creates a speech recognizer using microphone as audio input. The default language is "en-us".
            using (var recognizer = factory.CreateSpeechRecognizer())
            {
                // Starts recognizing.
                Console.WriteLine("Say something...");

                // Starts recognition. It returns when the first utterance has been recognized.
                var result = await recognizer.RecognizeAsync().ConfigureAwait(false);

                // Checks result.
                if (result.RecognitionStatus != RecognitionStatus.Recognized)
                {
                    Console.WriteLine($"There was an error. Status:{result.RecognitionStatus.ToString()}, Reason:{result.RecognitionFailureReason}");
                }
                else
                {
                    Console.WriteLine($"We recognized: {result.RecognizedText}");
                }
            }
            // </recognitionWithMicrophone>
        }

        // Speech recognition in the specified spoken language.
        public static async Task RecognitionWithLanguageAsync()
        {
            // <recognitionWithLanguage>
            // Creates an instance of a speech factory with specified
            // subscription key and service region. Replace with your own subscription key
            // and service region (e.g., "westus").
            var factory = SpeechFactory.FromSubscription("59a0243e86ae4919aa26f9e839f34b28", "westus");

            // Creates a speech recognizer for the specified language, using microphone as audio input.
            var lang = "en-us";
            using (var recognizer = factory.CreateSpeechRecognizer(lang))
            {
                // Starts recognizing.
                Console.WriteLine($"Say something in {lang} ...");

                // Starts recognition. It returns when the first utterance has been recognized.
                var result = await recognizer.RecognizeAsync().ConfigureAwait(false);

                // Checks result.
                if (result.RecognitionStatus != RecognitionStatus.Recognized)
                {
                    Console.WriteLine($"There was an error. Status:{result.RecognitionStatus.ToString()}, Reason:{result.RecognitionFailureReason}");
                }
                else
                {
                    Console.WriteLine($"We recognized: {result.RecognizedText}");
                }
            }
            // </recognitionWithLanguage>
        }

        // Speech recognition from file.
        public static async Task RecognitionWithFileAsync()
        {
            // <recognitionFromFile>
            // Creates an instance of a speech factory with specified
            // subscription key and service region. Replace with your own subscription key
            // and service region (e.g., "westus").
            var factory = SpeechFactory.FromSubscription("59a0243e86ae4919aa26f9e839f34b28", "westus");

            // Creates a speech recognizer using file as audio input.
            // Replace with your own audio file name.
            using (var recognizer = factory.CreateSpeechRecognizerWithFileInput(@"YourAudioFile.wav"))
            {
                // Starts recognition. It returns when the first utterance is recognized.
                var result = await recognizer.RecognizeAsync().ConfigureAwait(false);

                // Checks result.
                if (result.RecognitionStatus != RecognitionStatus.Recognized)
                {
                    Console.WriteLine($"There was an error. Status:{result.RecognitionStatus.ToString()}, Reason:{result.RecognitionFailureReason}");
                }
                else
                {
                    Console.WriteLine($"We recognized: {result.RecognizedText}");
                }
            }
            // </recognitionFromFile>
        }

        // <recognitionCustomized>
        // Speech recognition using a customized model.
        public static async Task RecognitionUsingCustomizedModelAsync()
        {
            // Creates an instance of a speech factory with specified
            // subscription key and service region. Replace with your own subscription key
            // and service region (e.g., "westus").
            var factory = SpeechFactory.FromSubscription("59a0243e86ae4919aa26f9e839f34b28", "westus");

            // Creates a speech recognizer using microphone as audio input.
            using (var recognizer = factory.CreateSpeechRecognizer())
            {
                // Replace with the CRIS deployment id of your customized model.
                recognizer.DeploymentId = "YourDeploymentId";

                Console.WriteLine("Say something...");

                // Starts recognition. It returns when the first utterance has been recognized.
                var result = await recognizer.RecognizeAsync().ConfigureAwait(false);

                // Checks results.
                if (result.RecognitionStatus != RecognitionStatus.Recognized)
                {
                    Console.WriteLine($"There was an error. Status:{result.RecognitionStatus.ToString()}, Reason:{result.RecognitionFailureReason}");
                }
                else
                {
                    Console.WriteLine($"We recognized: {result.RecognizedText}");
                }
            }
        }
        // </recognitionCustomized>

        // <recognitionContinuous>
        // Speech recognition with events.
        public static async Task ContinuousRecognitionAsync()
        {
            // Creates an instance of a speech factory with specified
            // subscription key and service region. Replace with your own subscription key
            // and service region (e.g., "westus").
            var factory = SpeechFactory.FromSubscription("59a0243e86ae4919aa26f9e839f34b28", "westus");

            // Creates a speech recognizer using microphone as audio input.
            using (var recognizer = factory.CreateSpeechRecognizer())
            {
                // Subscribes to events.
                recognizer.IntermediateResultReceived += (s, e) =>
                {
                    Console.WriteLine($"\n Partial result: {e.Result.RecognizedText}.");
                };

                recognizer.FinalResultReceived += (s, e) =>
                {
                    if (e.Result.RecognitionStatus == RecognitionStatus.Recognized)
                    {
                        Console.WriteLine($"\n Final result: Status: {e.Result.RecognitionStatus.ToString()}, Text: {e.Result.RecognizedText}.");
                    }
                    else
                    {
                        Console.WriteLine($"\n Final result: Status: {e.Result.RecognitionStatus.ToString()}, FailureReason: {e.Result.RecognitionFailureReason}.");
                    }
                };

                recognizer.RecognitionErrorRaised += (s, e) =>
                {
                    Console.WriteLine($"\n An error occurred. Status: {e.Status.ToString()}, FailureReason: {e.FailureReason}");
                };

                recognizer.OnSessionEvent += (s, e) =>
                {
                    Console.WriteLine($"\n Session event. Event: {e.EventType.ToString()}.");
                };

                // Starts continuous recognition. Use StopContinuousRecognitionAsync() to stop recognition.
                Console.WriteLine("Say something...");
                await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

                Console.WriteLine("Press any key to stop");
                Console.ReadKey();

                await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
            }
        }
        // </recognitionContinuous>
    }
}
If you have a large amount of audio, Batch Transcription is the better option.
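For completeness, here is a minimal sketch of starting a Batch Transcription job over REST. This assumes the v3.0 speechtotext endpoint, which postdates the SDK version shown above; the subscription key, region, and blob URL are all placeholders, and the audio must already be uploaded to a blob the service can read (e.g., via a SAS URL):

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class BatchTranscriptionSketch
{
    public static async Task CreateTranscriptionAsync()
    {
        using (var http = new HttpClient())
        {
            // Placeholder key; use the same subscription key as for the SDK.
            http.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "YourSubscriptionKey");

            // Request body as a raw JSON string to avoid extra dependencies:
            // point the service at the audio blob and pick a locale.
            var body = "{ \"contentUrls\": [ \"https://yourstorage.blob.core.windows.net/audio/YourAudioFile.wav\" ]," +
                       " \"locale\": \"en-US\", \"displayName\": \"20 minute file\" }";

            // Placeholder region "westus"; match your subscription's region.
            var response = await http.PostAsync(
                "https://westus.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions",
                new StringContent(body, Encoding.UTF8, "application/json")).ConfigureAwait(false);

            // The Location header points at the new transcription job; poll it
            // until the status is "Succeeded", then download the linked result files.
            Console.WriteLine($"Status: {response.StatusCode}, job: {response.Headers.Location}");
        }
    }
}

The transcription runs server-side, so a 20-minute file is no problem, and the result files include per-phrase text with offsets and durations, which should also cover the time-offset requirement in the question.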
Answer 1 (score: 0)
As Ali said, StartContinuousRecognitionAsync() and StopContinuousRecognitionAsync() are the right methods if you want to recognize more than one utterance.
The latest Speech SDK samples are available at https://github.com/Azure-Samples/cognitive-services-speech-sdk, covering different languages (currently C++/C#; samples for new languages will be added as they are supported) on different platforms (currently Windows/Linux, with more platforms to be added).
Regarding question 3): the SessionStopped event is used to detect the end of the audio file (EOF). You can find a sample here: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/Windows/csharp_samples/speech_recognition_samples.cs#L194.
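Putting those pieces together, here is a minimal sketch of transcribing an entire file with continuous recognition, waiting on the session-stopped event instead of a key press. It assumes the same 0.x SpeechFactory API surface as the sample above (the SessionEventType.SessionStoppedEvent check follows the linked sample), and the key and file name are placeholders:

using System;
using System.Threading.Tasks;

using Microsoft.CognitiveServices.Speech;

public static class FileTranscriptionSketch
{
    public static async Task TranscribeWholeFileAsync()
    {
        // Placeholder key and region; replace with your own.
        var factory = SpeechFactory.FromSubscription("YourSubscriptionKey", "westus");

        // File input instead of microphone, same continuous-recognition pattern.
        using (var recognizer = factory.CreateSpeechRecognizerWithFileInput(@"YourAudioFile.wav"))
        {
            var stopRecognition = new TaskCompletionSource<int>();

            // Collect each recognized utterance as it arrives.
            recognizer.FinalResultReceived += (s, e) =>
            {
                if (e.Result.RecognitionStatus == RecognitionStatus.Recognized)
                {
                    Console.WriteLine($"Recognized: {e.Result.RecognizedText}");
                }
            };

            // The session stops when the end of the file is reached.
            recognizer.OnSessionEvent += (s, e) =>
            {
                if (e.EventType == SessionEventType.SessionStoppedEvent)
                {
                    stopRecognition.TrySetResult(0);
                }
            };

            await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

            // Wait until the whole file has been processed instead of a key press.
            await stopRecognition.Task.ConfigureAwait(false);

            await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
        }
    }
}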
Thanks