Microsoft speech recognition: alternate results with confidence scores?

Time: 2013-09-23 17:26:03

Tags: .net speech-recognition speech microsoft-speech-platform microsoft-speech-api

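I'm new to the Microsoft.Speech recognizer (using Microsoft Speech Platform SDK Version 11), and I'm trying to get it to output the n-best recognition matches from a simple grammar, along with the confidence score for each.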

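According to the documentation (and as mentioned in the answer to this question), it should be possible to use e.Result.Alternates to access the recognized words other than the top-scoring one. However, even after resetting the confidence rejection threshold to 0 (which should mean nothing is rejected), I still get only one result and no alternates, even though the SpeechHypothesized events show that at least one other word was considered with non-zero confidence at some point.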

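My question: can anyone explain why I only get one recognized word, even when the confidence rejection threshold is set to zero? How can I get the other possible matches and their confidence scores? What am I missing here?

My code is below. Thanks in advance to anyone who can help :)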


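In the example below, the recognizer is sent a wav file of the single word "news" and has to choose among similar words ("noose", "newts"). I want to extract the recognizer's confidence score for every word (they should all be non-zero), even if it returns only the best one ("news") as the result.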

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Speech.Recognition;

namespace SimpleRecognizer
{
    class Program
    {
        static readonly string[] settings = new string[] {
            "CFGConfidenceRejectionThreshold",
            "HighConfidenceThreshold", 
            "NormalConfidenceThreshold",
            "LowConfidenceThreshold"};

        static void Main(string[] args)
        {
            // Create a new SpeechRecognitionEngine instance.
            SpeechRecognitionEngine sre = new SpeechRecognitionEngine(); //en-US SRE

            // Configure the input to the recognizer.
            sre.SetInputToWaveFile(@"C:\Users\Anjana\Documents\news.wav");

            // Display Recognizer Settings (Confidence Thresholds)
            Console.WriteLine("Original recognizer settings:");
            ListSettings(sre);

            // Set Confidence Threshold to Zero (nothing should be rejected)
            sre.UpdateRecognizerSetting("CFGConfidenceRejectionThreshold", 0);
            sre.UpdateRecognizerSetting("HighConfidenceThreshold", 0);
            sre.UpdateRecognizerSetting("NormalConfidenceThreshold", 0);
            sre.UpdateRecognizerSetting("LowConfidenceThreshold", 0);

            // Display New Recognizer Settings
            Console.WriteLine("Updated recognizer settings:");
            ListSettings(sre);

            // Build a simple Grammar with three choices
            Choices topics = new Choices();
            topics.Add(new string[] { "news", "newts", "noose" });
            GrammarBuilder gb = new GrammarBuilder();
            gb.Append(topics);
            Grammar g = new Grammar(gb);
            g.Name = "g";

            // Load the Grammar
            sre.LoadGrammar(g);

            // Register handlers for Grammar's SpeechRecognized Events
            g.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(gram_SpeechRecognized);

            // Register a handler for the recognizer's SpeechRecognized event.
            sre.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(sre_SpeechRecognized);

            // Register Handler for SpeechHypothesized
            sre.SpeechHypothesized += new EventHandler<SpeechHypothesizedEventArgs>(sre_SpeechHypothesized);

            // Start recognition.
            sre.Recognize();

            Console.ReadKey(); //wait to close

        }
        static void gram_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            Console.WriteLine("\nNumber of Alternates from Grammar {1}: {0}", e.Result.Alternates.Count.ToString(), e.Result.Grammar.Name);
            foreach (RecognizedPhrase phrase in e.Result.Alternates)
            {
                Console.WriteLine(phrase.Text + ", " + phrase.Confidence);
            }
        }
        static void sre_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            Console.WriteLine("\nSpeech recognized: " + e.Result.Text + ", " + e.Result.Confidence);
            Console.WriteLine("Number of Alternates from Recognizer: {0}", e.Result.Alternates.Count.ToString());
            foreach (RecognizedPhrase phrase in e.Result.Alternates)
            {
                Console.WriteLine(phrase.Text + ", " + phrase.Confidence);
            }
        }
        static void sre_SpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
        {
            Console.WriteLine("Speech from grammar {0} hypothesized: {1}, {2}", e.Result.Grammar.Name, e.Result.Text, e.Result.Confidence);
        }
        private static void ListSettings(SpeechRecognitionEngine recognizer)
        {
            foreach (string setting in settings)
            {
                try
                {
                    object value = recognizer.QueryRecognizerSetting(setting);
                    Console.WriteLine("  {0,-30} = {1}", setting, value);
                }
                catch
                {
                    Console.WriteLine("  {0,-30} is not supported by this recognizer.",
                      setting);
                }
            }
            Console.WriteLine();
        }
    }
}

This produces the following output:

Original recognizer settings:
  CFGConfidenceRejectionThreshold = 20
  HighConfidenceThreshold        = 80
  NormalConfidenceThreshold      = 50
  LowConfidenceThreshold         = 20

Updated recognizer settings:
  CFGConfidenceRejectionThreshold = 0
  HighConfidenceThreshold        = 0
  NormalConfidenceThreshold      = 0
  LowConfidenceThreshold         = 0

Speech from grammar g hypothesized: noose, 0.2214646
Speech from grammar g hypothesized: news, 0.640804

Number of Alternates from Grammar g: 1
news, 0.9208503

Speech recognized: news, 0.9208503
Number of Alternates from Recognizer: 1
news, 0.9208503

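I also tried implementing this with a separate phrase for each word (instead of one phrase with three choices), and even with a separate grammar for each word/phrase. The result is essentially the same: only one "alternate".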

1 answer:

Answer 0 (score: 1)

I believe this is another place where SAPI lets you ask for something that the SR engine doesn't really support.

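Both Microsoft.Speech.Recognition and System.Speech.Recognition use the underlying SAPI interfaces to do their work; the only difference is which SR engine gets used. (Microsoft.Speech.Recognition uses the Server engine; System.Speech.Recognition uses the Desktop engine.)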

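Alternates are designed primarily for dictation, not for context-free grammars (CFGs). You can always get one alternate for a CFG, but it looks like the alternate-generation code won't expand the alternates for CFGs.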
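To make the dictation case concrete, here is a minimal sketch that assumes the Desktop engine (System.Speech.Recognition, with a reference to System.Speech.dll) rather than the Server engine from the question. The default file name is a placeholder, and whether more than one alternate actually comes back depends on the engine and the audio.

```csharp
using System;
using System.Speech.Recognition; // Desktop engine; add a reference to System.Speech.dll

class DictationAlternates
{
    static void Main(string[] args)
    {
        // Placeholder path (an assumption): pass your own recording as the first argument.
        string wavPath = args.Length > 0 ? args[0] : "news.wav";

        using (var sre = new SpeechRecognitionEngine())
        {
            sre.MaxAlternates = 5;                    // upper bound on alternates per result
            sre.LoadGrammar(new DictationGrammar());  // free-form dictation, not a CFG
            sre.SetInputToWaveFile(wavPath);

            // Recognize() returns null when nothing could be recognized.
            RecognitionResult result = sre.Recognize();
            if (result == null)
            {
                Console.WriteLine("Nothing recognized.");
                return;
            }

            // List every alternate with its confidence score.
            foreach (RecognizedPhrase phrase in result.Alternates)
            {
                Console.WriteLine("{0}, {1}", phrase.Text, phrase.Confidence);
            }
        }
    }
}
```

MaxAlternates only caps how many alternates the engine may return; it does not force the engine to produce that many.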

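Unfortunately, the Microsoft.Speech.Recognition engine doesn't support dictation. (It does, however, cope much better with lower-quality audio, and it doesn't require training.)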
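Since the output in the question shows SpeechHypothesized reporting "noose" with a non-zero confidence, one possible workaround is to record every hypothesis yourself. This is an assumption on my part, not something the engine documents as an n-best list: hypothesis scores are interim values and are not directly comparable to the final confidence. HypothesisTracker below is a hypothetical helper, not part of any Microsoft API, and the sketch feeds it the values from the question's output so it runs without the SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of any Microsoft API): remembers the highest
// confidence reported for each distinct phrase text.
public class HypothesisTracker
{
    private readonly Dictionary<string, float> best = new Dictionary<string, float>();

    // Record one hypothesis; keep only the best score seen for each phrase.
    public void Record(string text, float confidence)
    {
        float seen;
        if (!best.TryGetValue(text, out seen) || confidence > seen)
        {
            best[text] = confidence;
        }
    }

    // Phrases ordered from highest to lowest recorded confidence.
    public List<KeyValuePair<string, float>> Ranked()
    {
        return best.OrderByDescending(p => p.Value).ToList();
    }
}

public static class TrackerDemo
{
    public static void Main()
    {
        var tracker = new HypothesisTracker();

        // Values copied from the output shown in the question.
        tracker.Record("noose", 0.2214646f);
        tracker.Record("news", 0.640804f);
        tracker.Record("news", 0.9208503f); // the final SpeechRecognized result

        foreach (var entry in tracker.Ranked())
        {
            Console.WriteLine("{0}, {1}", entry.Key, entry.Value);
        }
    }
}
```

In the question's program, the same idea would mean calling tracker.Record(e.Result.Text, e.Result.Confidence) from both sre_SpeechHypothesized and sre_SpeechRecognized, then printing tracker.Ranked() after Recognize() returns. Words the engine never hypothesizes (here, "newts") will simply not appear.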