How to extract variables from speech recognition

Time: 2018-03-25 18:40:40

Tags: c# speech-recognition system.speech.recognition

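I am using System.Speech to recognize certain phrases or words. One of them is "Set timer". I want to extend this to "Set timer for X seconds" and have the code set a timer for X seconds. Is this possible? I have almost no experience with this so far; all I could find out is that I have to do something with the Grammar class.

Right now I have set up the recognition engine like this: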

SpeechRecognitionEngine = new SpeechRecognitionEngine();
SpeechRecognitionEngine.SetInputToDefaultAudioDevice();

var choices = new Choices();
choices.Add("Set timer");

var gb = new GrammarBuilder();
gb.Append(choices);
var g = new Grammar(gb);

SpeechRecognitionEngine.LoadGrammarAsync(g);

SpeechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
SpeechRecognitionEngine.SpeechRecognized += OnSpeechRecognized;

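Is there a way to do this?

1 answer:

Answer 0: (score: 6)

First, there is no built-in concept of a number. Speech is just a sequence of words, so if you need to recognize numbers, you have to recognize the words that represent them, such as "one" and "fifteen". Some numbers are represented by several words, such as "one hundred" or "fifty one"; you need to recognize those too.

You can start by recognizing the numbers 1 to 9: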

var engine = new SpeechRecognitionEngine(CultureInfo.GetCultureInfo("en-US"));
engine.SetInputToDefaultAudioDevice();
var num1To9 = new Choices(
    new SemanticResultValue("one", 1),
    new SemanticResultValue("two", 2),
    new SemanticResultValue("three", 3),
    new SemanticResultValue("four", 4),
    new SemanticResultValue("five", 5),
    new SemanticResultValue("six", 6),
    new SemanticResultValue("seven", 7),
    new SemanticResultValue("eight", 8),
    new SemanticResultValue("nine", 9));

var gb = new GrammarBuilder();
gb.Culture = CultureInfo.GetCultureInfo("en-US");
gb.Append("set timer for");
gb.Append(num1To9);
gb.Append("seconds");
var g = new Grammar(gb);

engine.LoadGrammar(g); // better not use LoadGrammarAsync
engine.SpeechRecognized += OnSpeechRecognized;
engine.RecognizeAsync(RecognizeMode.Multiple);
Console.WriteLine("Speak");
Console.ReadKey();

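So our grammar can be read as:

  • the phrase "set timer for"
  • followed by "one" OR "two" OR "three"...
  • followed by "seconds"

We use SemanticResultValue to assign a tag to a specific phrase. In this case the tag is the number (1, 2, 3...) that corresponds to a specific word ("one", "two", "three"). Because of this, you can extract that value from the recognition result: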

private static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e) {
    var numSeconds = (int)e.Result.Semantics.Value;
    Console.WriteLine($"Starting timer for {numSeconds} seconds...");
}

This is already a working example: it recognizes phrases like "set timer for five seconds" and lets you extract the semantic value (5) from them.
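To actually run a timer from that handler, one option is sketched below. It assumes a one-shot System.Timers.Timer is acceptable for the task; the TimerSketch class, the _timer field and the "Timer elapsed" message are illustrative names that do not appear in the code above.

```csharp
using System;
using System.Speech.Recognition;
using System.Timers;

internal static class TimerSketch {
    // Hypothetical field: holds a reference so the timer is not collected while it runs.
    private static Timer _timer;

    public static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e) {
        // The semantic value is the number attached through SemanticResultValue.
        var numSeconds = Convert.ToInt32(e.Result.Semantics.Value);

        // Replace any timer that is still pending.
        _timer?.Dispose();

        // System.Timers.Timer takes its interval in milliseconds.
        _timer = new Timer(numSeconds * 1000.0) { AutoReset = false };
        _timer.Elapsed += (s, args) => Console.WriteLine("Timer elapsed");
        _timer.Start();

        Console.WriteLine($"Starting timer for {numSeconds} seconds...");
    }
}
```

The Elapsed callback runs on a thread pool thread, so a UI application would need to marshal back to its UI thread before touching controls.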

Now you can combine various groups of numbers, for example:

var num10To19 = new Choices(
    new SemanticResultValue("ten", 10),
    new SemanticResultValue("eleven", 11),
    new SemanticResultValue("twelve", 12),
    new SemanticResultValue("thirteen", 13),
    new SemanticResultValue("fourteen", 14),
    new SemanticResultValue("fifteen", 15),
    new SemanticResultValue("sixteen", 16),
    new SemanticResultValue("seventeen", 17),
    new SemanticResultValue("eighteen", 18),
    new SemanticResultValue("nineteen", 19)
);

var numTensFrom20To90 = new Choices(
    new SemanticResultValue("twenty", 20),
    new SemanticResultValue("thirty", 30),
    new SemanticResultValue("forty", 40),
    new SemanticResultValue("fifty", 50),
    new SemanticResultValue("sixty", 60),
    new SemanticResultValue("seventy", 70),
    new SemanticResultValue("eighty", 80),
    new SemanticResultValue("ninety", 90)
);

var num20to99 = new GrammarBuilder();
// first word is "twenty", "thirty" etc
num20to99.Append(numTensFrom20To90);
// followed by ONE OR ZERO "digit" words ("one", "two", "three" etc)
num20to99.Append(num1To9, 0, 1);

But assigning semantic values to them correctly becomes tricky, because the GrammarBuilder API is not powerful enough for that.
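One possible workaround, if you want to stay with GrammarBuilder, is to store each part of the number under its own SemanticResultKey and add the parts up in the handler. The following is only a sketch under that assumption: the key names "tens", "teens" and "ones" and the NumberGrammarSketch class are made up for illustration, and it covers 1 to 99 only.

```csharp
using System;
using System.Speech.Recognition;

internal static class NumberGrammarSketch {
    // Builds a fragment matching 1..99 from the three Choices defined above.
    public static GrammarBuilder BuildNumber1To99(
        Choices num1To9, Choices num10To19, Choices numTensFrom20To90) {
        var num20To99 = new GrammarBuilder();
        // "twenty", "thirty", ... stored under the (arbitrary) key "tens"
        num20To99.Append(new SemanticResultKey("tens", numTensFrom20To90));
        // optional digit word, stored under the (arbitrary) key "ones"
        num20To99.Append(new GrammarBuilder(new SemanticResultKey("ones", num1To9)), 0, 1);

        // A number is a single digit, OR a teen, OR tens plus an optional digit.
        return new Choices(
            new GrammarBuilder(new SemanticResultKey("ones", num1To9)),
            new GrammarBuilder(new SemanticResultKey("teens", num10To19)),
            num20To99);
    }

    // Adds up whichever parts were actually spoken.
    public static int ReadNumber(SemanticValue semantics) {
        var total = 0;
        foreach (var key in new[] { "tens", "teens", "ones" }) {
            if (semantics.ContainsKey(key)) {
                total += Convert.ToInt32(semantics[key].Value);
            }
        }
        return total;
    }
}
```

The fragment would be appended between "set timer for" and "seconds", and the handler would call ReadNumber(e.Result.Semantics) instead of reading Semantics.Value directly. The arithmetic lives in the handler rather than in the grammar, which is exactly the limitation described above.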

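When what you want cannot (easily) be done with pure GrammarBuilder and its related classes, you have to use the more powerful XML grammar files, whose syntax is defined in the W3C Speech Recognition Grammar Specification (SRGS).

Describing these grammar files is beyond the scope of this question, but fortunately a grammar file suitable for your task already ships with the Microsoft Speech SDK, which you have probably already downloaded and installed. So copy the file from "C:\Program Files\Microsoft SDKs\Speech\v11.0\Samples\Sample Grammars\en-US.grxml" (or wherever you installed the SDK) and delete some irrelevant parts, such as the first <tag> element with the large CDATA block inside.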

The rule of interest in this file is named "Cardinal"; it recognizes numbers from 0 to 1 million. Our code then becomes:

var sampleDoc = new SrgsDocument(@"en-US-sample.grxml");
sampleDoc.Culture = CultureInfo.GetCultureInfo("en-US");
// define new rule, named Timer
SrgsRule rootRule = new SrgsRule("Timer");            
// match "set timer for" phrase
rootRule.Add(new SrgsItem("set timer for"));
// followed by whatever "Cardinal" rule defines (reference to another rule)
rootRule.Add(new SrgsRuleRef(sampleDoc.Rules["Cardinal"]));
// followed by "seconds"
rootRule.Add(new SrgsItem("seconds"));
// add to rules
sampleDoc.Rules.Add(rootRule);
// make it a root rule, so that it will be used for recognition
sampleDoc.Root = rootRule;
var g = new Grammar(sampleDoc);

engine.LoadGrammar(g); // better not use LoadGrammarAsync
engine.SpeechRecognized += OnSpeechRecognized;
engine.RecognizeAsync(RecognizeMode.Multiple);

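The handler becomes:

private static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e) {
    var numSeconds = Convert.ToInt32(e.Result.Semantics.Value);
    Console.WriteLine($"Starting timer for {numSeconds} seconds...");
}

Now you can recognize numbers up to 1 million.

Of course, there is no need to define rules in code as we did above. You can define all the rules entirely in XML, then just load the file as an SrgsDocument and create a Grammar from it.

If you want to recognize multiple commands, here is an example: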

var sampleDoc = new SrgsDocument(@"en-US-sample.grxml");            
sampleDoc.Culture = CultureInfo.GetCultureInfo("en-US");
// this rule is the same as above
var setTimerRule = new SrgsRule("SetTimer");            
setTimerRule.Add(new SrgsItem("set timer for"));            
setTimerRule.Add(new SrgsRuleRef(sampleDoc.Rules["Cardinal"]));            
setTimerRule.Add(new SrgsItem("seconds"));            
sampleDoc.Rules.Add(setTimerRule);

// new rule, clear timer
var clearTimerRule = new SrgsRule("ClearTimer");
// just match this phrase
clearTimerRule.Add(new SrgsItem("clear timer"));
sampleDoc.Rules.Add(clearTimerRule);
// new root rule, matching either set timer OR clear timer
var rootRule = new SrgsRule("Timers");
rootRule.Add(new SrgsOneOf( // << OneOf is basically the same as Choice
    //               reference to SetTimer                                         
    new SrgsItem(new SrgsRuleRef(setTimerRule), 
        // assign command name. Both "command" and "settimer" are arbitrary names I chose
        new SrgsSemanticInterpretationTag("out = rules.latest();out.command = 'settimer';")),
    new SrgsItem(new SrgsRuleRef(clearTimerRule),
        // assign command name. If this rule "wins" - command will be cleartimer
        new SrgsSemanticInterpretationTag("out.command = 'cleartimer';"))
));
sampleDoc.Rules.Add(rootRule);
sampleDoc.Root = rootRule;
var g = new Grammar(sampleDoc);

The handler becomes:

private static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e) {
    var sem = e.Result.Semantics;
    // here "command" is the arbitrary key we assigned in our rule
    var commandName = (string) sem["command"].Value;
    switch (commandName) {
        // also arbitrary values we assigned, not related to rule names or anything else
        case "settimer":
            var numSeconds = Convert.ToInt32(sem.Value);
            Console.WriteLine($"Starting timer for {numSeconds} seconds...");
            break;
        case "cleartimer":
            Console.WriteLine("timer cleared");
            break;
    }
}

For completeness, here is how to do the same with pure XML. Open the "en-US-sample.grxml" file in an XML editor and add the rules we defined in code above. They look like this:

<rule id="SetTimer" scope="private">
    <item>set timer for</item>
    <item>
        <ruleref uri="#Cardinal" />
    </item>
    <item>seconds</item>
</rule>

<rule id="ClearTimer" scope="private">
    <item>clear timer</item>
</rule>

<rule id="Timers" scope="public">
    <one-of>
        <item>
            <ruleref uri="#SetTimer" />
            <tag>out = rules.latest(); out.command = 'settimer'</tag>
        </item>
        <item>
            <ruleref uri="#ClearTimer" />
            <tag>out.command = 'cleartimer'</tag>
        </item>
    </one-of>
</rule> 

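Now set the root rule on the root grammar tag:

<grammar xml:lang="en-US" version="1.0" xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0"
    root="Timers">

and save.

Now we do not need to define anything in code; all we need to do is load our grammar file into an SrgsDocument and create a Grammar from it, as before but without adding any rules.

That is all. Because "Timers" is the root rule in the grammar file, it will be used for recognition and behaves exactly like the version we defined in code.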
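Loading the edited grammar file could look like the minimal sketch below. It assumes the file was saved as "en-US-sample.grxml" next to the executable and that the handler is the multi-command one shown earlier; the Program class is only there to make the fragment self-contained.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;
using System.Speech.Recognition.SrgsGrammar;

internal static class Program {
    private static void Main() {
        using (var engine = new SpeechRecognitionEngine(CultureInfo.GetCultureInfo("en-US"))) {
            engine.SetInputToDefaultAudioDevice();

            // No rules are added in code: the root rule comes from the
            // root="Timers" attribute of the <grammar> tag in the file.
            var sampleDoc = new SrgsDocument(@"en-US-sample.grxml");
            engine.LoadGrammar(new Grammar(sampleDoc));

            engine.SpeechRecognized += OnSpeechRecognized;
            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.WriteLine("Speak");
            Console.ReadKey();
        }
    }

    private static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e) {
        var sem = e.Result.Semantics;
        // "command" is the key assigned in the <tag> elements of the Timers rule
        var commandName = (string)sem["command"].Value;
        switch (commandName) {
            case "settimer":
                Console.WriteLine($"Starting timer for {Convert.ToInt32(sem.Value)} seconds...");
                break;
            case "cleartimer":
                Console.WriteLine("timer cleared");
                break;
        }
    }
}
```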