SFSpeechRecognizer - detecting the end of an utterance

Date: 2017-03-01 11:34:12

Tags: ios, sfspeechrecognizer

I'm hacking on a small project using the speech recognition built into iOS 10. I have it working with the device microphone, and my speech is recognized very accurately.

My problem is that the recognition task callback is called for every available partial transcription. I want it to detect that the person has stopped speaking and then call the callback with the isFinal property set to true. That never happens, and the app keeps listening indefinitely.

Is SFSpeechRecognizer capable of detecting the end of a sentence?

Here is my code. It is based on examples found around the internet and is mostly the boilerplate needed to recognize from the microphone. I modified it by adding a recognition taskHint. I also set shouldReportPartialResults to false, but it seems to be ignored.

    func startRecording() {

        if recognitionTask != nil {
            recognitionTask?.cancel()
            recognitionTask = nil
        }

        let audioSession = AVAudioSession.sharedInstance()
        do {
            try audioSession.setCategory(AVAudioSessionCategoryRecord)
            try audioSession.setMode(AVAudioSessionModeMeasurement)
            try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
        } catch {
            print("audioSession properties weren't set because of an error.")
        }

        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        recognitionRequest?.shouldReportPartialResults = false
        recognitionRequest?.taskHint = .search

        guard let inputNode = audioEngine.inputNode else {
            fatalError("Audio engine has no input node")
        }

        guard let recognitionRequest = recognitionRequest else {
            fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
        }

        // NOTE: this overrides the `shouldReportPartialResults = false` set above.
        recognitionRequest.shouldReportPartialResults = true

        recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in

            var isFinal = false

            if result != nil {
                print("RECOGNIZED \(result?.bestTranscription.formattedString)")
                self.transcriptLabel.text = result?.bestTranscription.formattedString
                isFinal = (result?.isFinal)!
            }

            if error != nil || isFinal {
                self.state = .Idle

                self.audioEngine.stop()
                inputNode.removeTap(onBus: 0)

                self.recognitionRequest = nil
                self.recognitionTask = nil

                self.micButton.isEnabled = true

                self.say(text: "OK. Let me see.")
            }
        })

        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
            self.recognitionRequest?.append(buffer)
        }

        audioEngine.prepare()

        do {
            try audioEngine.start()
        } catch {
            print("audioEngine couldn't start because of an error.")
        }

        transcriptLabel.text = "Say something, I'm listening!"

        state = .Listening
    }

4 Answers:

Answer 0 (Score: 15):

It seems the isFinal flag does not become true when the user stops speaking, as you would expect. I suppose this is intended behavior on Apple's part, because "the user stopped speaking" is not a well-defined event.

I think the easiest way to achieve what you want is the following:

  • You have to establish a "silence interval": if the user does not talk for longer than that interval (say, 2 seconds), assume they have stopped speaking.

  • Create a timer at the beginning of the audio session:

    var timer = Timer.scheduledTimer(timeInterval: 2, target: self, selector: #selector(didFinishTalk), userInfo: nil, repeats: false)

  • Whenever a new transcription arrives in recognitionTask, invalidate and restart the timer:

    timer.invalidate()
    timer = Timer.scheduledTimer(timeInterval: 2, target: self, selector: #selector(didFinishTalk), userInfo: nil, repeats: false)

  • If the timer fires, it means the user has not spoken for 2 seconds. You can safely stop the audio session and exit (see the sketch after this list).
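
Below is a minimal sketch of how this timer-based approach might be wired into the question's resultHandler. It assumes the same speechRecognizer, recognitionRequest, audioEngine and transcriptLabel properties as the question; the silenceTimer property and the restartSilenceTimer, startListening, didFinishTalk and stopRecording names are introduced here for illustration and are not part of the original answer.

    var silenceTimer: Timer?

    func restartSilenceTimer() {
        // One-shot timer: if it ever fires, no transcription has arrived for 2 seconds.
        silenceTimer?.invalidate()
        silenceTimer = Timer.scheduledTimer(timeInterval: 2,
                                            target: self,
                                            selector: #selector(didFinishTalk),
                                            userInfo: nil,
                                            repeats: false)
    }

    func startListening() {
        restartSilenceTimer()
        recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest!) { result, error in
            if let result = result {
                // A new (partial) transcription means the user is still talking,
                // so push the silence deadline two more seconds into the future.
                self.restartSilenceTimer()
                self.transcriptLabel.text = result.bestTranscription.formattedString
            }
            if error != nil || (result?.isFinal ?? false) {
                self.silenceTimer?.invalidate()
                self.stopRecording()   // hypothetical teardown: stop audioEngine, remove tap, nil out task/request
            }
        }
    }

    @objc func didFinishTalk() {
        // No new transcription for 2 seconds: treat this as the end of the utterance.
        stopRecording()
    }
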

Answer 1 (Score: 1):

Based on my testing on iOS 10, when shouldReportPartialResults is set to false you have to wait 60 seconds before you get a result.

Answer 2 (Score: 0):

I am currently using speech-to-text in an app and it is working well for me. My recognitionTask block looks like this:

    recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
        var isFinal = false

        if let result = result, result.isFinal {
            print("Result: \(result.bestTranscription.formattedString)")
            isFinal = result.isFinal
            completion(result.bestTranscription.formattedString, nil)
        }

        if error != nil || isFinal {
            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)

            self.recognitionRequest = nil
            self.recognitionTask = nil
            completion(nil, error)
        }
    })
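
The block above evidently lives inside a method that takes a completion handler. A hypothetical enclosing declaration might look like the sketch below; the method name and the (String?, Error?) completion signature are assumptions for illustration, not part of the original answer, and the audio session configuration from the question is omitted for brevity.

    func recognizeSpeech(completion: @escaping (String?, Error?) -> Void) {
        // Uses the same speechRecognizer / audioEngine / recognitionRequest /
        // recognitionTask properties as the question's startRecording().
        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        recognitionRequest?.shouldReportPartialResults = true

        guard let inputNode = audioEngine.inputNode,
              let recognitionRequest = recognitionRequest else {
            return
        }

        // Same handler as shown above, reporting back through `completion`.
        recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
            if let result = result, result.isFinal {
                completion(result.bestTranscription.formattedString, nil)
            }
            if error != nil || (result?.isFinal ?? false) {
                self.audioEngine.stop()
                inputNode.removeTap(onBus: 0)
                self.recognitionRequest = nil
                self.recognitionTask = nil
                if error != nil { completion(nil, error) }
            }
        })

        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { buffer, _ in
            self.recognitionRequest?.append(buffer)
        }

        audioEngine.prepare()
        try? audioEngine.start()
    }

    // Calling it might look like:
    recognizeSpeech { text, error in
        if let text = text {
            print("Transcribed: \(text)")
        }
    }
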

Answer 3 (Score: 0):

    if result != nil {
        // Every new partial result restarts the silence timer.
        self.timerDidFinishTalk.invalidate()
        self.timerDidFinishTalk = Timer.scheduledTimer(timeInterval: TimeInterval(self.listeningTime), target: self, selector: #selector(self.didFinishTalk), userInfo: nil, repeats: false)

        let bestString = result?.bestTranscription.formattedString

        self.fullsTring = bestString!.trimmingCharacters(in: .whitespaces)
        self.st = self.fullsTring
    }

Here self.listeningTime is how long you want to wait after the last transcription before stopping.
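
The answer does not show the didFinishTalk selector itself. Here is a minimal sketch of what it might do, assuming the same audioEngine, recognitionRequest and fullsTring properties used above; the exact teardown is an assumption, not part of the original answer.

    // Sketch only: fired by timerDidFinishTalk when no new transcription has
    // arrived for `listeningTime` seconds.
    @objc func didFinishTalk() {
        // Stop feeding audio and tell the request that no more audio is coming,
        // so the recognizer can deliver its final result to the resultHandler.
        audioEngine.stop()
        audioEngine.inputNode?.removeTap(onBus: 0)
        recognitionRequest?.endAudio()

        print("Finished with: \(fullsTring)")
    }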