I'm hacking together a small project using iOS 10's built-in speech recognition. I have it working with the device's microphone, and my speech is recognized very accurately.

My problem is that the recognition task callback is called for every available partial transcription, and I would like it to detect when the person stops speaking and call the callback with the isFinal property set to true. That is not happening - the app listens indefinitely. Is SFSpeechRecognizer ever capable of detecting the end of a sentence?

Here is my code. It is based on an example found on the Internet and is mostly the boilerplate needed to recognize from a microphone source. I modified it by adding the recognition taskHint. I also set shouldReportPartialResults to false, but it seems to have been ignored.
func startRecording() {
    if recognitionTask != nil {
        recognitionTask?.cancel()
        recognitionTask = nil
    }

    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(AVAudioSessionCategoryRecord)
        try audioSession.setMode(AVAudioSessionModeMeasurement)
        try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
    } catch {
        print("audioSession properties weren't set because of an error.")
    }

    recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    recognitionRequest?.shouldReportPartialResults = false
    recognitionRequest?.taskHint = .search

    guard let inputNode = audioEngine.inputNode else {
        fatalError("Audio engine has no input node")
    }
    guard let recognitionRequest = recognitionRequest else {
        fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
    }

    recognitionRequest.shouldReportPartialResults = true

    recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
        var isFinal = false

        if result != nil {
            print("RECOGNIZED \(result?.bestTranscription.formattedString)")
            self.transcriptLabel.text = result?.bestTranscription.formattedString
            isFinal = (result?.isFinal)!
        }

        if error != nil || isFinal {
            self.state = .Idle

            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)

            self.recognitionRequest = nil
            self.recognitionTask = nil

            self.micButton.isEnabled = true

            self.say(text: "OK. Let me see.")
        }
    })

    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
        self.recognitionRequest?.append(buffer)
    }

    audioEngine.prepare()

    do {
        try audioEngine.start()
    } catch {
        print("audioEngine couldn't start because of an error.")
    }

    transcriptLabel.text = "Say something, I'm listening!"
    state = .Listening
}
Answer 0 (score: 15)

It seems that the isFinal flag does not become true when the user stops talking, as you expected. I guess this is intended behaviour on Apple's part, because "the user stopped talking" is an undefined event.

I believe the easiest way to achieve your goal is the following: you have to establish an "interval of silence", meaning that if the user doesn't talk for longer than your interval (e.g. 2 seconds), he has stopped talking.

Create a timer at the beginning of the audio session:
var timer = Timer.scheduledTimer(timeInterval: 2, target: self, selector: #selector(didFinishTalk), userInfo: nil, repeats: false)
When a new transcription arrives in the recognitionTask, invalidate and restart the timer:
timer.invalidate()
timer = Timer.scheduledTimer(timeInterval: 2, target: self, selector: #selector(didFinishTalk), userInfo: nil, repeats: false)
If the timer expires, it means the user has not talked for 2 seconds. You can then safely stop the audio session and exit.
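Putting the pieces together, a minimal sketch could look like this. The restartSilenceTimer() helper, the timerDidFinishTalk property and the body of didFinishTalk are my own names and assumptions rather than part of the original answer; the teardown simply mirrors the isFinal branch of the question's code.

// Sketch only: assumes this lives in the same view controller that owns
// audioEngine, recognitionRequest and recognitionTask, plus one extra property:
// var timerDidFinishTalk: Timer?

func restartSilenceTimer() {
    timerDidFinishTalk?.invalidate()
    timerDidFinishTalk = Timer.scheduledTimer(timeInterval: 2,
                                              target: self,
                                              selector: #selector(didFinishTalk),
                                              userInfo: nil,
                                              repeats: false)
}

// Call restartSilenceTimer() once in startRecording() and again from the
// resultHandler every time a partial transcription arrives. If the timer
// ever fires, the user has been silent for 2 seconds.
@objc func didFinishTalk() {
    recognitionRequest?.endAudio()              // ask the recognizer for a final result
    audioEngine.stop()
    audioEngine.inputNode?.removeTap(onBus: 0)
}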
Answer 1 (score: 1)

Based on my testing on iOS 10, when shouldReportPartialResults is set to false you have to wait 60 seconds before you get the result.
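If you don't want to sit out that timeout, one possible workaround (my addition, not part of the original answer) is to keep shouldReportPartialResults set to false and end the audio stream yourself once your own logic decides the utterance is over, e.g. when a stop button is tapped or a silence timer fires:

recognitionRequest?.shouldReportPartialResults = false

// Later, when your own logic decides the user has finished speaking:
recognitionRequest?.endAudio()   // the resultHandler should then be called with a final result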
Answer 2 (score: 0)

I am currently using speech-to-text in an app and it works fine for me. My recognitionTask block is as follows:
recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
    var isFinal = false

    if let result = result, result.isFinal {
        print("Result: \(result.bestTranscription.formattedString)")
        isFinal = result.isFinal
        completion(result.bestTranscription.formattedString, nil)
    }

    if error != nil || isFinal {
        self.audioEngine.stop()
        inputNode.removeTap(onBus: 0)
        self.recognitionRequest = nil
        self.recognitionTask = nil
        completion(nil, error)
    }
})
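The block above captures a completion closure that is not shown; presumably it belongs to a wrapper method roughly like the following (the recognize(completion:) name and signature are my assumption, not the answer author's code):

// Hypothetical wrapper around the answer's resultHandler.
func recognize(completion: @escaping (String?, Error?) -> Void) {
    // set up audioEngine, inputNode and recognitionRequest as in the question,
    // then start recognitionTask with the resultHandler shown above
}

// Caller side: the closure fires once with either a final transcription or an error.
recognize { text, error in
    if let text = text {
        print("Final transcription: \(text)")
    } else if let error = error {
        print("Recognition failed: \(error)")
    }
}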
Answer 3 (score: 0)
if result != nil {
    self.timerDidFinishTalk.invalidate()
    self.timerDidFinishTalk = Timer.scheduledTimer(timeInterval: TimeInterval(self.listeningTime), target: self, selector: #selector(self.didFinishTalk), userInfo: nil, repeats: false)

    let bestString = result?.bestTranscription.formattedString
    self.fullsTring = bestString!.trimmingCharacters(in: .whitespaces)
    self.st = self.fullsTring
}
Here self.listeningTime is how long you want to wait after the end of the utterance before stopping.
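The didFinishTalk selector itself is not shown in this answer; a minimal sketch of what it could do (my assumption, mirroring the teardown from the question's code and using the fullsTring property stored above) is:

// Hypothetical: fires after `listeningTime` seconds with no new result.
@objc func didFinishTalk() {
    audioEngine.stop()
    audioEngine.inputNode?.removeTap(onBus: 0)
    recognitionTask?.cancel()
    recognitionTask = nil
    print("Final transcription: \(fullsTring)")
}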