Speech recognizer breaks the speech synthesizer

Asked: 2017-05-02 16:24:25

Tags: ios swift3 text-to-speech speech-to-text

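I must be overlooking something, but when I try to combine speech synthesis and speech recognition in Swift, I get an error ("Could not get attribute 'LocalURL': Error Domain=MobileAssetError Code=1 "Unable to copy asset attributes" UserInfo={NSDescription=Unable to copy asset attributes}"). The end result is that speech-to-text works afterwards, but text-to-speech stays broken until the app is restarted.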

let identifier = "\(Locale.current.languageCode!)_\(Locale.current.regionCode!)" // e.g. en_US
speechRecognizer = SFSpeechRecognizer(locale: Locale.init(identifier: identifier))!

if audioEngine.isRunning {
    audioEngine.stop() // will also stop playing music.
    recognitionRequest?.endAudio()
    speechButton.isEnabled = false
} else {
    recordSpeech() // here we do steps 1 .. 12
}

// recordSpeech() :

if recognitionTask != nil {  // Step 1
    recognitionTask?.cancel()
    recognitionTask = nil
}

let audioSession = AVAudioSession.sharedInstance()  // Step 2
do {
    try audioSession.setCategory(AVAudioSessionCategoryRecord)
    try audioSession.setMode(AVAudioSessionModeMeasurement)
    try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
} catch {
    print("audioSession properties weren't set because of an error.")
}

recognitionRequest = SFSpeechAudioBufferRecognitionRequest()  // Step 3

guard let inputNode = audioEngine.inputNode else {
    fatalError("Audio engine has no input node")
}  // Step 4

guard let recognitionRequest = recognitionRequest else {
    fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
} // Step 5

recognitionRequest.shouldReportPartialResults = true  // Step 6

recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in  // Step 7

    var isFinal = false  // Step 8

    if result != nil {

        print(result?.bestTranscription.formattedString as Any)

        isFinal = (result?.isFinal)!
        if (isFinal) {
            if (result != nil) {
                self.speechOutput.text = self.speechOutput.text + "\n" + (result?.bestTranscription.formattedString)!
            }
        }
    }

    if error != nil || isFinal {  // Step 10
        self.audioEngine.stop()
        inputNode.removeTap(onBus: 0)

        self.recognitionRequest = nil
        self.recognitionTask = nil
        self.speechButton.isEnabled = true

    }
})

let recordingFormat = inputNode.outputFormat(forBus: 0)  // Step 11
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
    self.recognitionRequest?.append(buffer)
}

audioEngine.prepare()  // Step 12

do {
    try audioEngine.start()
} catch {
    print("audioEngine couldn't start because of an error.")
}

I based my code on a tutorial like this one:

http://www.appcoda.com/siri-speech-framework/

func say(_ something : String, lang : String ) {


    let synth = AVSpeechSynthesizer()
    synth.delegate = self


    print(something) // debug code, works fine
    let identifier = "\(Locale.current.languageCode!)-\(Locale.current.regionCode!)"
    let utterance = AVSpeechUtterance(string: something)
    utterance.voice = AVSpeechSynthesisVoice(language: identifier)

    synth.speak(utterance)
}

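So if I use the `say` method on its own, it works fine. If I combine the two, the synthesizer no longer works once speech recognition has run. Any hints toward a solution? I suspect something is not being gracefully restored to its previous state, but I can't figure out what.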
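One way to test that suspicion is to print the shared audio session's category and mode before and after recognition. This is a minimal sketch; `logAudioSessionState` is a hypothetical helper, not part of the code above:

```swift
import AVFoundation

// Hypothetical debugging helper (an assumption, not from the original code):
// prints the current category and mode of the shared audio session.
func logAudioSessionState(_ label: String) {
    let session = AVAudioSession.sharedInstance()
    print("\(label): category=\(session.category), mode=\(session.mode)")
}
```

Calling it at the start of `say` and again in the Step 10 branch would show whether the session is still in the record category when the synthesizer is asked to speak.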

1 Answer:

Answer 0 (score: 0)

Grrr ......

Here is the solution. Sorry for not looking closely enough, but this took me a lot of time.

func say(_ something : String, lang : String ) {

    let audioSession = AVAudioSession.sharedInstance()
    do {
        // this is the solution:
        try audioSession.setCategory(AVAudioSessionCategoryPlayback)
        try audioSession.setMode(AVAudioSessionModeDefault)
        // the recognizer uses AVAudioSessionCategoryRecord
        // so we want to set it to AVAudioSessionCategoryPlayback
        // again before we can say something
    } catch {
        print("audioSession properties weren't set because of an error.")
    }

    synth = AVSpeechSynthesizer()

    synth.delegate = self


    print(something)
    let identifier = "\(Locale.current.languageCode!)-\(Locale.current.regionCode!)"
    let utterance = AVSpeechUtterance(string: something)
    utterance.voice = AVSpeechSynthesisVoice(language: identifier)

    synth.speak(utterance)

}
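The fix above resets the category inside `say`. A possible variation, sketched here under assumptions, is to switch the session back as soon as recognition ends, so any later playback code finds it in a usable state. `restorePlaybackSession` is a hypothetical name, the constants use the same Swift 3 spelling as the rest of this thread (later SDKs rename them), and I have not verified that this alone clears the MobileAsset error:

```swift
import AVFoundation

// Hypothetical helper (an assumption, not from the original answer):
// puts the shared audio session back into playback mode after the
// recognizer has switched it to AVAudioSessionCategoryRecord.
func restorePlaybackSession() {
    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(AVAudioSessionCategoryPlayback)
        try audioSession.setMode(AVAudioSessionModeDefault)
    } catch {
        // Print the underlying error instead of swallowing it.
        print("Could not restore the playback audio session: \(error)")
    }
}
```

It could be called in the Step 10 branch of the recognition handler, right after `inputNode.removeTap(onBus: 0)`. The trade-off is that the reset then happens once per recognition run rather than on every call to `say`.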