Swift: How to save the audio file from a speech recognition task

Posted: 2020-08-24 22:07:56

Tags: swift amazon-s3 speech-recognition

I'm building an app that uses speech recognition to convert speech to text. Everything works, but I'd like to save the recording so I can compare what was said with what was transcribed. Here is my upload code:

func tapUploadVideo(_ sender: Any) {
    //guard let path = Bundle.main.path(forResource: "Video", ofType: "mov") else { return }
    let videoUrl = URL(fileURLWithPath: "your video file path")

    // Upload the file at videoUrl to S3, reporting progress along the way.
    AWSS3Manager.shared.uploadVideo(videoUrl: videoUrl, progress: { [weak self] (progress) in
        guard let strongSelf = self else { return }
        strongSelf.progressView.progress = Float(progress)
    }) { [weak self] (uploadedFileUrl, error) in
        guard let strongSelf = self else { return }
        if let finalPath = uploadedFileUrl as? String {
            strongSelf.s3UrlLabel.text = "Uploaded file url: " + finalPath
        } else {
            print("\(String(describing: error?.localizedDescription))")
        }
    }
}

I'm trying to use this function to upload the captured audio to AWS S3, but I get this error:

Credential validation was not successful: Timed out connecting to server

How can I get the local URL of the captured audio?

1 Answer:

Answer 0 (score: 0):

Currently you're using an API that works on streamed audio. The advantage is that, in theory, you can stream forever without using up any disk space (apart from caching and the like). The downside is that nothing gets recorded.

There are two ways to solve this.

1.) Record the audio first, save it to disk, and then pass it to the other API Apple provides: SFSpeechURLRecognitionRequest instead of SFSpeechAudioBufferRecognitionRequest. Record the audio however you like, then pass in the URL of the file you recorded, as shown in the sketch below.
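
A minimal sketch of that approach, assuming the audio has already been saved to disk (for example with AVAudioRecorder); the transcribeFile(at:) helper and its error handling are illustrative, not part of the original code:

import Speech

// Hypothetical helper: transcribe an already-recorded audio file.
// Assumes SFSpeechRecognizer.requestAuthorization was already granted.
func transcribeFile(at fileURL: URL) {
    guard let recognizer = SFSpeechRecognizer(), recognizer.isAvailable else {
        print("Speech recognizer is not available")
        return
    }

    // URL-based request: reads from the file on disk instead of a live buffer.
    let request = SFSpeechURLRecognitionRequest(url: fileURL)

    recognizer.recognitionTask(with: request) { result, error in
        if let result = result, result.isFinal {
            print("Transcript: \(result.bestTranscription.formattedString)")
        } else if let error = error {
            print("Recognition failed: \(error.localizedDescription)")
        }
    }
}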

2.) This is probably the one you're looking for. Where you install the tap that appends buffers to self.recognitionRequest, you can also use AVAudioFile to save those buffers to disk. Here's an example that seems to work :)

Once the file has been saved, you can use your existing code with the URL you assigned to the file to upload it wherever you need; a short usage sketch follows the code below. Good luck!

// Requires `import Speech` and `import AVFoundation`.
private func startRecording() throws {
    // Cancel the previous task if it's running.
    recognitionTask?.cancel()
    self.recognitionTask = nil
    
    // Configure the audio session for the app.
    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
    try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
    let inputNode = audioEngine.inputNode

    // Create and configure the speech recognition request.
    recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    guard let recognitionRequest = recognitionRequest else { fatalError("Unable to create a SFSpeechAudioBufferRecognitionRequest object") }
    recognitionRequest.shouldReportPartialResults = true
    
    // Force network-based recognition; set this to true on iOS 13+
    // to keep speech recognition data on the device instead.
    if #available(iOS 13, *) {
        recognitionRequest.requiresOnDeviceRecognition = false
    }
    
    // Create a recognition task for the speech recognition session.
    // Keep a reference to the task so that it can be canceled.
    recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { result, error in
        var isFinal = false
        
        if let result = result {
            // Update the text view with the results.
            self.textView.text = result.bestTranscription.formattedString
            isFinal = result.isFinal
            print("Text \(result.bestTranscription.formattedString)")
        }
        
        if error != nil || isFinal {
            // Stop recognizing speech if there is a problem.
            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)

            self.recognitionRequest = nil
            self.recognitionTask = nil

            self.recordButton.isEnabled = true
            self.recordButton.setTitle("Start Recording", for: [])
        }
    }

    // Use the microphone's native input format for the tap. Installing a tap
    // whose format doesn't match the hardware sample rate raises a runtime
    // exception on most devices.
    let recordingFormat = inputNode.outputFormat(forBus: 0)

    // Set up a file to record to.
    let paths = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)
    let recordingPath = paths[0].appendingPathComponent("recording.wav")
    let audioFile = try AVAudioFile(forWriting: recordingPath, settings: recordingFormat.settings)

    // Configure the microphone input. Each buffer is appended to the
    // recognition request and also written to the file on disk.
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
        self.recognitionRequest?.append(buffer)

        do {
            try audioFile.write(from: buffer)
        } catch {
            print(error.localizedDescription)
        }

    }
    
    audioEngine.prepare()
    try audioEngine.start()
    
    // Let the user know to start talking.
    textView.text = "(Go ahead, I'm listening)"
}
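
For the upload step, a sketch of how the saved file could be handed to the existing code; it assumes recordingPath has been stored somewhere accessible (for example as a property) when the file is created, and it reuses the asker's own AWSS3Manager wrapper as-is:

// After recognition finishes (e.g. in the isFinal branch above), upload
// the saved file with the existing S3 wrapper. `recordingPath` is assumed
// to have been kept around from when the AVAudioFile was created.
AWSS3Manager.shared.uploadVideo(videoUrl: recordingPath, progress: { progress in
    print("Upload progress: \(progress)")
}) { (uploadedFileUrl, error) in
    if let finalPath = uploadedFileUrl as? String {
        print("Uploaded file url: " + finalPath)
    } else {
        print("\(String(describing: error?.localizedDescription))")
    }
}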