Question

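After speaking a welcome message to the user with AVSpeechUtterance, I try to use SFSpeechRecognizer for speech-to-text. Randomly, however, speech recognition does not start (after the welcome message has been spoken) and the error message below is logged.

[avas] ERROR: AVAudioSession.mm:1049: -[AVAudioSession setActive:withOptions:error:]: Deactivating an audio session that has running I/O. All I/O should be stopped or paused prior to deactivating the audio session.

It works a few times. I don't understand why it does not work consistently.

I tried the solutions from other SO posts, which suggest checking whether any audio player is running. I added that check to the speech-to-text part of the code. It returns false (i.e. no other audio player is running), but speech-to-text still does not start listening to the user's voice. Can you point out what I am doing wrong?

I am testing on an iPhone 6 running iOS 10.3.

Below are the code snippets used:

Text to speech: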

- (void) speak:(NSString *) textToSpeak {
    [[AVAudioSession sharedInstance] setActive:NO withOptions:0 error:nil];
    [[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayback
      withOptions:AVAudioSessionCategoryOptionDuckOthers error:nil];

    [synthesizer stopSpeakingAtBoundary:AVSpeechBoundaryImmediate];

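    // Use alloc/init here: calling an initializer on the result of +new initializes the object twice.
    AVSpeechUtterance *utterance = [[AVSpeechUtterance alloc] initWithString:textToSpeak];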
    utterance.voice = [AVSpeechSynthesisVoice voiceWithLanguage:locale];
    utterance.rate = (AVSpeechUtteranceMinimumSpeechRate * 1.5 + AVSpeechUtteranceDefaultSpeechRate) / 2.5 * rate * rate;
    utterance.pitchMultiplier = 1.2;
    [synthesizer speakUtterance:utterance];
}

- (void)speechSynthesizer:(AVSpeechSynthesizer*)synthesizer didFinishSpeechUtterance:(AVSpeechUtterance*)utterance {
    //Return success message back to caller

    [[AVAudioSession sharedInstance] setActive:NO withOptions:0 error:nil];
    [[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryAmbient
      withOptions: 0 error: nil];
    [[AVAudioSession sharedInstance] setActive:YES withOptions: 0 error:nil];
}

Speech to text:

- (void) recordUserSpeech:(NSString *) lang {
    NSLocale *locale = [[NSLocale alloc] initWithLocaleIdentifier:lang];
    self.sfSpeechRecognizer = [[SFSpeechRecognizer alloc] initWithLocale:locale];
    [self.sfSpeechRecognizer setDelegate:self];

    NSLog(@"Step1: ");
    // Cancel the previous task if it's running.
    if ( self.recognitionTask ) {
        NSLog(@"Step2: ");
        [self.recognitionTask cancel];
        self.recognitionTask = nil;
    }

    NSLog(@"Step3: ");
    [self initAudioSession];

    self.recognitionRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
    NSLog(@"Step4: ");

    if (!self.audioEngine.inputNode) {
        NSLog(@"Audio engine has no input node");
    }

    if (!self.recognitionRequest) {
        NSLog(@"Unable to create a SFSpeechAudioBufferRecognitionRequest object");
    }

    self.recognitionTask = [self.sfSpeechRecognizer recognitionTaskWithRequest:self.recognitionRequest resultHandler:^(SFSpeechRecognitionResult *result, NSError *error) {

        BOOL isFinal = NO;

        if (error) {
            [self stopAndRelease];
            NSLog(@"In recognitionTaskWithRequest.. Error code ::: %ld, %@", (long)error.code, error.description);
            [self sendErrorWithMessage:error.localizedFailureReason andCode:error.code];
        }

        if (result) {

            [self sendResults:result.bestTranscription.formattedString];
            isFinal = result.isFinal;
        }

        if (isFinal) {
            NSLog(@"result.isFinal: ");
            [self stopAndRelease];
            //return control to caller
        }
    }];

    NSLog(@"Step5: ");

    AVAudioFormat *recordingFormat = [self.audioEngine.inputNode outputFormatForBus:0];

    [self.audioEngine.inputNode installTapOnBus:0 bufferSize:1024 format:recordingFormat block:^(AVAudioPCMBuffer * _Nonnull buffer, AVAudioTime * _Nonnull when) {
        //NSLog(@"Installing Audio engine: ");
        [self.recognitionRequest appendAudioPCMBuffer:buffer];
    }];

    NSLog(@"Step6: ");

    [self.audioEngine prepare];
    NSLog(@"Step7: ");
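    NSError *err = nil;
    // Check the return value so a failed engine start is not silently ignored.
    if (![self.audioEngine startAndReturnError:&err]) {
        NSLog(@"Audio engine failed to start: %@", err);
    }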
}
- (void) initAudioSession
{
    AVAudioSession *audioSession = [AVAudioSession sharedInstance];
    [audioSession setCategory:AVAudioSessionCategoryRecord error:nil];
    [audioSession setMode:AVAudioSessionModeMeasurement error:nil];
    [audioSession setActive:YES withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:nil];
}

-(void) stopAndRelease
{
    NSLog(@"Invoking SFSpeechRecognizer stopAndRelease: ");
    [self.audioEngine stop];
    [self.recognitionRequest endAudio];
    [self.audioEngine.inputNode removeTapOnBus:0];
    self.recognitionRequest = nil;
    [self.recognitionTask cancel];
    self.recognitionTask = nil;
}

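Regarding the logs I added, I can see every log message up to "Step7" being printed.

When debugging the code on the device, it consistently breaks at the lines below (I have exception breakpoints set), but execution continues when I resume. The same thing happens during the few successful runs as well.

AVAudioFormat *recordingFormat = [self.audioEngine.inputNode outputFormatForBus:0];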

[self.audioEngine prepare];

Answer 1

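The reason is that the audio has not completely finished when -speechSynthesizer:didFinishSpeechUtterance: is called, so you get this error when you call setActive:NO. You cannot deactivate the AudioSession or change any of its settings while I/O is running. Workaround: wait a few milliseconds (read below for how long), then deactivate the AudioSession and do the rest of the reconfiguration.

A few words about audio playback completion.

It may look strange at first glance, but I have spent a lot of time researching this issue. When you hand the last chunk of sound to the device output, you only have an approximate time at which playback will actually complete. Look at the AudioSession property ioBufferDuration: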

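The audio I/O buffer duration is the number of seconds for a single audio input/output cycle. For example, with an I/O buffer duration of 0.005 s, on each audio I/O cycle:

You receive 0.005 s of audio if obtaining input.

You must provide 0.005 s of audio if providing output.

The typical maximum I/O buffer duration is 0.93 s (corresponding to 4096 sample frames at a sample rate of 44.1 kHz). The minimum I/O buffer duration is at least 0.005 s (256 frames) but might be lower depending on the hardware in use.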

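So we can interpret this value as the playback time of one chunk. But there is still a small, non-calculable interval between that point and the actual completion of audio playback (hardware delay). I would say you need to wait ioBufferDuration * 1000 + delay ms to be sure playback is complete (ioBufferDuration * 1000 because the duration is given in seconds), where delay is some fairly small value.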
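
As a rough sketch of that wait, assuming the delegate callback from the question, the deactivation can be deferred with dispatch_after. The class name SpeechSessionResetter and the 0.1 s margin (standing in for delay) are assumptions made for illustration, not measured or recommended values.

```objc
#import <AVFoundation/AVFoundation.h>

// Hypothetical helper class, not part of the question's code.
@interface SpeechSessionResetter : NSObject <AVSpeechSynthesizerDelegate>
@end

@implementation SpeechSessionResetter

- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer
 didFinishSpeechUtterance:(AVSpeechUtterance *)utterance {
    AVAudioSession *session = [AVAudioSession sharedInstance];

    // One I/O cycle (in seconds) plus an assumed 0.1 s margin for hardware latency.
    NSTimeInterval wait = session.IOBufferDuration + 0.1;

    dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(wait * NSEC_PER_SEC)),
                   dispatch_get_main_queue(), ^{
        NSError *error = nil;
        // Deactivate only after the output has had time to drain.
        if (![session setActive:NO withOptions:0 error:&error]) {
            NSLog(@"Deactivating the audio session failed: %@", error);
            return;
        }
        if (![session setCategory:AVAudioSessionCategoryAmbient withOptions:0 error:&error]) {
            NSLog(@"Setting the category failed: %@", error);
        }
        if (![session setActive:YES withOptions:0 error:&error]) {
            NSLog(@"Reactivating the audio session failed: %@", error);
        }
    });
}

@end
```

Logging the NSError instead of passing nil makes it easier to see whether the chosen margin is still too short on a given device.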

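It seems that even Apple's developers are not entirely sure about the audio completion time. Take a quick look at the newer audio class AVAudioPlayerNode and func scheduleBuffer(_ buffer: AVAudioPCMBuffer, completionHandler: AVFoundation.AVAudioNodeCompletionHandler? = nil):

@param completionHandler called after the buffer has been consumed by the player or the player is stopped. May be nil.

@discussion Schedules the buffer to be played following any previously scheduled commands. It is possible for the completionHandler to be called before rendering begins or before the buffer is played completely.

You can read more about audio processing in Understanding the Audio Unit Render Callback Function (AudioUnit is a low-level API that gives close access to I/O data).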
