Question

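I'm currently trying to build a web application that uses Google Cloud Speech-to-Text, in particular its speaker diarization feature. My server is written in Node.js, and I'm sending the audio file as a Google Cloud Storage URI. My speech config looks like this: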

config: {
          encoding: 'LINEAR16',
          languageCode: 'en-GB',
          sampleRateHertz: 8000,
          enableSpeakerDiarization: true,
          diarizationSpeakerCount: true,
        }

The transcript I get back has an empty 'words' array, which according to the Google Cloud Speech documentation should contain the speaker tags:

{ words: [],
transcript: 'and the rabbit sails at dusk',
confidence: 0.8659023642539978 }

Notably, if I add

enableWordTimeOffsets: true,

to my config, I do get a 'words' array, like this:

[ { startTime: { seconds: '0', nanos: 0 },
endTime: { seconds: '0', nanos: 600000000 },
word: 'Hello' } etc..

Answer 1

The config should look like this:

const config = {
  encoding: 'LINEAR16',
  sampleRateHertz: 8000,
  languageCode: 'en-GB',
  enableAutomaticPunctuation: true,
  useEnhanced: true,
  model: 'video',
  // Diarization options are nested under diarizationConfig
  diarizationConfig: {
    enableSpeakerDiarization: true,
    minSpeakerCount: 2,
    maxSpeakerCount: 3,
  },
};

More information about RecognitionConfig is available at:

https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig
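Once diarization is enabled, the speaker labels still have to be read out of the response. The following is a minimal sketch, not a definitive implementation. It assumes the v1p1beta1 response shape, in which the last result's first alternative carries all words together with a numeric speakerTag. The names groupWordsBySpeaker and transcribeWithSpeakers, and the gcsUri parameter, are hypothetical and chosen only for illustration.

```javascript
// Sketch only: collapses the words of a diarized response into speaker turns.
// Assumption: the last entry in response.results holds every word of the
// audio, each tagged with a numeric speakerTag.
function groupWordsBySpeaker(response) {
  const results = response.results || [];
  if (results.length === 0) return [];

  const last = results[results.length - 1];
  const alternative = (last.alternatives || [])[0] || {};
  const words = alternative.words || [];

  const turns = [];
  for (const {word, speakerTag} of words) {
    const current = turns[turns.length - 1];
    if (current && current.speaker === speakerTag) {
      // Same speaker as the previous word: extend the current turn.
      current.text += ' ' + word;
    } else {
      // Speaker changed (or first word): start a new turn.
      turns.push({speaker: speakerTag, text: word});
    }
  }
  return turns;
}

// Hypothetical wrapper. gcsUri is a placeholder such as 'gs://my-bucket/audio.wav',
// and config is the RecognitionConfig object shown above.
async function transcribeWithSpeakers(gcsUri, config) {
  // Required lazily so the helper above works without the library installed.
  const speech = require('@google-cloud/speech').v1p1beta1;
  const client = new speech.SpeechClient();
  const [response] = await client.recognize({config, audio: {uri: gcsUri}});
  return groupWordsBySpeaker(response);
}
```

Calling transcribeWithSpeakers with the Storage URI and the config above should resolve to an array of {speaker, text} turns. Synchronous recognize only accepts short audio (roughly a minute), so longer files would need longRunningRecognize instead.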

How do I enable speaker diarization in the Google Cloud Speech library for Node.js?
