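When calling Google Cloud Speech with the Java client, I get an INVALID_ARGUMENT: sample_rate_hertz response even though the sample rate in RecognitionConfig is correct (as confirmed by ffmpeg and Sox). For the file lecture-16k-05sec.wav, Sox shows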
Input File : 'lecture-16k-05sec.wav'
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:05.00 = 80000 samples ~ 375 CDDA sectors
File Size : 160k
Bit Rate : 256k
Sample Encoding: 16-bit Signed Integer PCM
ffmpeg shows
Input #0, wav, from 'lecture-16k-05sec.wav':
Duration: 00:00:05.00, bitrate: 256 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s
The RecognitionConfig is set up as follows; bytes is a byte[] holding the file contents, and speechSettings only sets the credentials.
try (SpeechClient speechClient = SpeechClient.create(speechSettings)) {
    ByteString audioBytes = ByteString.copyFrom(bytes);
    RecognitionConfig config = RecognitionConfig.newBuilder()
            .setSampleRateHertz(16000)
            .setEncoding(AudioEncoding.LINEAR16)
            .setLanguageCode("en-US")
            .build();
    RecognitionAudio audio = RecognitionAudio.newBuilder()
            .setContent(audioBytes)
            .build();
    // Performs speech recognition on the audio file
    RecognizeResponse response = speechClient.recognize(config, audio);
    List<SpeechRecognitionResult> results = response.getResultsList();
    ...
}
The response is
INVALID_ARGUMENT: sample_rate_hertz (16000) in RecognitionConfig must either be omitted or match the value in the WAV header ( 1052622831)
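One way to narrow this down might be to read the sample rate field straight out of the byte[] that is handed to ByteString.copyFrom, to see what the service sees. The sketch below assumes a canonical PCM WAV layout in which the fmt chunk immediately follows the RIFF header, so the sample rate sits at byte offset 24 as a little-endian 32-bit integer; the class and method names are hypothetical, and fakeHeader only builds a 44-byte stand-in header for demonstration. As a side observation, 1052622831 is 0x3EBDBFEF, whose little-endian bytes are EF BF BD 3E, and EF BF BD is the UTF-8 encoding of U+FFFD (the replacement character). One possibility, not confirmed here, is that the file contents passed through a String at some point before becoming bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class WavHeaderCheck {

    // Reads the sample rate from a canonical WAV header (assumption: the
    // "fmt " chunk directly follows the 12-byte RIFF header, which puts
    // the little-endian sample rate at offset 24).
    public static int sampleRateFromHeader(byte[] wav) {
        if (wav.length < 28) {
            throw new IllegalArgumentException("too short to contain a WAV fmt chunk");
        }
        return ByteBuffer.wrap(wav, 24, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    // Builds a 44-byte mono, 16-bit PCM header with no audio data.
    // Hypothetical helper, used only so the example runs without a file.
    public static byte[] fakeHeader(int sampleRate) {
        ByteBuffer b = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        b.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        b.putInt(36);                       // RIFF chunk size (no data)
        b.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        b.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        b.putInt(16);                       // fmt chunk size
        b.putShort((short) 1);              // audio format: PCM
        b.putShort((short) 1);              // channels
        b.putInt(sampleRate);               // sample rate, offset 24
        b.putInt(sampleRate * 2);           // byte rate
        b.putShort((short) 2);              // block align
        b.putShort((short) 16);             // bits per sample
        b.put("data".getBytes(StandardCharsets.US_ASCII));
        b.putInt(0);                        // data chunk size
        return b.array();
    }

    public static void main(String[] args) {
        byte[] raw = fakeHeader(16000);
        System.out.println(sampleRateFromHeader(raw));      // prints 16000

        // Round-tripping the same bytes through a UTF-8 String replaces the
        // invalid byte 0x80 (the low byte of 16000) with EF BF BD.
        byte[] mangled = new String(raw, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(sampleRateFromHeader(mangled));  // prints 1052622831
    }
}
```

For a real check, sampleRateFromHeader could be called on the same bytes array that is passed to ByteString.copyFrom; if it does not return 16000, the bytes were altered before reaching the client.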
The Google Java client version is
<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-speech</artifactId>
    <version>0.56.0-beta</version>
</dependency>
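I have tried many files, and have also tried encoding the files as FLAC (with the appropriate change in RecognitionConfig), but I keep getting the error above. If I change the sample rate to the one suggested in the response, I usually get another INVALID_ARGUMENT, this time saying the number of channels is wrong.
Any suggestions would be much appreciated. Thanks.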