Question

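I'm trying to use the Google Speech-To-Text Python client library. My request goes through fine, but the API response comes back empty. I'm using binary audio data sent from the client: the microphone input is recorded for 3 seconds and then sent via an AJAX request.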

I've tried changing the encoding and converting the data to base64, but nothing seems to produce a successful response.

Here is my Python code

from flask import Flask, request, render_template
import io
import os
import sys
import json
import base64

# Imports the Google Cloud client library
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

app = Flask(__name__)

@app.route('/audio', methods=['PUT'])
def audio():
    client = speech.SpeechClient()
    content = request.files['audio'].read()
    # with open('voice.wav', 'wb') as file:
    #   file.write(content)
    # with open('voice.wav', 'rb') as file:
    #   content = file.read();
    audio = types.RecognitionAudio(content=base64.b64encode(content))
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=48000,
        language_code='en-US')

    # Detects speech in the audio file
    response = client.recognize(config, audio)
    print(response, file=sys.stderr)

    for result in response.results:
        print('Transcript: {}'.format(result.alternatives[0].transcript), file=sys.stderr)

    return json.dumps({'success':True}), 200, {'ContentType':'application/json'}

And my JS code

const recordAudio = () =>
new Promise(async resolve => {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mediaRecorder = new MediaRecorder(stream);
  const audioChunks = [];

  mediaRecorder.addEventListener("dataavailable", event => {
    audioChunks.push(event.data);
  });

  const start = () => mediaRecorder.start();

  const stop = () =>
  new Promise(resolve => {
    mediaRecorder.addEventListener("stop", () => {
      const audioBlob = new Blob(audioChunks);
      const audioUrl = URL.createObjectURL(audioBlob);
      const audio = new Audio(audioUrl);
      const play = () => audio.play();
      resolve({ audioBlob, audioUrl, play });
    });

    mediaRecorder.stop();
  });

  resolve({ start, stop });
});

const sendAudio = (audioBlob) =>
new Promise(async resolve => {
  var formData = new FormData();
  formData.append('audio', audioBlob, 'audio')
  $.ajax({
    type: 'PUT',
    url: '/audio',
    data: formData,
    processData: false,
    contentType: false
  }).done(function(data) {
   console.log(data);
 });
})

const sleep = time => new Promise(resolve => setTimeout(resolve, time));

const handleAction = async () => {
  const recorder = await recordAudio();
  const actionButton = document.getElementById('action');
  actionButton.disabled = true;
  recorder.start();
  await sleep(3000);
  const audio = await recorder.stop();
  audio.play();
  await sendAudio(audio.audioBlob)
  await sleep(3000);
  actionButton

And the HTML

<!doctype html>
<html>
  <head>
    <title>Record Audio Test</title>
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
  </head>
  <body>
    <h1>Audio Recording Test</h1>
    <p>Talk for 3 seconds, then you will hear your recording played back</p>
    <script src="/static/index.js"></script>
    <button id="action" onclick="handleAction()">Start recording...</button>
  </body>
</html>

Answer 1

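When Speech-to-Text returns an empty response, a likely cause is that the audio is not in the encoding the request declares. Make sure the properties of your audio data (the encoding and, for example, "sample_rate_hertz") match the parameters you send in the InitialRecognizeRequest.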
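One quick way to check the encoding side of this is to look at the first bytes of the uploaded audio before passing it on. The sketch below is an illustration only: sniff_container is a hypothetical helper, not part of the Google client library, and it recognizes just a handful of common container signatures. Browser recordings made with MediaRecorder are often WebM or Ogg rather than FLAC, so a check like this can show whether a declared FLAC encoding matches what the server actually received.

```python
def sniff_container(data: bytes) -> str:
    """Guess the audio container from its leading magic bytes.

    Hypothetical helper for debugging; covers only a few common formats.
    """
    if data.startswith(b"fLaC"):
        return "flac"
    # WAV files are RIFF containers with the form type "WAVE" at offset 8.
    if data.startswith(b"RIFF") and data[8:12] == b"WAVE":
        return "wav"
    if data.startswith(b"OggS"):
        return "ogg"
    # EBML header, used by WebM and Matroska files.
    if data.startswith(b"\x1a\x45\xdf\xa3"):
        return "webm"
    return "unknown"
```

In a handler like the one in the question, this could be called on the bytes read from the uploaded file and the result logged next to the encoding named in the request config.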

For example, if your request specifies "encoding": "FLAC" and "sampleRateHertz": 16000, the audio data parameters listed by the SoX play command should show the same values.
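For WAV data, a similar comparison can be made from Python with the standard-library wave module instead of SoX. This is a minimal sketch under stated assumptions: wav_params and matches_config are hypothetical helper names, and the code only handles uncompressed PCM WAV input, not FLAC or WebM.

```python
import io
import wave


def wav_params(data: bytes) -> dict:
    """Read sample rate, channel count and sample width from WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "sample_rate_hertz": wav.getframerate(),
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
        }


def matches_config(data: bytes, sample_rate_hertz: int) -> bool:
    """Return True if the WAV header agrees with the requested sample rate."""
    return wav_params(data)["sample_rate_hertz"] == sample_rate_hertz
```

If matches_config returns False for the rate named in the request, either the recording or the config needs to change so that the two agree.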

More information on this is available here.

Google Speech-To-Text returns an empty response when using the Python client library
