Question

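I am using the Google Speech API for Arabic. It used to transcribe speech well, with acceptable accuracy, but the transcriptions have suddenly become completely inaccurate. I don't know what happened. Can anyone help with this problem?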

Edit: here is my code that builds the request sent to the API. I am using FLAC audio.

try {
        OkHttpClient client = new OkHttpClient.Builder()
                .connectTimeout(60, TimeUnit.SECONDS)
                .writeTimeout(60, TimeUnit.SECONDS)
                .readTimeout(60, TimeUnit.SECONDS)
                .build();
        JSONObject body = new JSONObject();

        JSONObject configData = new JSONObject();
        //config for flac files
        configData.put("encoding", "FLAC");
        configData.put("language_code", "ar-EG");
        configData.put("sample_rate", 16000);
        configData.put("enableAutomaticPunctuation", true);

        JSONObject audioData = new JSONObject();
        audioData.put("content", encodeFileToBase64Binary(filePath));

        body.put("config", configData);
        body.put("audio", audioData);

        RequestBody requestBody = RequestBody.create(JSON, body.toString());
        Request request = new Request.Builder()
                .url("https://speech.googleapis.com/v1/speech:recognize?key=AIzaSyAhYB9C6a8axV7DMYbRluQ3QLa8nXCYL18")
                .post(requestBody)
                .build();

        publishProgress(40);
        Response response = client.newCall(request).execute();
        String result = response.body().string();
        publishProgress(80);
        Log.d("SpeechApiResult", result);
        return result;
    } catch (Exception e) {
        e.printStackTrace();
        return null;
    }

private String encodeFileToBase64Binary(String fileName) throws IOException {
    byte[] bytes = FileUtils.readFileToByteArray(new File(fileName));
    byte[] encoded = Base64.encodeBase64(bytes);
    return new String(encoded);
}

Answer 1

From this code, I can think of two main categories of causes that may affect transcription quality when using the Cloud Speech-to-Text API.

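Code:
- Are you sure the configData object in your code is correctly used to populate the RecognitionConfig object of the client library? Without insight into the client library implementation, it is impossible to tell. Make sure you are importing and using the Google Cloud client library properly by following this guide.
- I see that you are using the enableAutomaticPunctuation parameter, but at the moment this feature is only available for the en-US language. If you are not transcribing in that language, I recommend not using it.

Audio:
- Are you sure the parameters of the RecognitionConfig object accurately describe the audio properties of your samples? Make sure the samples are recorded and processed following best practices, and that their properties are set correctly in code.
- Another pitfall is format/encoding confusion. Make sure your samples comply with the supported audio encodings. Also, converting a sample that was originally recorded in a lossy format to a lossless one will not give the same transcription quality as a sample that was recorded in a lossless format to begin with.
- Are all your samples in the same variety of Arabic? There are 16 different Arabic languages supported by the Cloud Speech-to-Text API. Transcription accuracy can vary considerably between them, and can also suffer if the recording contains local dialects or slang. The speaker's pronunciation and background noise are important factors as well.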
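
To check whether the declared parameters match the audio, one option is to read the sample rate, channel count and bit depth from the FLAC file's STREAMINFO block and compare them with what the request sends. The following is only a minimal sketch and not part of the original answer: the class name `FlacStreamInfo` and its methods are hypothetical, and it assumes a plain FLAC file with no ID3 tag or other data before the `fLaC` marker.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

/** Hypothetical helper: reads basic audio properties from a FLAC header. */
final class FlacStreamInfo {
    final int sampleRateHertz;
    final int channels;
    final int bitsPerSample;

    private FlacStreamInfo(int sampleRateHertz, int channels, int bitsPerSample) {
        this.sampleRateHertz = sampleRateHertz;
        this.channels = channels;
        this.bitsPerSample = bitsPerSample;
    }

    /** Parses the first 22 bytes of a FLAC stream: marker, block header, start of STREAMINFO. */
    static FlacStreamInfo parse(byte[] header) {
        if (header.length < 22
                || header[0] != 'f' || header[1] != 'L'
                || header[2] != 'a' || header[3] != 'C') {
            throw new IllegalArgumentException("Not a FLAC stream");
        }
        // Byte 4: top bit is the "last metadata block" flag, low 7 bits are the block type.
        if ((header[4] & 0x7F) != 0) {
            throw new IllegalArgumentException("First metadata block is not STREAMINFO");
        }
        // STREAMINFO data starts at byte 8; the first 10 bytes hold block and frame sizes.
        int b18 = header[18] & 0xFF;
        int b19 = header[19] & 0xFF;
        int b20 = header[20] & 0xFF;
        int b21 = header[21] & 0xFF;
        int sampleRate = (b18 << 12) | (b19 << 4) | (b20 >> 4);      // 20 bits
        int channels = ((b20 >> 1) & 0x07) + 1;                      // 3 bits, stored minus one
        int bitsPerSample = (((b20 & 0x01) << 4) | (b21 >> 4)) + 1;  // 5 bits, stored minus one
        return new FlacStreamInfo(sampleRate, channels, bitsPerSample);
    }

    /** Reads the header bytes from a file on disk and parses them. */
    static FlacStreamInfo read(String path) throws IOException {
        byte[] header = new byte[22];
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            in.readFully(header);
        }
        return parse(header);
    }
}
```

Calling `FlacStreamInfo.read(filePath)` before building the request and logging the three fields shows whether the file really has the sample rate the config declares. Any mismatch is worth resolving before looking elsewhere.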

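With these considerations in mind, I suggest you try different ways of recording and rendering your samples, then test their transcription using the API from the REST reference page or the API explorer, while making sure the RecognitionConfig object is set appropriately for each sample type.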
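
For such tests, a request body for the v1 `speech:recognize` REST method might be built roughly as follows. This is a sketch under stated assumptions, not the asker's code or an official client. The class `RecognizeRequestBody` and its methods are hypothetical. The field names `sampleRateHertz` and `languageCode` are the camelCase names used by the v1 `RecognitionConfig` REST resource. `enableAutomaticPunctuation` is left out on purpose, following the advice above. `java.util.Base64` is assumed to be available, which on Android requires API level 26 or later.

```java
import java.util.Base64;
import java.util.Locale;

/** Hypothetical helper: builds a JSON body for the v1 speech:recognize REST method. */
final class RecognizeRequestBody {

    /** Base64-encodes raw audio bytes for the "content" field. */
    static String encodeAudio(byte[] audioBytes) {
        return Base64.getEncoder().encodeToString(audioBytes);
    }

    /**
     * Assembles the request body. The values are inserted without JSON escaping,
     * so this assumes they contain no quotes or backslashes (true for encoding
     * names, BCP-47 language codes and Base64 text).
     */
    static String build(String encoding, int sampleRateHertz,
                        String languageCode, String base64Audio) {
        // Locale.ROOT keeps %d in ASCII digits even when the device locale is Arabic.
        return String.format(Locale.ROOT,
                "{\"config\":{\"encoding\":\"%s\",\"sampleRateHertz\":%d,"
                        + "\"languageCode\":\"%s\"},\"audio\":{\"content\":\"%s\"}}",
                encoding, sampleRateHertz, languageCode, base64Audio);
    }
}
```

Sending the same sample with a few different `languageCode` values and comparing the transcripts is one way to see whether the dialect setting is the limiting factor.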

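If following these suggestions does not improve the API's results, keep in mind that the Speech-to-Text API, like other ML solutions, relies on pre-trained prediction models. Although these models are continuously improved, the results they deliver are still approximations. If you want to help Google improve the API for a particular language, you can opt in to the Data Logging program.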

Google Speech API transcription is not as accurate as it used to be
