MP3 to FLAC for Google's Speech API

Time: 2015-12-31 19:54:08

Tags: python speech-recognition

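I'm trying to find a simple way to send an MP3 to Google for speech recognition. Currently, I call SoX through subprocess to convert the MP3 to WAV. SpeechRecognition then converts that WAV to FLAC before sending it. Ideally, I'd like a more portable (not OS-specific) way to decode the MP3 and send it without saving any intermediate files.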

This is what I have so far:

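import subprocess

import requests
import speech_recognition as sr

# Download the MP3 and save it to a temporary file
response = requests.get('http://somesite.com/some.mp3')
response.raise_for_status()

with open('/tmp/audio.mp3', 'wb') as mp3_file:
    mp3_file.write(response.content)

# Convert MP3 to WAV with SoX (must be installed and on the PATH)
subprocess.run(['sox', '/tmp/audio.mp3', '/tmp/audio.wav'], check=True)

# SpeechRecognition re-encodes the WAV data as FLAC before sending it to Google
r = sr.Recognizer()
# sr.AudioFile was named sr.WavFile in older SpeechRecognition releases
with sr.AudioFile('/tmp/audio.wav') as source:
    audio = r.record(source)

result = r.recognize_google(audio)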

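I have tried using the FLAC binary bundled with SpeechRecognition directly, but the output was just static. I'm not keen on distributing binaries in Git, but I will if that's the only way.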

Some important links:

SR's code for speech recognition

SR's code for WAV to FLAC

Edit

I considered distributing SoX the same way the FLAC binaries are distributed, one per operating system, if SoX's license allows it...

On second thought, software licenses are confusing and I don't want to get it wrong.

1 Answer:

Answer 0: (score: 0)

I decided to go with this:

public static void Main (string[] args)
{
    // Note: This crashes if non numeric characters are entered!
    Console.WriteLine ("Please enter 3 numbers:");
    int num1 = Convert.ToInt32(Console.ReadLine());
    int num2 = Convert.ToInt32(Console.ReadLine());
    int divisor = Convert.ToInt32(Console.ReadLine());

    // Find the lowest and highest in case they are entered in the wrong order
    int lowerNum = Math.Min(num1, num2); 
    int upperNum = Math.Max(num1, num2); 

    // Find the first factor over the lower bound
    // E.g. for a = 10, b = 20, c = 3, we have remainder = 1
    //      = 10 + (3 - 1)
    //      = 12
    int remainder = lowerNum % divisor;
    int factor = (remainder == 0)
      ? lowerNum 
      : lowerNum + (divisor - remainder);

    // Calculate all other factors up to the upper bound by simple addition
    while(factor <= upperNum){
      Console.WriteLine(factor);

      factor += divisor;
    }

}

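This is more of a middle ground, I suppose, borrowing a few things from the SR module. It requires the user to install SoX, but it should work on all operating systems and does not use any intermediate files. I have only tested it on Linux.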
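
As a rough illustration of the "SoX without intermediate files" idea described above, here is a minimal sketch that pipes MP3 bytes into SoX on stdin and reads FLAC bytes back from stdout. It is not the answerer's actual code. The function names `build_sox_command` and `mp3_to_flac`, the 16000 Hz sample rate, and the mono output are all assumptions chosen for illustration. It also assumes the installed SoX build has MP3 support and can write FLAC to a pipe.

```python
import subprocess


def build_sox_command(sample_rate=16000):
    """Build a SoX command that reads MP3 on stdin and writes mono FLAC to stdout.

    The sample rate and channel count are illustrative assumptions.
    """
    return [
        "sox",
        "-t", "mp3", "-",        # input: MP3 data read from stdin
        "-t", "flac",            # output format: FLAC
        "-r", str(sample_rate),  # output sample rate in Hz
        "-c", "1",               # output channels: mono
        "-",                     # output: written to stdout
    ]


def mp3_to_flac(mp3_bytes, sample_rate=16000, run=subprocess.run):
    """Convert MP3 bytes to FLAC bytes through a SoX pipe, with no temporary files.

    `run` is injectable so the function can be exercised without SoX installed.
    """
    result = run(
        build_sox_command(sample_rate),
        input=mp3_bytes,          # fed to SoX on stdin
        stdout=subprocess.PIPE,   # FLAC data is captured from stdout
        stderr=subprocess.PIPE,
        check=True,               # raise CalledProcessError if SoX fails
    )
    return result.stdout
```

With something like this, the downloaded bytes (for example `requests.get(url).content`) could be passed straight to `mp3_to_flac`, and the returned FLAC bytes handed to whatever code sends audio to the recognizer. Because only the SoX executable is called and no paths are hard-coded, the same code should in principle run on any OS where `sox` is on the PATH.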